What is a neural network and how will it shape the future of technology?

by Omar Mauri

What is a neural network? How do AI models know what you mean when you ask them to produce or interpret images and text? And more to the point, do they really know what you mean at all? We spoke to Omar Mauri, a Full Stack Developer here at Aiimi, about neural networks and Artificial Intelligence. In this blog, he’ll demystify some highly technical stuff, explaining how AI models can be trained to classify images and documents through a type of Machine Learning known as neural networks. He'll also challenge you to a game of quick-draw and create a doodle of an ice-cream cone using 1s and 0s... Over to Omar.

In an age where Artificial Intelligence models can now generate almost photorealistic images based on increasingly complex prompts, it’s easy to forget that computers are actually quite stupid when it comes to the human world. While it does a great job of creating the ‘Star Wars cat on top of a train’ you asked for, the computer that created it has very little knowledge about Star Wars. In fact, it doesn’t know what a cat is, nor has it ever heard of a train.

You may have heard before that computers only understand ones and zeroes and in essence this is true - everything boils down to whether an electrical signal somewhere is on (1) or off (0). So how are they able to simplify complex topics like Star Wars, cats, trains, and anything else we may be requesting, down to binary signals, and still produce such astonishing results? Well, in order to understand this, it may be easier to look at the inverse process and how AI models use neural networks to interpret prompts. But what is a neural network? To answer that question, we first need to look at how AI models are trained to classify images.

Star Wars cat on top of a train (DALL·E 2)

How are AI models trained to classify images through neural networks?

AI models are now very good at understanding image prompts. You can upload an image to ChatGPT and have it describe it for you pretty accurately. But given how little the model knows about the human world, it’s interesting to discover how exactly it’s able to do this. Let’s simplify the example by removing colour from the equation. Imagine you want an AI to classify a black doodle on a white background and tell you what it is. On a 64x64-pixel grid, you could assign a 1 to any pixel that has been drawn on, and a 0 to any background pixel left white. In this way, the computer could understand your image as an array of 1s and 0s that is 4096 digits long (check out my doodle of an ice cream cone).

Binary ice cream cone
Doodle of an ice cream cone (as seen by a computer)
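
To make the idea concrete, here is a minimal sketch in Python of turning a doodle into an array of 1s and 0s. The 8x8 grid, the `DOODLE` drawing, and the `to_bits` helper are all made up for illustration; a real 64x64 doodle would work the same way and give 4096 values.

```python
# A tiny 8x8 stand-in for the 64x64 grid; '#' marks a pixel that was drawn on.
DOODLE = [
    "..####..",
    ".######.",
    ".######.",
    "..####..",
    "..####..",
    "...##...",
    "...##...",
    "........",
]

def to_bits(rows):
    """Flatten a grid of characters into one list of 1s (drawn) and 0s (blank)."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

bits = to_bits(DOODLE)
print(len(bits))  # 8 * 8 values here; a 64x64 grid would give 4096
print(bits[:8])   # the first row of the grid
```

The computer never sees the picture itself, only this flat list of numbers.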

This is no random example - Google has created an AI model for doing exactly this and you can give it a try at quickdraw.withgoogle.com. In this game, you have to draw a doodle of a ‘thing’ and keep adding to it until the Google AI recognises it. While it’s quite fun as a game, what you are actually doing each time you draw is helping to train the model, by adding to its large pool of drawings of that prompt, helping it get better at classifying that item next time.

Let’s take a look at another example - calculators. You may be able to look at these images and spot some clear patterns - the rectangle border, the screen at the top and maybe a few buttons. But now try to think about it from a pixel-by-pixel perspective. All the drawings are in slightly different sizes, or rotated, or even just have features drawn in a different place. Would you be able to recognise that these 10 doodles were all of the same thing if you were just given 10 arrays of 1s and 0s, each 4096 digits long? Probably not, as the variations we just discussed would cause the arrays to be completely different from each other.

Some doodles of calculators
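
A quick way to see how fragile pixel-by-pixel comparison is: take one shape, slide it two pixels to the right, and count how many array positions change. This is only a sketch on a made-up 8x8 grid, and the `BOX` drawing and helper names are hypothetical.

```python
def to_bits(rows):
    """Flatten a grid of characters into one list of 1s (drawn) and 0s (blank)."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def shift_right(rows, n):
    """Move the drawing n pixels to the right, padding the left with blanks."""
    return ["." * n + row[: len(row) - n] for row in rows]

def differing_pixels(a, b):
    """Count the positions where two bit arrays disagree."""
    return sum(x != y for x, y in zip(a, b))

BOX = [
    "........",
    ".####...",
    ".#..#...",
    ".#..#...",
    ".####...",
    "........",
    "........",
    "........",
]

original = to_bits(BOX)
shifted = to_bits(shift_right(BOX, 2))
print(sum(original), differing_pixels(original, shifted))
```

The box is made of only 12 drawn pixels, yet the two arrays disagree in 16 positions, even though a person would say the drawings are identical.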

But this is exactly where computers have the advantage - the model’s job is not to look for specific pixels in specific places, but to look for patterns and similarities that emerge between the pixels themselves. With enough training data (the Quickdraw AI now has over 120,000 examples of calculator doodles), it can find these patterns and classify the images more accurately by leveraging neural networks.

So, what is a neural network?

The simple answer is that a neural network is a bunch of inputs and outputs - but it’s what happens between the inputs and outputs that really counts. These are called the hidden layers. The number of inputs and outputs depends on the problem we are trying to solve. For our example of categorising doodles, we would have 4096 input nodes, each with a value of either 0 or 1. Our output layer would then have a node for each possible classification, so ‘calculator’ would be a potential output. Google’s Quickdraw AI can actually identify 330 different doodles, from ‘aircraft carrier’ to ‘zebra’, giving us 330 output nodes.
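
To get a feel for the scale, here is a rough count of the links in such a network. The input and output sizes come from the example above; the single hidden layer of 128 nodes is purely an assumption for illustration, not a description of Google's model.

```python
INPUTS = 4096   # one node per pixel in a 64x64 grid (from the text)
OUTPUTS = 330   # one node per doodle category (from the text)
HIDDEN = 128    # hypothetical hidden layer size, chosen for illustration

# Every input node links to every hidden node, and every hidden node
# links to every output node.
links = INPUTS * HIDDEN + HIDDEN * OUTPUTS
print(links)
```

Even with one modest hidden layer, that is already more than half a million links.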

Quite simply, the goal of the neural network is to take values from the input nodes and use them to assign values to each of the output nodes. The node with the highest value is then selected as the classification. The magic happens in the hidden layers: each node in a hidden layer is simply a calculation, known as an activation function, which takes an input and converts it to a different value in a predefined way. The links between the nodes, meanwhile, each have a weight (w), and in the usual formulation each node has a bias (b). A link takes the value of the previous node and multiplies it by its weight. Each node then sums the weighted values from all its incoming links, adds its bias, and passes the result through its activation function. The process continues through the network until the output layer is reached. You can easily imagine how changing the (w) and (b) values can drastically change the output values, and that is exactly what training the neural network is all about - finding the optimum values for each of those weights and biases so the provided inputs give the expected outputs.
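
The calculation described above can be sketched in a few lines of Python. This is a toy, not a real model: the network has 3 inputs, 2 hidden nodes and 2 outputs, all the numbers in `NETWORK` are hand-picked, sigmoid is just one common choice of activation function, and the sketch assumes one bias per node.

```python
import math

def sigmoid(x):
    """A common activation function: squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(values, weights, biases):
    """weights[j][i] is the weight on the link from input i to node j."""
    return [
        sigmoid(sum(w * v for w, v in zip(node_weights, values)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(inputs, network):
    """Feed the input values through each layer in turn."""
    values = inputs
    for weights, biases in network:
        values = layer(values, weights, biases)
    return values

def classify(inputs, network, labels):
    """Pick the label whose output node has the highest value."""
    outputs = forward(inputs, network)
    return labels[outputs.index(max(outputs))]

# Hypothetical hand-picked numbers: 3 inputs -> 2 hidden nodes -> 2 outputs.
NETWORK = [
    ([[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]], [0.0, -0.5]),
    ([[3.0, -3.0], [-3.0, 3.0]], [0.0, 0.0]),
]
LABELS = ["calculator", "zebra"]

print(classify([1, 0, 1], NETWORK, LABELS))
print(classify([0, 1, 0], NETWORK, LABELS))
```

Change any number in `NETWORK` and the output values shift, which is exactly the lever that training pulls.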

But that’s easier said than done.

Large networks can have hundreds of thousands of links, meaning it can take time and a lot of processing power to produce an output. Each time a network is trained, you take a large quantity of example data with known classifications. In the case of our doodle example, Google uses millions of doodles across its 330 different classifications. To be able to train the model, we need to give it an indication of how good a job it has done of classifying each of these training examples. This is where the concept of ‘cost’ comes in: if we normalise the output values between 0 and 1, for each doodle the output values are expected to be 1 for the node with the correct classification and 0 for all the other nodes. In this way, we can simply increase the ‘cost’ value for this network based on how far the real output values are from this expected solution.
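
One simple way to put a number on 'how far off' is to add up the squared differences between the real outputs and the expected ones. The article does not say which cost function is used, so treat squared error here as an assumption; it is one common choice among several.

```python
def one_hot(correct_index, size):
    """Expected outputs: 1 for the correct node, 0 for every other node."""
    return [1.0 if i == correct_index else 0.0 for i in range(size)]

def cost(outputs, correct_index):
    """Sum of squared differences between the real and expected outputs."""
    expected = one_hot(correct_index, len(outputs))
    return sum((o - e) ** 2 for o, e in zip(outputs, expected))

# Three output nodes; the correct classification is node 0 in both cases.
confident_and_right = cost([0.9, 0.1, 0.0], correct_index=0)
confident_and_wrong = cost([0.1, 0.9, 0.0], correct_index=0)
print(confident_and_right, confident_and_wrong)
```

A network that is confidently right scores close to 0, while one that is confidently wrong scores much higher.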

An example of a neural network

So, all you have to do is pick a set of values for each (w) and (b) in the network, calculate the output values and then the cost value, and repeat this for each of the millions of items in the training data... Then repeat the whole process while making very small changes to the (w) and (b) values to see if the cost decreases. We keep repeating the process until the cost value for the network is as low as possible. It’s not just random trial and error - there are some very clever mechanisms in place to help computers do this efficiently, but I won’t go into that here. Then each time we have new training data, we repeat the whole process again. I’m sure you are starting to see why computers are much better at this than people.
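
Here is a deliberately naive sketch of that loop for a network with a single output node. It nudges each weight and the bias by a tiny amount, measures how the cost responds, and then moves each value in the direction that lowers the cost. That is a simple form of gradient descent, which I am swapping in for illustration; real training uses far more efficient methods, and the toy data, step count and learning rate below are all assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(params, inputs):
    """One output node: weighted sum of the inputs plus a bias, then sigmoid."""
    *weights, bias = params
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def total_cost(params, examples):
    """Squared error summed over every training example."""
    return sum((predict(params, x) - target) ** 2 for x, target in examples)

def train(params, examples, steps=500, rate=1.0, nudge=1e-4):
    params = list(params)
    for _ in range(steps):
        base = total_cost(params, examples)
        gradient = []
        for i in range(len(params)):
            nudged = params[:]
            nudged[i] += nudge  # a very small change to one value
            gradient.append((total_cost(nudged, examples) - base) / nudge)
        # Move every value a little in the direction that reduces the cost.
        params = [p - rate * g for p, g in zip(params, gradient)]
    return params

# Toy training data: the output should be 1 only when both inputs are on.
EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

start = [0.0, 0.0, 0.0]  # two weights and one bias
trained = train(start, EXAMPLES)
print(total_cost(start, EXAMPLES), total_cost(trained, EXAMPLES))
```

With three values to tune, this runs instantly. With hundreds of thousands of weights and millions of examples, nudging values one at a time would be hopelessly slow, which is why the cleverer mechanisms matter.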

With enough training data, a network can learn to map many kinds of inputs to suitable outputs, and this is also the basis for large language models (LLMs) like ChatGPT. An LLM simply predicts one word (strictly, one token, which may be a word or part of a word) at a time: its inputs are your prompt plus all the words of its answer so far, and its outputs are used to select the next word. A similar approach, with a different kind of output, can be used to help AI generate images.
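
The one-word-at-a-time loop is easy to sketch, even if the network inside it is not. In this toy version a hand-written lookup table (`TABLE`, entirely made up) stands in for the trained network, and it only looks at the last word rather than the whole text so far, which is a big simplification.

```python
def next_word(words, table):
    """Stand-in for the network: score candidate next words from the last word."""
    scores = table.get(words[-1], {"<end>": 1.0})
    return max(scores, key=scores.get)

def generate(prompt, table, max_words=10):
    """Keep appending the predicted next word until the model says stop."""
    words = prompt.split()
    for _ in range(max_words):
        word = next_word(words, table)
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

# Hypothetical scores for which word tends to follow which.
TABLE = {
    "the": {"cat": 0.6, "train": 0.4},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "sat": {"on": 0.9, "<end>": 0.1},
    "on": {"a": 0.8, "the": 0.2},
    "a": {"train": 0.9, "cat": 0.1},
    "train": {"<end>": 1.0},
}

print(generate("the", TABLE))
```

Each new word is fed back in as input for the next prediction, which is how a model that only ever picks one word can produce a whole answer.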

How can neural networks be used to classify documents?

The strength of using a specifically trained network is discrete classification. Just like the way Google uses its own trained network to determine which of 330 doodles you are drawing, it is possible to leverage these networks to help classify things like business documents. It’s possible to use sentence transformer models to convert entire documents into large vectors (which are essentially just arrays of numbers). So just like my doodle of an ice cream cone was converted into an array of 1s and 0s, it’s possible to convert any document into an array of numbers from -1 to 1 using these models. These vectors can then be used as input values to train a specific model to recognise the patterns in the content of the document and help to classify it.
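
As a rough illustration of classifying from vectors, the sketch below uses tiny made-up 4-number vectors in place of real document embeddings (which would come from a sentence transformer model and be far longer). Instead of training a neural network, it uses a much simpler stand-in: average the vectors for each label, then pick the label whose average is most similar to the new document. The labels and numbers are hypothetical.

```python
import math

def cosine(a, b):
    """Similarity of two vectors, from -1 (opposite) to 1 (same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def centroid(vectors):
    """Average a list of vectors position by position."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def fit(labelled_vectors):
    """Group the training vectors by label and average each group."""
    by_label = {}
    for vector, label in labelled_vectors:
        by_label.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in by_label.items()}

def classify(vector, centroids):
    """Pick the label whose average vector is most similar to this one."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

# Made-up vectors standing in for embeddings of already-classified documents.
TRAINING = [
    ([0.9, 0.1, -0.3, 0.0], "invoice"),
    ([0.8, 0.2, -0.1, 0.1], "invoice"),
    ([-0.2, 0.7, 0.6, -0.1], "contract"),
    ([-0.1, 0.9, 0.4, 0.0], "contract"),
]

centroids = fit(TRAINING)
print(classify([0.7, 0.0, -0.2, 0.1], centroids))
```

A trained network can pick up far subtler patterns than an average can, but the shape of the problem is the same: numbers in, one discrete label out.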

Once a network has been trained for a specific task, it can be much more reliable than simply asking an LLM to classify a document. Firstly, although a network built to classify documents is very large, it is orders of magnitude smaller than the network for an LLM, meaning it will run a lot faster. Also, LLMs are trained on a very broad range of data, so a model trained on a specific dataset can give much more accurate classification results for that kind of document. And finally, the statistical nature of classifying documents in this way means the same model will always produce the same classification for the same input. The natural variance built into LLMs to make them feel more conversational means this cannot be guaranteed when using them to classify.
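
The difference in repeatability comes down to how the final choice is made from the output values. The sketch below contrasts always taking the highest score with randomly sampling in proportion to the scores, which is a simplified picture of the variance in LLM output. The `SCORES` values are made up.

```python
import random

# Hypothetical output values for one document.
SCORES = {"invoice": 0.55, "contract": 0.40, "memo": 0.05}

def pick_highest(scores):
    """Classifier style: always take the label with the highest score."""
    return max(scores, key=scores.get)

def pick_sampled(scores, rng):
    """Conversational style: choose at random, weighted by the scores."""
    labels = list(scores)
    weights = [scores[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

always = {pick_highest(SCORES) for _ in range(100)}

rng = random.Random(0)  # seeded so the sketch is repeatable
sampled = {pick_sampled(SCORES, rng) for _ in range(100)}

print(always)
print(sampled)
```

Over 100 runs the first approach only ever gives one answer, while the sampled approach returns more than one label for exactly the same input.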

Neural networks have revolutionised the way computers are able to understand and classify complex data from doodles to documents. By breaking down things into discrete input signals and leveraging layers of interconnected nodes, they are able to make classification predictions with astonishing levels of accuracy. And once a document has been discretely classified, it becomes much more efficient to enrich and find entities - for example, through Named Entity Recognition (NER). It’s clear Artificial Intelligence and neural networks will play a major part in shaping the future of technology and its applications in our lives - but for now, I’ve at least got a cool screensaver of a Star Wars cat, riding on a train.

Find out more about how AI can be used to automatically classify data at scale and drive value for business operations with the Aiimi Insight Engine. Get more intel on AI from our in-house data and AI strategy consultants on our blog, from ideal first use cases for AI, to the rise of edge AI, and our latest insights on generative AI.
