[Article.Ai] The easiest artificial neural network tutorial you’ll probably ever find. (revised)
I wrote an artificial neural network from scratch two years ago, and yet, at the time, I didn’t grasp how an artificial neural network actually worked.
But how??
So two years ago, I saw a nice artificial neural network tutorial on YouTube by David Miller, written in C++.
I didn’t know any C++ back then, but Java and C++ look somewhat similar.
So I watched the tutorial, and for a little while afterwards I thought about the overall picture of what was going on.
After two weeks or so of thinking about how David’s code was organized, I had developed a mental model of the neural network, including all the functions and all the relevant attributes (sometimes my memory doesn’t betray me). So, from scratch, I then transcribed what I had learnt into Java code.
Surprisingly, based on my mental model, the neural net worked well!!
Admittedly, that was not the best way of learning how neural nets actually work. That was not an appropriate mental model to have of neural nets!
I may have gotten the model to work nicely back then, but I can’t say I possessed an intuitive picture of what was going on.
Of course this has now changed, as reflected in my actual neural network discussion throughout this answer on Quora.
It turns out that understanding things effectively takes more than being able to successfully write an elementary artificial neural network based on thoughts about how some code implementation happened to be organized!!
The key thing is that once one has an intuitive understanding, the programming language, or the way in which one organizes the code, has little bearing on grasping the fundamental mechanism of artificial neural nets!
This means that studying how some code somewhere is organized likely won’t win you an intuitive grasp of the basic neural network layout. As a result, you may not be motivated to further pursue the exciting field of artificial neural networks, a core part of perhaps mankind’s most important task …. primarily because you may lack the intuitive building blocks needed to properly venture beyond the scope of elementary artificial neural networks.
The following answer shall likely equip you with some good intuition (i.e. a fun introduction + detailed maths)!
This post is for both ‘kids’ and experts! (This feat was not easy to pull off.)
I will add a nice extra topping over what is probably the clearest and most intuitive short 4-video neural network series on YouTube, by a channel called “3blue1brown”.
- First, I will explain the intuitive extra topping described above, while avoiding math.
- Then I will attach a YouTube link to one of 3blue1brown’s relevant videos, while going into a little math that will be easy to grasp, and easier still if one watches said video.
- This answer (my extra topping) should make 3blue1brown’s enchantingly clear videos even clearer!!
Let’s say we want to classify some digits from individual pictures of digits (aka enable a model to tell what digits are on pictures it is fed).
Humans find it easy to tell the difference between many creatively written 2’s and 1’s, but this is non-trivial for a computer to do. Well, we can make it tractable by thinking about the structure of a simple neural net built to do it.
What we want to do is make the neural net learn what the digits are, by adjusting the neural net’s structure based on many inputs that are examples of correctly labelled digits.
But why adjust its structure? And what the heck is this structure anyway?
The structure is a collection of data, or “parameters”, that is simply a way to hold (aka store, or memorize) what the images are, based on their labels.
Otherwise, where else would we store what each picture of a digit means? Certainly not nowhere.
A neural network will have input structure to receive values from the input pictures with digits written on them, a “hidden” structure which acts as an extra way to represent the inputs, and a final output structure to tag or store answers about each digit picture.
The more hidden structure there is, the more opportunity there is to memorize our digits’ correct labels.
Neural nets have layers of each type of structure, with multiple nodes (aka neurons) per layer. Apart from that, there is also more structure connecting the layers, forming paths from the nodes of one layer to the nodes of the next. These connections are weights. So in the end there are weights and neurons: a set of neurons making up each layer, and weights connecting them all.
What we do is pass input data, such as a picture of a digit, in the form of a collection of numbers representing each pixel of the picture, straight to the layer of input receiver nodes/neurons. There is then a memorization process that uses the hidden (or extra storage) layer of neurons, and then the weights communicate this information to the final layer, which has 10 neurons.
The classification of digits involves 10 possible answers (0−9), so the neural net will have 10 output neurons, each corresponding to 0 or 1 or 2 … up to 9.
Ideally, the digit classifier will output 1 on the neuron for whichever digit it thinks it’s seeing, and 0 on the others, because it doesn’t think it’s seen another number.
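To make the picture above concrete, here is a minimal sketch of such a structure in Python. This is just an illustration under my own assumptions: the layer sizes (784 pixel inputs, as for a 28x28 image, 16 hidden neurons, 10 outputs), the sigmoid squashing function, and the random starting weights are all choices of mine, not anything fixed by the discussion above.

import math
import random

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

class TinyDigitNet:
    def __init__(self, n_inputs=784, n_hidden=16, n_outputs=10):
        # The "structure": weights forming a path from every neuron in one
        # layer to every neuron in the next, starting at random values.
        self.w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.w_output = [[random.uniform(-1, 1) for _ in range(n_hidden)]
                         for _ in range(n_outputs)]

    def forward(self, pixels):
        # The input layer receives the pixel values, the hidden layer
        # re-represents them, and the 10 output neurons each end up
        # holding a score for one digit.
        hidden = [sigmoid(sum(w * p for w, p in zip(row, pixels)))
                  for row in self.w_hidden]
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
                for row in self.w_output]

# Feed a made-up 28x28 "picture" (784 pixel values) through the net.
net = TinyDigitNet()
scores = net.forward([random.random() for _ in range(784)])
print(len(scores))  # 10 scores, one per digit 0-9

Of course, an untrained net like this outputs junk; the whole point of training, discussed below, is to adjust those weights until the right output neuron lights up.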
I advise you to watch these videos in order (video 1, video 2, video 3, and video 4), whether or not you yet grasp the overall idea of the neural net. Videos 1–3 explain the overall idea, then video 4 gets into the real work. They belong to the same channel I was telling you about, and are the most intuitive lessons I’ve seen on YouTube thus far. I just add the cherry on top here to make them even clearer!
Now let’s discuss video 4, where all the “magic”/calculus happens:
Now, if one paid close attention, one can formulate what is occurring in massive, “plain” neural networks in the 3 simple parts below:
But how do we know our neural net has “generalized” beyond the training set (aka how do we know whether the neural net actually learnt to detect digits from the input pictures)? We know when we actually try to “test” our neural net. When we are done correcting our weights w.r.t. some correctly labelled inputs, aka training examples, we can then test our model by taking a value from the output layer, where our answers are stored.
We test our model by showing it a picture of a digit that was not part of the training set, to see if it really did learn. (Showing it a picture of a digit from the training set would not entirely reflect whether it really did learn, so to be sure, it’s better to expose it to sets of pixels, aka images of digits, it has not yet seen. Then again, the entire point is to make a digit detector: not just a detector of predefined, correctly labelled images, but a digit detector overall, i.e. something that can detect freshly seen digits, in particular unlabelled digits!!!! Would it make sense if we always showed our artificial neural net only correctly labelled images, aka always told it what each image represents? How then would we know it had learnt anything? So this is why we test on fresh, unlabelled data: unseen pictures of digits.)
Now, for an input with a digit 3, if steepest descent, aka gradient descent, worked out well, then although all output neurons, aka answer neurons, would have a value, the neuron for 3 would have a noticeably higher value than the other neurons, because the neural net thinks it’s seeing a 3. (Counting from the zeroth neuron, we find the answer neuron for 3 at the fourth neuron, because the neuron count started at 0.)
So to get our final answer, all we do is take the maximum of the set of numbers stored in our output layer! (where each answer/output neuron holds a number)
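To make “take the maximum” concrete, here is a tiny sketch in Python; the output-layer values below are made up purely for illustration:

# Hypothetical output-layer values after feeding in a picture of a 3.
outputs = [0.02, 0.01, 0.05, 0.93, 0.04, 0.02, 0.01, 0.03, 0.02, 0.01]

# The final answer is the index of the largest value: the neuron for 3
# wins, counting from the zeroth neuron.
predicted_digit = max(range(len(outputs)), key=lambda i: outputs[i])
print(predicted_digit)  # 3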
Although all the nuts and bolts of an elementary digit-detection-capable artificial neural net are easy to understand in the long run, the neural net unfortunately consists of thousands of moving parts, so it is perhaps tedious to grasp the whole picture.
As such, the entirety of an elementary yet powerful artificial neural network can be compacted into merely 3 parts:
1) “Part I — Trailing hypersurfaces” & “Training set averages”:
https://i.imgur.com/yCNJo99.png
2) “Part II — Partner neuron sums — An emphasis on ‘trailing hypersurfaces’”:
https://i.imgur.com/yBmDBYT.png
3) “Part III — Error correction — Application of costs from the trailing hypersurfaces”:
https://i.imgur.com/PMohvFy.png
Notably, Part II is merely a way to clarify Part I, so basically the neural network is just 2 things:
1) A “hypersurface” computation w.r.t. some cost function.
2) An application of costs, aka “negative gradients” (using the hypersurface computations), to update the neural network structure as it is exposed to more and more training examples; thus the neural net improves over time. (aka error correction)
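Here is a minimal sketch of those 2 things in Python, shrunk down to a single-weight “network” so the mechanism is visible. The squared-error cost, the learning rate, and the training example are illustrative assumptions of mine, not the exact setup from the videos.

# (1) Compute the slope (gradient) of a cost "hypersurface".
# (2) Nudge the weight against that slope (error correction).

def cost(w, x, target):
    return (w * x - target) ** 2      # how wrong the net currently is

def gradient(w, x, target):
    return 2 * x * (w * x - target)   # slope of the cost w.r.t. the weight

w = 0.0                               # a single weight, starting at zero
x, target = 1.0, 3.0                  # one correctly labelled training example
learning_rate = 0.1

for _ in range(25):
    w -= learning_rate * gradient(w, x, target)   # step along the negative gradient

print(round(cost(w, x, target), 4))   # nearly 0: the net is barely wrong anymore
print(round(w, 2))                    # approaches 3.0: the structure absorbed the example

Repeat that update over thousands of weights and many training examples, and you have the error correction that makes the net improve over time.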
That’s it, that’s the infamous back-prop algorithm, together with an entire elementary but powerful artificial neural network!!
Now you’re ready to take on Geoffrey Hinton’s “hard” online course in neural nets!!
Amazon book format: https://www.amazon.com/dp/B077FX57ZZ
Free copy on Quora:
https://www.quora.com/What-is-the-most-intuitive-explanation-of-artificial-neural-networks/answer/Jordan-Bennett-9?srid=Jj6I
Free copy with equations nicely coloured differently from their surrounding text (instead of equations in the same colouring as the surrounding text), on ResearchGate:
https://www.researchgate.net/publication/321162382_Artificial_Neural_Nets_For_Kids
Cool YouTube video: https://www.youtube.com/watch?v=aP66xxe8z1g
Author:
I am a casual bodybuilder and software engineer.