So I came up with the idea of starting from scratch and delving into the history of the neural networks field, from the basics to the complex neural network architectures that are applied today.
This series poses two personal challenges for me: filling gaps in my knowledge (I am pretty sure I don’t know everything), and making sure I can start an idea and take it to the end without giving up.
I decided to name this chapter Chapter 0 because in this article I’m only going to explain the first historical steps in the field, why neural networks exist, what they are used for and what the bridges between neurobiology and computer science are.
Neural networks

Neural networks are a connectionist approach to Artificial Intelligence that can learn to perform pattern recognition or process control tasks without being explicitly programmed for them.
Some examples of neural network applications are (1) controlling the temperature of a room, (2) recognising faces in a picture/video, (3) generating text sequences.
They are inspired by the way our brains work, by connecting and causing the interaction between multiple basic entities towards a final goal.
These entities are called neurons (as per the whole resemblance between these algorithms and the way our brain works).
Neurons can interact with other neurons depending on the way they are connected to each other and how strong their connection is.
How is this process translated from biology to computer science?

Biologically speaking, a neuron is composed of three main parts: the dendrites, the nucleus/cell body and the axon.
Its full structure is depicted in figure 1.
Figure 1: Structure of a neuron.
Image taken from this lifehacker article.

Dendrites receive information from other neurons in the form of an electrical pulse.
The cell body is the place where the inputs coming from the dendrites are received and processed.
The result of this processing is then conducted through the axon until it reaches the terminals (the synapses).
The information is then passed to another neuron’s dendrites through a chemical process known as the synaptic process.
Throughout human life many neurons are created and die, many connections between neurons are created while others cease to exist (this is an oversimplification but it gives you an idea of our brains’ dynamics).
According to Hebbian theory, the connection between two neurons grows stronger when the synaptic process between them occurs repeatedly.
In other words, the more two neurons fire together, the stronger the connection between them will be.
On the other hand, if two neurons never fire together, their connection gets weaker.
This rule is known as Hebb’s Rule.
Some see Hebb’s Rule as a form of classical conditioning (also known as Pavlovian learning).
Classical conditioning is a form of associative learning applied to stimuli.
The idea is to link two stimuli together so that they can share the same response.
In the experiment by Pavlov, after whom the concept is named, dogs were given food while a bell rang at the same time.
Because dogs start to salivate once they are given food, after repeating this food-bell association a reasonable number of times his dogs started to salivate as soon as they heard the bell ringing.
The same goes for Hebbian Learning: if a hypothetical neuron that represents ‘getting food’ is associated with the salivation response, and if the connection between the ‘hearing bell’ neuron and the ‘getting food’ neuron is stimulated so that it gets stronger, then at some point firing the ‘hearing bell’ neuron can be enough to cause the same salivation response.
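To make the bell-and-food story a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names, the learning rate, the decay factor and the firing threshold are all hypothetical values chosen for the example, not part of Hebb’s theory or Pavlov’s experiment.

```python
def hebbian_update(weight, a_fired, b_fired, learning_rate=0.2, decay=0.05):
    """One Hebbian step for the connection between two neurons A and B.

    learning_rate and decay are illustrative assumptions.
    """
    if a_fired and b_fired:
        # The two neurons fired together: the connection gets stronger.
        return weight + learning_rate
    # They did not fire together: the connection slowly gets weaker.
    return weight * (1.0 - decay)


def bell_triggers_salivation(bell_to_food_weight, firing_threshold=0.5):
    """Check whether the bell alone is enough to fire the 'getting food' neuron."""
    bell_signal = 1.0  # the 'hearing bell' neuron fires
    return bell_signal * bell_to_food_weight > firing_threshold


weight = 0.0   # at the start, the bell means nothing to the dog
history = []   # whether the bell alone would have worked before each trial

for trial in range(5):
    history.append(bell_triggers_salivation(weight))
    # Bell and food are presented together, so both neurons fire.
    weight = hebbian_update(weight, a_fired=True, b_fired=True)

print(history)
print(bell_triggers_salivation(weight))
```

With these assumed numbers the bell alone is not enough during the first trials, but after a few pairings the connection is strong enough to trigger the response on its own.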
In 1943 McCulloch and Pitts proposed a mathematical model which was able to capture these biological features of neurons.
The schema of this mathematical model of a neuron is depicted in Figure 2.
It is known as the McCulloch-Pitts neuron.
Figure 2: Schema of the McCulloch-Pitts neuron

Three components arise from the McCulloch-Pitts neuron:

A set of weighted inputs (the x’s and w’s).
They represent the inputs and connections strengths, respectively.
Hebb’s rule is about stimulating or fading a connection between neurons depending on its importance for firing two neurons together.
That is exactly the role of the w’s: they describe the strengths of the connections.
They are known as weights because they quantify the weight of the connection.
The x’s map to the inputs that are passed from one neuron’s axon terminals, across the synapses, to another neuron’s dendrites.
An adder (the z).
This represents the nucleus/cell body, which is responsible for collating and aggregating all the input signals into a unified electrical pulse that is later transported along the axon.
In mathematical terms, McCulloch and Pitts translated this aggregation into a dot product between the inputs (x) and weights (w).
But what does this mean?
Let’s use the example from figure 2.
The vector x is composed of x₀, x₁, x₂ whereas the vector w has the elements w₀, w₁, w₂.
The dot product of x and w will then be: x₀ × w₀ + x₁ × w₁ + x₂ × w₂.
Let’s formalise this logic in mathematical notation: z = x · w = Σᵢ xᵢ × wᵢ, where the sum runs over all the inputs of the neuron.
An activation function (the f(z) block).
The activation function’s role is to decide, based on the collated signals, whether the neuron should fire or not.
This part of the neuron is a mathematical function that takes z and computes the output y.
The one suggested by McCulloch and Pitts was the threshold function.
Basically, one defines a threshold level θ: if z is greater than θ the output will be 1, otherwise the output will be 0.
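The three components above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are my own, the threshold level is passed in as a parameter called `theta`, and the numbers in the usage lines are made up for the example.

```python
def weighted_sum(inputs, weights):
    """The adder: dot product z between the inputs x and the weights w."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def threshold_activation(z, theta):
    """The threshold function: 1 if z is greater than theta, otherwise 0."""
    return 1 if z > theta else 0


def mcculloch_pitts_neuron(inputs, weights, theta):
    """Chain the adder and the activation function to get the output y."""
    z = weighted_sum(inputs, weights)
    return threshold_activation(z, theta)


# Hypothetical values, chosen only to show both possible outputs.
fires = mcculloch_pitts_neuron([1, 0, 1], [0.5, 0.5, 0.5], theta=0.75)
stays_quiet = mcculloch_pitts_neuron([1, 0, 0], [0.5, 0.5, 0.5], theta=0.75)
print(fires, stays_quiet)
```

Keeping the adder and the activation function as separate functions mirrors the z and f(z) blocks of the schema, so either one can be swapped without touching the other.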
Let’s try an end-to-end example.
Imagine you have a neuron with inputs x₀ = 2, x₁ = -0.75, x₂ = 2 and with weights w₀ = -1, w₁ = -1, w₂ = 1.
So, the very first thing to do is to compute z.
According to the equation defined before, z = 2 × (-1) + (-0.75) × (-1) + 2 × 1 = 0.75.
The final step is to compute the activation function f(z).
For instance, let’s assume a threshold function with θ = 0.
Since z = 0.75 > 0, our output will be 1.
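If you want to check this arithmetic yourself, the following short Python snippet traces the same example step by step. The variable names are my own, and the threshold is taken to be 0, as in the comparison above.

```python
inputs = [2, -0.75, 2]    # x0, x1, x2
weights = [-1, -1, 1]     # w0, w1, w2
theta = 0                 # threshold level assumed for this example

# Multiply each input by its weight, then add the terms up to get z.
terms = [x * w for x, w in zip(inputs, weights)]
z = sum(terms)

# Threshold function: fire (1) only if z is greater than theta.
y = 1 if z > theta else 0

print(terms)  # [-2, 0.75, 2]
print(z)      # 0.75
print(y)      # 1
```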
I have now explained how the most basic neural network works.
But is that really all?
You now understand how the network computes its output/prediction/result based on external inputs or sensors from the environment, but where is the learning process in this explanation?
The answer is: nowhere!
The way the neural network learns is by finding the right values for the vector w, the ones that increase the number of times the network guesses the correct answer from the inputs provided.
The vector w is pretty much what is ‘internal’ in the neural network, so trying different values for it is what we can do to train a network.
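Just to illustrate what ‘trying different values’ for w could look like, here is a deliberately naive sketch that tests every combination of a few candidate weights and keeps the one with the most correct answers. Everything in it is an assumption made for the example: the toy task (output 1 when at least one input is 1), the candidate values -1, 0 and 1, and the threshold of 0. It is a brute-force search, not the way networks are trained in practice.

```python
from itertools import product

# Hypothetical toy task: the answer is 1 when at least one input is 1.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
THETA = 0  # threshold level assumed for this example


def predict(inputs, weights, theta=THETA):
    """Output of a threshold neuron for the given inputs and weights."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return 1 if z > theta else 0


def count_correct(weights):
    """Number of examples for which the neuron gives the right answer."""
    return sum(predict(x, weights) == target for x, target in examples)


# Try every pair of weights drawn from a small set of candidate values.
candidates = list(product([-1, 0, 1], repeat=2))
best_weights = max(candidates, key=count_correct)
best_score = count_correct(best_weights)

print(best_weights, best_score)  # (1, 1) 4
```

Checking every combination only works for a tiny neuron like this one, because the number of candidates grows very quickly with the number of weights.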
So how are these weights w modified in order to improve the quality of the network?
I am going to talk about this in my next chapter: the perceptron network.
Stay tuned!