After reading this article you should have a rough understanding of the internal mechanics of neural networks and convolutional neural networks, and be able to code your own simple neural network model in Python.
What are Neural Networks?
Neural nets take inspiration from the learning process occurring in human brains.
They consist of an artificial network of functions, called parameters, which allows the computer to learn and fine-tune itself by analyzing new data.
Each parameter, sometimes also referred to as a neuron, is a function which produces an output after receiving one or multiple inputs.
Those outputs are then passed to the next layer of neurons, which use them as inputs of their own function, and produce further outputs.
Those outputs are then passed on to the next layer of neurons, and so it continues until every layer of neurons has been considered and the terminal neurons have received their input.
Those terminal neurons then output the final result for the model.
Figure 1 shows a visual representation of such a network.
The initial input is x, which is then passed to the first layer of neurons (the h bubbles in Figure 1), where three functions consider the input that they receive, and generate an output.
That output is then passed to the second layer (the g bubbles in Figure 1).
There, further outputs are calculated, based on the output from the first layer. Those secondary outputs are then combined to yield a final output of the model.
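Sketched in code, the pass through a small network like the one in Figure 1 might look like this. This is a minimal NumPy illustration: the layer sizes, weights, biases, and sigmoid activation are all made-up choices for demonstration, not values taken from the figure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# All weights, biases, and shapes below are invented for illustration.
x = np.array([0.5])                        # initial input x
W1 = np.array([[0.2], [0.4], [-0.6]])      # weights into three first-layer (h) neurons
b1 = np.array([0.1, -0.2, 0.3])            # one bias per h neuron
W2 = np.array([[0.5, -0.3, 0.8],
               [0.1, 0.7, -0.4]])          # weights into two second-layer (g) neurons
b2 = np.array([0.0, 0.2])
w_out = np.array([0.6, -0.9])              # weights into the final output neuron
b_out = 0.05

h = sigmoid(W1 @ x + b1)                   # first layer considers the input
g = sigmoid(W2 @ h + b2)                   # second layer builds on the first
y = sigmoid(w_out @ g + b_out)             # final output of the model
print(y)
```

Each layer's output becomes the next layer's input, exactly as described above; the whole network is just these function applications chained together.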
Figure 1: A Visual Representation of a Simple Neural Net
Image from: https://en.wikipedia.org/wiki/Artificial_neural_network

How Do Neural Networks Learn?
An alternative way of thinking about a neural net is to think of it as one massive function which takes inputs and arrives at a final output.
The intermediary functions, computed by the neurons in their many layers, are usually unobserved and, thankfully, automated.
The mathematics behind them is as interesting as it is complex, and deserves a further look.
As previously mentioned, the neurons within the network interact with the neurons in the next layer, with every output acting as an input for a future function.
Every function, including the initial neurons, receives a numeric input and produces a numeric output based on an internalized function, which includes the addition of a bias term unique to every neuron. That output is then converted into the numeric input for the function in the next layer by being multiplied by an appropriate weight.
This continues until one final output for the network is produced.
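The per-neuron computation just described can be written compactly: weight the incoming values, add the neuron's own bias term, and apply its internal function. A hypothetical single neuron (sigmoid chosen arbitrarily as the internal function, input values invented):

```python
import numpy as np

# Hypothetical single neuron: weighted sum of inputs, plus the neuron's
# unique bias term, passed through its internal (here sigmoid) function.
def neuron(inputs, weights, bias):
    return 1.0 / (1.0 + np.exp(-(np.dot(weights, inputs) + bias)))

out = neuron(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1)
print(out)  # this output would then be weighted and passed to the next layer
```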
The difficulty lies in determining the optimal value for each bias term, as well as finding the best weighted value for each pass in the neural network.
To accomplish this, one must choose a cost function.
A cost function is a way of calculating how far a particular solution is from the best possible solution.
There are many different possible cost functions, each with advantages and drawbacks, each best suited under certain conditions.
Thus, the cost function should be tailored and selected based on individual research needs.
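One common choice, used here purely as an example, is mean squared error: the average squared gap between the model's predictions and the best possible (true) answers.

```python
import numpy as np

# Mean squared error: one of many possible cost functions. It measures
# how far a particular set of predictions sits from the true values.
def mse(predictions, targets):
    return np.mean((predictions - targets) ** 2)

print(mse(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # 0.025
```

A smaller value means the current solution is closer to the best possible one.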
Once a cost function has been determined, the neural net can be altered in a way to minimize that cost function.
A simple way of optimizing the weights and biases is therefore to run the network multiple times.
On the first try, the predictions will by necessity be random.
After each iteration, the cost function will be analyzed, to determine how the model performed, and how it can be improved.
The information obtained from the cost function is then passed to the optimizing function, which calculates new weight values as well as new bias values.
With those new values integrated into the model, the model is rerun.
This is continued until no alteration improves the cost function.
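The run–measure–adjust loop described above can be sketched on a toy example. Here the "network" is a single linear neuron and the optimizer is plain gradient descent (one of many possible optimizers); the data and learning rate are invented for illustration.

```python
import numpy as np

# Toy version of the training loop: run the model, analyze the cost,
# let the optimizer propose new weight and bias values, and repeat.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y_true = 3.0 * x + 0.5          # relationship the network should discover

w, b = 0.0, 0.0                 # initial guesses: first predictions are poor
lr = 0.1                        # learning rate
for epoch in range(200):
    y_pred = w * x + b                  # run the network
    error = y_pred - y_true
    cost = np.mean(error ** 2)          # analyze the cost function
    w -= lr * np.mean(2 * error * x)    # optimizer calculates a new weight
    b -= lr * np.mean(2 * error)        # ...and a new bias
print(w, b)                             # values approach 3.0 and 0.5
```

After enough iterations, further alterations stop improving the cost function, and the loop can halt.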
There are three methods of learning: supervised, unsupervised, and reinforcement learning.
The simplest of these learning paradigms is supervised learning, where the neural net is given labelled inputs.
The labelled examples are then used to infer generalizable rules which can be applied to unlabelled cases.
It is the simplest learning method, since it can be thought of as operating with a ‘teacher’, in the form of a function that allows the net to compare its predictions to the true, desired results.
Unsupervised methods do not require labelled initial inputs, but rather infer the rules and functions based not only on the given data but also on the output of the net.
This hampers the type of predictions which can be made.
Instead of being able to classify, such a model is limited to clustering.
What are Convolutional Neural Networks?
A variation of the vanilla neural network is the convolutional neural network.
ConvNets, as they are sometimes known, offer some significant advantages over normal neural nets, especially when it comes to image classification.
In such a case, the initial inputs would be images, made up of pixels.
The traditional issue with image classification is that, with big images with many color channels, it quickly becomes computationally infeasible to train some models.
What a CNN tries to do is transform the images into a form which is easier to process, while still retaining the most important features.
This is done by passing a filter over the initial image, which conducts a matrix multiplication over a subsection of the pixels in the initial image, iterating through subsets until all of them have been considered.
The filter aims at capturing the most crucial features, while allowing the redundant features to be eliminated.
This passing of a filter over the initial pixels is known as the Convolution Layer.
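A bare-bones version of this filter pass might look as follows. The 3×3 edge-detecting kernel and 5×5 two-tone image are invented for illustration; in a real CNN the filter values are learned during training rather than chosen by hand.

```python
import numpy as np

# Minimal sketch of a convolution layer: slide a small filter across
# every subsection of the image, multiplying and summing as it goes.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                       # left half dark, right half bright
kernel = np.array([[0., -1., 0.],
                   [-1., 4., -1.],
                   [0., -1., 0.]])       # an edge-detecting filter
print(convolve2d(image, kernel))         # responds only near the vertical edge
```

The output is smaller than the input and is nonzero only where the filter finds its feature, which is how redundant regions get discarded while crucial features survive.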
After the convolution layer comes the pooling layer, which attempts to reduce the spatial size of the convolved features. This reduction in complexity, sometimes known as dimensionality reduction, decreases the computational cost of performing analysis on the data set and makes the method more robust.
In this layer, a kernel once again passes over all subsets of pixels of the image.
There are two types of pooling kernels which are commonly used.
The first one is Max Pooling, which retains the maximum value of the subset.
The alternative kernel is average pooling, which does exactly what you’d expect: it retains the average value of all the pixels in the subset.
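Both pooling kernels can be sketched directly. The 4×4 feature map values below are invented for illustration, and the kernel here moves over non-overlapping 2×2 subsets (stride and window size are choices, not fixed rules).

```python
import numpy as np

# The two common pooling kernels: max pooling keeps the largest value
# in each subset; average pooling keeps the mean of the subset.
def pool2d(fmap, size=2, mode="max"):
    out = np.zeros((fmap.shape[0] // size, fmap.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = fmap[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 6.],
                 [2., 2., 7., 8.]])
print(pool2d(fmap, mode="max"))       # [[4. 2.] [2. 8.]]
print(pool2d(fmap, mode="average"))   # [[2.5 1.] [1.25 6.5]]
```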
Figure 2 visually shows the processes of the pooling phase.
Figure 2: The Pooling Phase of Convolutional Neural Networks
Image from: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

After the pooling phase, the information will hopefully be compressed enough to be used in a regular neural network model.
The last remaining thing to do is to flatten the final output of the pooling phase and feed it into the model.
The flattening is done by changing the matrix of pixels into a vector of pixels, which can then be used for the neural net model.
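The flattening step is a one-line reshape (the pooled values here are invented for illustration):

```python
import numpy as np

# Flattening: the matrix of pooled pixels becomes a single vector,
# which a regular neural net can then accept as its input.
pooled = np.array([[4., 2.],
                   [2., 8.]])
flat = pooled.flatten()
print(flat)        # [4. 2. 2. 8.]
print(flat.shape)  # (4,)
```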
From there, the convolutional neural network acts just like a regular neural network, in that the information will be passed to a set of neurons, which will pass on values to further layers until a final output is reached. Convolutional neural networks thus make neural nets feasible for large data sets or complex images, since they reduce the computational power needed for the analysis.
What are Some Applications of Neural Networks?
There are many applications for machine learning methods such as neural nets.
Most of these applications focus on classification of images.
Those images could be of anything, from whether or not something is a hot dog, to identifying handwriting.
The practical possibilities of such a model are broad, and lucrative.
Let’s have a look at an example.
Many companies would love to be able to automate a model which would classify articles of clothing.
Doing so would allow them to draw insight into fashion trends, buying habits, and differences between cultural, and socio-economic groups.
To that end we will use a neural network to see if an adequate classification model can be constructed when given a set of 60,000 images, with labels identifying what type of clothing each one depicts. All of those pictures are made up of pixels, which, since we will be building a simple neural network rather than a convolutional neural net, will be passed directly into the network as a vector of pixels.
Image taken from: https://medium.com/tensorist/classifying-fashion-articles-using-tensorflow-fashion-mnist-f22e8a04728a

In order to create a neural network, one must specify the number of layers within the model.
For the sake of simplicity, I shall limit the model to two layers.
One must also select the type of activation for each layer.
Again, to keep it simple, I will select a sigmoid activation for the first layer, with the final layer having 10 nodes and set to return 10 probability scores, indicating the probability that the image belongs to each of the ten possible articles of clothing.
To compile the model, a loss function must be defined, which will be how the model evaluates its own performance.
An optimizer must also be determined, which is how the information from the cost function is used to change the weights and the bias of each node.
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.sigmoid),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

With those parameters input, a model can be trained.
Once the model has been trained, it must be evaluated based on the testing data.
To accomplish that, one must denote the number of epochs which the model will consider.
Epochs determine how many iterations through the data shall be done.
More epochs will be more computationally expensive, but should allow for a better fit.
I shall consider 5 epochs.
One can see how the accuracy of the model on the training data improves after each iteration, as the optimizing function alters the weights.
model.fit(x_train, y_train, epochs=5)

The Change in Accuracy for Each Epoch

The real test of how a model performs comes from running the model on a testing set, which was not used to construct the model.
In this case, the neural net had an accuracy of 0.7203: not bad!

Summary
Neural networks are complex models which try to mimic the way the human brain develops classification rules.
A neural net consists of many different layers of neurons, with each layer receiving inputs from previous layers, and passing outputs to further layers.
The way each layer output becomes the input for the next layer depends on the weight given to that specific link, which depends on the cost function, and the optimizer.
The neural net iterates for a predetermined number of iterations, called epochs.
After each epoch, the cost function is analyzed to see where the model could be improved.
The optimizing function then alters the internal mechanics of the network, such as the weights, and the biases, based on the information provided by the cost function, until the cost function is minimized.
A convolutional neural network is a variation of a normal neural network which attempts to deal with the issue of high dimensionality by reducing the number of pixels considered in image classification through two separate phases: the convolution phase and the pooling phase.
After that it performs much like an ordinary neural network.
Key Words
Neural Networks, Convolutional Neural Network, Pooling Phase, Neuron, Layer, Filters, Cost Function, Optimizer