# Neural Networks — A Solid Practical Guide

Explaining How Neural Networks Work With Practical Examples

Farhad Malik, May 16

This article aims to present a transparent view of neural networks.

It is a must-read for anyone who is interested in learning machine learning.

By the end of this article, you will have a solid foundation in neural networks.

## Why Am I Focusing On Neural Networks?

Recently I came across an article that explained how machine learning algorithms are being used to accurately predict heart attacks, and another that demonstrated how financial organisations use neural networks to generate revenue and cut costs.

The number of projects built specifically around neural networks is growing rapidly.

It’s time to learn about them!

[Figure: An example neural network]

One thing is clear, though: neural networks sit at the core of many revolutionary machine learning projects.

Many users treat neural networks as a black box.

We cannot afford to remain unaware of how neural networks operate.

I will explain the concepts in an easy-to-understand and succinct manner.

## Article Aim

This article will cover the following topics:

- A very brief introduction to neural networks
- A thorough overview of each neural network component
- A number of practical examples of neural networks, so that we can see how a network is trained
- An implementation of a neural network in Python

The article is structured so that knowledge is built up incrementally.

I will use TensorFlow, a library developed by the Google Brain team, to implement the neural network.

## 1. What Is A Neural Network?

Alexander Bain (1873) and William James (1890) independently proposed the theoretical basis of neural networks.

### Artificial Neural Networks Are Inspired By Biological Neural Networks

Just like a biological neural network, an artificial neural network learns and updates its knowledge and understanding of the environment based on the experiences (training examples) it encounters.

An artificial neural network is simply a set of mathematical functions that work together to perform operations on the input.

These operations then produce an output.

This collection of mathematically interconnected formulae is known as an artificial neural network (ANN).

### Let’s Stick With The Simple Definition For Now And We’ll Build On It As We Proceed

Neural networks can help us understand relationships within complex data.

A trained neural network can use its learned knowledge to make predictions about the behaviour of such data.

Neural networks can be used to model both linear and non-linear relationships in data.

Neural networks can process images and even make complex decisions, such as how to drive a car or which financial trade to execute next.

Although neural networks are sophisticated and can solve complex problems, they are often slower to train than many other machine learning algorithms.

They can also end up overfitting the training data.

## 2. What Does A Neural Network Look Like?

This is an example of an artificial neural network:

[Figure: An example artificial neural network]

Let’s review this neural network. A neural network can contain multiple layers.

The artificial neural network shown above has 4 layers:

- One input layer
- One output layer
- Two hidden layers

There are 10 neurons in total:

- 2 input neurons
- 6 hidden neurons (3 neurons within each hidden layer)
- 2 output neurons

This is an example of a feed-forward neural network, as the data flows in one direction only: from the input layer to the output layer.

Each neuron is connected to the neurons in the next layer via synapses (connections).

Each neuron takes in inputs from one or more neurons, along with a set of weights and a bias, which I will explain in detail later on.
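To make the 2-3-3-2 network above concrete, here is a minimal NumPy sketch that stores one weight matrix and one bias vector per connection between layers and pushes an input through them. It assumes random initial weights and a sigmoid activation in every layer; the diagram does not specify the activation functions, and the helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer sizes from the diagram: 2 inputs, two hidden layers of 3, 2 outputs.
layer_sizes = [2, 3, 3, 2]

# One weight matrix and one bias vector for each pair of adjacent layers.
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x):
    """Pass x from the input layer to the output layer, one layer at a time."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)   # weighted sum plus bias, then the activation
    return a

# 6 + 9 + 6 weights and 3 + 3 + 2 biases.
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
output = feed_forward(np.array([0.5, -1.0]))
print(output.shape, n_params)   # (2,) 29
```

Note that the input layer itself holds no weights: the first weight matrix belongs to the connections between the input layer and the first hidden layer.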

Let’s understand this diagram in depth.

## 3. Key Neural Network Components

This section will outline each of the key neural network components.

### 3.1 What Is A Neuron?

Let’s take a look inside a neuron. A neuron is a container that holds the following key components:

- A mathematical function, known as an activation function
- Inputs
- A vector of weights
- A bias

A neuron first computes the weighted sum of its inputs.

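For instance, if the inputs are $x_1, x_2, \ldots, x_n$ and the weights are $w_1, w_2, \ldots, w_n$, then the weighted sum is computed as:

$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

Subsequently, a bias $b$ (a constant) is added to the weighted sum:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

Finally, the computed value $z$ is fed into the activation function $f$, which produces the neuron's output $f(z)$.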

Think of the activation function as a mathematical operation that transforms the input (often squashing it into a fixed range) and produces an output.

The output is then passed forward to the neurons in the subsequent layer.
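The computation of a single neuron (weighted sum, plus bias, then activation) fits in a few lines. This is a sketch with hypothetical input, weight and bias values and a sigmoid activation; any other activation function could be passed in instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    z = np.dot(inputs, weights) + bias   # weighted sum of the inputs plus the bias
    return activation(z)                 # the activation function produces the output

x = np.array([1.0, 2.0])     # hypothetical inputs x1, x2
w = np.array([0.5, -0.25])   # hypothetical weights w1, w2
b = 0.1                      # hypothetical bias
z = np.dot(x, w) + b         # 1*0.5 + 2*(-0.25) + 0.1 = 0.1
print(neuron(x, w, b))       # sigmoid(0.1), roughly 0.525
```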

Let’s understand what these layers are.

### 3.2 What Is A Neural Network Layer?

Think of a layer as a container of neurons.

A layer groups a number of neurons together.

There will always be an input layer and an output layer.

We can have zero or more hidden layers in a neural network.

The learning of a neural network takes place within its layers: training adjusts the weights and biases of the neurons they contain.

The key point is that neurons are placed within layers, and each layer has its own purpose.

The neurons within each layer of a neural network perform the same kind of computation.

They calculate the weighted sum of their inputs, add the bias and apply an activation function.
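Because every neuron in a layer performs the same computation, a whole layer can be written as one matrix-vector product. The sketch below, which assumes a layer of 3 neurons with 4 inputs, random weights and a ReLU activation, checks that computing the neurons one by one gives the same result as the vectorised form.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # 4 inputs coming from the previous layer
W = rng.normal(size=(4, 3))     # column j holds the weights of neuron j
b = rng.normal(size=3)          # one bias per neuron

# Neuron by neuron: weighted sum plus bias, then the activation.
per_neuron = np.array([relu(np.dot(x, W[:, j]) + b[j]) for j in range(3)])

# The whole layer at once: the same computation as a matrix-vector product.
whole_layer = relu(x @ W + b)
print(per_neuron, whole_layer)
```

The vectorised form is how libraries such as TensorFlow implement dense layers in practice, since matrix products run efficiently on CPUs and GPUs.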

Let’s analyse the different types of layers.

### 3.3 What Is An Input Layer?

The input layer is responsible for receiving the inputs.

These inputs can be loaded from an external source, such as a web service or a CSV file.

There must always be one input layer in a neural network.

The input layer takes in the inputs and passes them on to the subsequent layers. In most implementations it does not apply weights or an activation function; its neurons simply hold the input values.

The input layer takes in the inputs; the output layer produces the final results.

### 3.4 What Is An Output Layer?

The output layer is responsible for producing the final result.

There must always be one output layer in a neural network.

The output layer takes in the inputs passed from the layers before it, performs the calculations via its neurons and computes the final output.

In a complex neural network with multiple hidden layers, the output layer receives its inputs from the last hidden layer.

### 3.5 What Is A Hidden Layer?

Hidden layers are what allow neural networks to learn complex, non-linear relationships that many other machine learning algorithms cannot capture.

Hidden layers reside between the input and output layers, which is the primary reason why they are referred to as hidden.

The word “hidden” implies that they are not visible to external systems and are “private” to the neural network.

There could be zero or more hidden layers in a neural network.

Hidden layers often contain the same number of neurons, although this is not a requirement.

Generally, the more hidden layers a neural network has, the longer it takes to produce an output and the more complex the problems it can solve.

Like all other neurons, hidden neurons calculate the weighted sum of their inputs, add the bias and apply an activation function.

### 3.6 What Is An Activation Function?

An activation function is simply a mathematical function that takes in an input and produces an output.

In its simplest form (a step function), the neuron is activated when the computed result reaches a specified threshold.

The input in this instance is the weighted sum plus the bias:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

Any thresholds are pre-defined in the function.

This nature of activation functions adds non-linearity to the output.

This non-linearity is what enables neural networks to solve non-linear problems.

Non-linear problems are those where there is no direct linear relationship between the input and output.

To handle these complex scenarios, a number of activation functions have been introduced, and a suitable one can be chosen for each layer.
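Why the non-linearity matters can be checked directly: if every layer used a purely linear activation, stacking layers would add nothing, because two linear layers collapse into a single linear layer. The layer sizes and random values below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)   # first layer: 2 inputs -> 4 neurons
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)   # second layer: 4 -> 1 neuron
x = rng.normal(size=(5, 2))                            # 5 hypothetical samples, 2 features

# Two stacked layers that both use a linear (identity) activation...
two_layers = (x @ W1 + b1) @ W2 + b2

# ...compute exactly the same function as one layer with merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_layer = x @ W_merged + b_merged
print(np.allclose(two_layers, one_layer))   # True
```

Inserting a non-linear activation such as ReLU or sigmoid between the layers breaks this equivalence in general, which is what lets deeper networks represent non-linear relationships.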

Let’s review a number of common activation functions.

Before I explain each activation function, have a look at this table.

[Table: Values of the five activation functions for sample inputs]

It demonstrates how the values differ for the five well-known activation functions that I will explain in detail.

Each activation function has its own formula which is used to convert the input.
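A comparison table of this kind can be generated with a few lines of NumPy. The sample inputs from -2 to 2 and the linear scaling factor of 2 are arbitrary choices for this sketch. Note that softmax is computed across the whole vector of inputs, while the other four functions are applied element by element.

```python
import numpy as np

def linear(x, y=2.0):           # y is the scaling factor, as in section 3.6.1
    return y * x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the maximum for numerical stability
    return e / e.sum()

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
table = {
    "linear": linear(xs),
    "sigmoid": sigmoid(xs),
    "tanh": tanh(xs),
    "relu": relu(xs),
    "softmax": softmax(xs),
}

print("x       " + " ".join(f"{v:7.2f}" for v in xs))
for name, values in table.items():
    print(f"{name:8}" + " ".join(f"{v:7.3f}" for v in values))
```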

Let’s understand each of them in detail.

#### 3.6.1 Linear Activation Function

The linear activation function simply scales an input by a factor, implying that there is a linear relationship between the inputs and the output.

This is the mathematical formula:

$$f(x) = y \cdot x$$

where $y$ is a scalar value (for instance, 2) and $x$ is the input.

If $y = 2$, the graph is a straight line through the origin with a slope of 2.

#### 3.6.2 Sigmoid Activation Function

The sigmoid activation function is “S” shaped.

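It adds non-linearity to the output and returns a value between 0 and 1:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The output can be thresholded (for example, at 0.5) to produce a binary value of 0 or 1.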

Consider this non-linear example. Let’s assume you buy a European call option.

With a European call option, a premium P is paid to buy an option on an underlying asset, such as a company’s stock.

The buyer and seller agree on a strike price.

The strike price is the price at which the buyer of the option can buy the underlying asset when exercising the option.

Now, let’s understand this scenario in practice. When the price of the underlying stock goes above the strike price, the buyer exercises the option and gains the difference, less the premium P already paid.

However, when the price is below the strike price, the option is not exercised, so the loss is capped and only the premium P is lost.

This is a non-linear relationship.

The binary decision of whether or not to exercise the option can be computed with the sigmoid activation function. If your output is going to be either 0 or 1, simply use the sigmoid activation function.
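The exercise decision can be sketched as a sigmoid applied to the difference between the price at expiry and the strike, thresholded at 0.5. The strike of 100 and the prices are hypothetical, and the raw price difference stands in for the weighted sum plus bias that a trained neuron would compute.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

STRIKE = 100.0                                # hypothetical strike price
prices = np.array([80.0, 95.0, 105.0, 120.0]) # hypothetical prices at expiry

scores = sigmoid(prices - STRIKE)     # values strictly between 0 and 1
exercise = (scores >= 0.5).astype(int)  # threshold at 0.5 to get a 0/1 decision
print(scores.round(4), exercise)
```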

#### 3.6.3 Tanh Activation Function

Tanh is closely related to the sigmoid activation function.

Like sigmoid, tanh can be used to add non-linearity to the output.

The output is within the range of -1 to 1.

The tanh function stretches, scales and shifts the sigmoid activation function $\sigma$:

$$\tanh(x) = 2\sigma(2x) - 1$$
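The relationship between tanh and sigmoid can be checked numerically. This sketch evaluates both sides of the identity on an arbitrary grid of inputs and also confirms that the output stays within -1 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-5.0, 5.0, 101)
via_sigmoid = 2.0 * sigmoid(2.0 * xs) - 1.0   # stretch, scale and shift the sigmoid
direct = np.tanh(xs)

# The difference is only floating-point rounding error.
print(np.max(np.abs(via_sigmoid - direct)))
```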

#### 3.6.4 Rectified Linear Unit Activation Function (ReLU)

ReLU is one of the most widely used activation functions.

It is commonly preferred in hidden layers.

The concept is very straightforward:

$$f(x) = \max(0, x)$$

Negative inputs become 0, and positive inputs pass through unchanged.

It also adds non-linearity to the output.

However, unlike sigmoid and tanh, its output is not bounded above: it ranges from 0 to infinity.

If you are unsure which activation function to use in the hidden layers, use ReLU.
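ReLU is a one-liner in NumPy. The second part of this sketch illustrates why such a simple function still adds non-linearity: two ReLU neurons with input weights +1 and -1, whose outputs are summed by an output neuron with weights [1, 1] and zero bias, compute the absolute value |x|, which no single linear layer can represent. The input values are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

xs = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(xs))   # negative inputs become 0, positive inputs pass through

# Hidden ReLU neurons with weights +1 and -1, summed by the output neuron.
abs_via_relu = relu(xs) + relu(-xs)
print(abs_via_relu)
```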

#### 3.6.5 Softmax Activation Function

Softmax is a generalisation of the sigmoid activation function to more than two classes.

The softmax function adds non-linearity to the output. It is mainly used for classification problems with multiple classes, because it converts a vector of scores into probabilities, one per class, that sum to 1.

To understand this with an example, let’s assume you are building a neural network that is expected to predict the probability of rainfall in the future.

The softmax activation function can be used in the output layer, as it can compute the probability of each possible outcome (for example, rain or no rain).

Both sigmoid and softmax squash their inputs into values between 0 and 1; softmax additionally makes the outputs sum to 1.
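Here is a sketch of softmax for the rainfall example, using hypothetical output-layer scores for the two classes "rain" and "no rain". Subtracting the maximum score before exponentiating is a standard trick that avoids overflow without changing the result. The last lines show the link to sigmoid: with two classes, softmax reduces to a sigmoid of the score difference.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)   # avoids overflow in exp for large scores
    e = np.exp(shifted)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical output-layer scores for the classes "rain" and "no rain".
scores = np.array([1.2, -0.3])
probs = softmax(scores)
print(probs, probs.sum())

# Two classes: softmax([z, 0])[0] equals sigmoid(z).
two_class = softmax(np.array([0.9, 0.0]))[0]
print(two_class, sigmoid(0.9))
```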

The weights, along with the bias, change the way a neural network operates.

### 3.7 What Is Bias?

Bias is simply a constant value (or a constant vector) that is added to the product of inputs and weights.

Bias is used to offset the result.

It shifts the input of the activation function towards the positive or negative side.

Imagine this scenario: you want your neural network to return 2 when the input is 0.

As the sum of the products of weights and inputs is going to be 0, how will you ensure that the neuron returns 2? With a linear activation function, you can add a bias of 2.

If we do not include the bias, the neural network simply performs matrix multiplications on the inputs and weights, so an all-zero input always produces a weighted sum of 0 and every decision boundary must pass through the origin.

This restricts what the network can learn.

Adding a bias removes this restriction, giving the neural network the flexibility to fit the data and generalise better.
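The scenario above can be checked with a single neuron that uses a linear activation. In this sketch the weight values are arbitrary: without a bias, an all-zero input produces 0 whatever the weights are, while a bias of 2 makes the neuron return 2.

```python
import numpy as np

def linear_neuron(x, w, b=0.0):
    return np.dot(x, w) + b     # linear activation: output = weighted sum + bias

x_zero = np.array([0.0, 0.0])

# Without a bias, no choice of weights can move the output away from 0.
no_bias = [linear_neuron(x_zero, np.array(w))
           for w in ([1.0, 2.0], [-5.0, 3.0], [10.0, 10.0])]

# With a bias of 2, the neuron returns 2 for the same input.
with_bias = linear_neuron(x_zero, np.array([1.0, 2.0]), b=2.0)
print(no_bias, with_bias)
```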

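Bias is essentially the negative of the threshold: the condition $\sum_i w_i x_i \geq \theta$ is the same as $\sum_i w_i x_i + b \geq 0$ with $b = -\theta$. Therefore, the value of the bias controls when the activation function is activated.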

### 3.8 What Are Weights?

The weights are arguably the most important concept of a neural network.

When the inputs are transmitted between neurons, the weights are applied to the inputs, and the result is passed into an activation function along with the bias.

The weights essentially reflect how important an input is.

Weights are the coefficients of the equation you are trying to solve.

A negative weight means that increasing that input decreases the output.

When a neural network is trained on the training set, it is first initialised with a set of (usually random) weights.

These weights are then optimised during training to produce the best weights the algorithm can find.

Let’s understand this with a scenario: assume you are predicting the price of a car in dollars.

Your understanding is that the price of the car depends on the year it was made and the number of miles it has been driven.

Let’s assume your hypothesis is that the newer the car, the pricier it is.

Likewise, the more the car has been driven, the cheaper it is.

This example should help you see that there is a positive relationship between the price of the car and the year it was made and a negative relationship between the price of the car and the miles it has been driven.

As a result, we expect to see a positive weight for the feature that represents the year and a negative weight for the feature that represents the miles.

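In the equation below, $w_1$ is expected to be positive and $w_2$ is expected to be negative:

$$\text{price} = w_1 \cdot \text{year} + w_2 \cdot \text{miles} + b$$

The weights are optimised using an optimisation algorithm and a learning rate.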
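The expected signs of $w_1$ and $w_2$ can be checked on synthetic data. This sketch generates hypothetical cars whose prices follow the hypothesis (newer is pricier, more miles is cheaper, plus noise) and fits price = w1·year + w2·miles + b with ordinary least squares via `np.linalg.lstsq`. Least squares is used here as a quick stand-in for training; a single linear neuron trained with gradient descent on mean squared error would converge towards the same solution. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
year = rng.integers(2000, 2021, size=n).astype(float)   # hypothetical model years
miles = rng.uniform(0, 150_000, size=n)                  # hypothetical mileage

# Synthetic prices built to follow the hypothesis (illustrative numbers only).
price = 800.0 * (year - 2000) - 0.05 * miles + 15_000 + rng.normal(0, 500, size=n)

# Fit price = w1 * year + w2 * miles + b; the column of ones carries the bias.
X = np.column_stack([year, miles, np.ones(n)])
(w1, w2, b), *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"w1 (year) = {w1:.1f}, w2 (miles) = {w2:.4f}")
```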

### 3.9 What Is A Learning Rate?

The learning rate determines how big a step the optimisation algorithm takes each time it updates the weights.

The lower the learning rate, the longer it will take for the optimisation algorithm to reach a minimum and converge.

On the other hand, if the learning rate is too large, the algorithm can overshoot the minimum and may never converge.

Hence, the right balance is required.

The learning rate is used in the optimisation algorithm to update the weights.
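The effect of the learning rate is easy to see on a toy loss $f(w) = w^2$, whose minimum is at $w = 0$ and whose gradient is $2w$. In this sketch the starting point, step count and the three learning rates are arbitrary choices: a tiny rate barely moves, a moderate rate converges, and a rate that is too large overshoots further on every step and diverges.

```python
def gradient_descent(learning_rate, steps=50, w=5.0):
    """Minimise f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: w after 50 steps = {gradient_descent(lr):.4g}")
```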

### 3.10 What Is An Optimisation Algorithm?

There are a number of optimisation algorithms available.

Gradient descent and its variants (such as stochastic gradient descent and Adam) are the most commonly used optimisation algorithms for training neural networks.

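In a nutshell, the optimisation algorithm computes the derivative (gradient) of the loss with respect to each weight to find the rate of change, and then uses the learning rate to either increase or decrease each weight in the direction that reduces the loss:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$

where $\eta$ is the learning rate and $L$ is the loss.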

When the derivative is 0, the algorithm has reached a stationary point, ideally a minimum.

Until it reaches the minimum point, the optimisation algorithm uses a loss function, such as mean squared error, to compute the difference between the predicted and actual values, as that is the value it is trying to minimise.

The optimiser takes in the loss output and adjusts the weights accordingly, until it converges to a local minimum or has iterated over the computations a given number of times.

The optimiser is the core reason why a neural network ends up improving its predictions.
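Putting the loss function, the gradient and the learning rate together gives a complete gradient descent loop. This sketch fits a single linear neuron, pred = w·x + b, to hypothetical data that follows y = 2x + 1 exactly, minimising mean squared error. The learning rate of 0.05 and the 2,000 iterations are choices that happen to converge for this data.

```python
import numpy as np

# Hypothetical training data that follows y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # initial weight and bias
learning_rate = 0.05

for step in range(2000):
    pred = w * x + b                         # forward pass
    error = pred - y
    loss = np.mean(error ** 2)               # mean squared error
    grad_w = 2.0 * np.mean(error * x)        # dL/dw
    grad_b = 2.0 * np.mean(error)            # dL/db
    w -= learning_rate * grad_w              # step against the gradient
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.2e}")
```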

### 3.11 What Is An Epoch?

The number of epochs is one of the input parameters (hyperparameters) of the learning algorithm.

Think of an epoch as one iteration of a loop.

An epoch is one complete pass of the learning algorithm through the entire training set, so the number of epochs determines how many times every training sample is used to update the weights.

If the number of epochs is 1, every sample in the training set is fed into the neural network once to update the weights.

If the number of epochs is 5, there will be 5 such loops.

The higher the number of epochs, the longer the neural network takes to train.

### 3.12 What Is Batch Size?

Batch size is another hyperparameter of a neural network.

It indicates the number of training samples that are processed before the weights are updated.

The training set is divided into multiple chunks (batches) of this size.

A larger batch size means fewer weight updates per epoch, so an epoch usually completes faster, although each update needs more memory and a larger batch does not necessarily produce a better model.

If the batch size is 100, the training set is divided into batches of 100 samples, and each batch is used in turn to update the model.

If the number of samples in the training set is not divisible by 100, the last batch will contain fewer samples than the rest.
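Epochs and batches fit together as two nested loops. This sketch assumes a hypothetical training set of 250 samples, a batch size of 100 and 5 epochs; it only counts weight updates rather than training anything, and it omits the shuffling between epochs that most frameworks perform by default.

```python
import numpy as np

def make_batches(n_samples, batch_size):
    """Split sample indices into consecutive batches; the last one may be smaller."""
    indices = np.arange(n_samples)
    return [indices[start:start + batch_size]
            for start in range(0, n_samples, batch_size)]

n_samples, batch_size, epochs = 250, 100, 5   # hypothetical sizes
batches = make_batches(n_samples, batch_size)
print([len(batch) for batch in batches])       # [100, 100, 50]

updates = 0
for epoch in range(epochs):       # one epoch = one full pass over the training set
    for batch in batches:         # one weight update per batch
        updates += 1              # a real training step would go here
print(updates)                    # 15
```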

### 3.13 What Is Loss And Accuracy?

The loss function is also known as the cost function.

It computes the error value.

Strictly speaking, the loss is the error for a single training sample, and the cost function is the average of the losses over the training set.

This is the function that the optimisation algorithm is trying to minimise.

There are a large number of loss functions available, such as mean squared error, binary cross-entropy, etc.

For regression, mean squared error is a common choice.

For binary classification, use the binary cross-entropy loss function.
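Here is a sketch of the two loss functions mentioned above, plus accuracy, which is simply the fraction of correct predictions once probabilities are thresholded at 0.5. All labels, predictions and probabilities are hypothetical. Clipping the probabilities avoids taking the logarithm of 0.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def accuracy(y_true, p, threshold=0.5):
    return np.mean((p >= threshold).astype(int) == y_true)

# Regression: hypothetical actual and predicted values.
print(mean_squared_error(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))  # 4/3

# Binary classification: hypothetical labels and predicted probabilities.
labels = np.array([1, 0, 1, 0])
probs = np.array([0.9, 0.1, 0.6, 0.7])
print(binary_cross_entropy(labels, probs), accuracy(labels, probs))
```

The last sample is misclassified (probability 0.7 for a 0 label), which costs one quarter of the accuracy and contributes most of the cross-entropy.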

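The loss function essentially tells the neural network how far its predictions are from the actual values. Accuracy, by contrast, is the fraction of predictions that are correct; it is easier to interpret but is not what the optimiser minimises directly.

The optimiser uses the loss to adjust the weights.

The neural network can then forward-propagate the input data.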

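### 3.14 What Is Forward Propagation?

Forward propagation is the process of computing a prediction from the inputs; making predictions with a trained network (inference) consists of forward propagation.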

It is the simplest flow through a neural network: it takes in the inputs, processes them and passes the results on to the subsequent layers, all the way to the neurons of the output layer.

Each neuron applies the weights to its inputs, adds the bias and computes its activation function.

During training, if the predicted values are not good enough, we use back propagation to improve the weights.

### 3.15 What Is Back Propagation?

Back propagation uses the difference between the predicted and actual values to improve the weights.

First, the partial derivative of the error with respect to each weight is calculated.

These derivatives, referred to as gradients, are calculated starting from the last layer.

Using the chain rule, the gradients of one layer are then used to calculate the gradients of the previous layer, and the process is repeated.

The process is repeated for every weight in every layer.

Each weight is then updated by subtracting its gradient multiplied by the learning rate, $w \leftarrow w - \eta \, \partial L / \partial w$ (where $\eta$ is the learning rate and $L$ is the error), so that the error is reduced.

Note: the error value is the difference between the predicted and actual values.

This process, as we move backwards from the last layer to the first, is known as back propagation.
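The back propagation steps above can be sketched for a tiny network with one sigmoid hidden layer and a linear output, trained with mean squared error. The data, layer sizes and learning rate are hypothetical. The sketch computes the gradients from the last layer backwards using the chain rule, checks one of them against a finite-difference estimate, and then applies one gradient-descent update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))                   # 4 hypothetical samples, 2 features
y = np.array([[0.0], [1.0], [1.0], [0.0]])    # hypothetical targets

W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # hidden layer (sigmoid)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # output layer (linear)

def loss(W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)
    return np.mean((h @ W2 + b2 - y) ** 2)      # mean squared error

def backprop(W1, b1, W2, b2):
    # Forward pass, keeping the intermediate values.
    h = sigmoid(X @ W1 + b1)
    p = h @ W2 + b2
    # Backward pass: start from the last layer and apply the chain rule.
    dp = 2.0 * (p - y) / len(X)                 # dLoss/dp
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = dp @ W2.T                              # gradient flowing into the hidden layer
    dz1 = dh * h * (1.0 - h)                    # sigmoid'(z) = s(z) * (1 - s(z))
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    return dW1, db1, dW2, db2

dW1, db1, dW2, db2 = backprop(W1, b1, W2, b2)

# Check one gradient against a central finite-difference estimate.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(W1_plus, b1, W2, b2) - loss(W1_minus, b1, W2, b2)) / (2 * eps)
print(numeric, dW1[0, 0])

# One update: subtract the learning rate times the gradient from every parameter.
learning_rate = 0.05
before = loss(W1, b1, W2, b2)
W1, b1 = W1 - learning_rate * dW1, b1 - learning_rate * db1
W2, b2 = W2 - learning_rate * dW2, b2 - learning_rate * db2
after = loss(W1, b1, W2, b2)
print(before, after)
```

Libraries such as TensorFlow compute these gradients automatically, but the finite-difference check is a useful way to confirm a hand-written backward pass.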

During training, a technique called dropout can also be applied.

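### 3.16 What Is Dropout?

Dropout randomly sets the outputs of a fraction of the neurons to zero during training.

Because the network cannot rely on any single neuron, this reduces overfitting and helps the network’s predictions generalise. Dropout is switched off when the trained network is used for predictions.

It is a regularisation technique patented by Google.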
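A common way to implement dropout is "inverted" dropout: during training, a random fraction of the activations is set to zero and the survivors are scaled up by 1 / (1 - rate), so that the expected activation is unchanged and nothing needs rescaling at prediction time. The sketch below is a minimal NumPy version with hypothetical hidden-layer outputs; in Keras, the same idea is available as the `tf.keras.layers.Dropout` layer.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the activations during
    training and scale the survivors by 1 / (1 - rate); do nothing at inference."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(3)
hidden = np.ones(1000)                          # hypothetical hidden-layer outputs
dropped = dropout(hidden, rate=0.5, rng=rng)
print((dropped == 0).mean(), dropped.mean())    # roughly 0.5 and roughly 1.0
```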

## 4. Practical Exercises

The best way to understand neural networks is to work through practical exercises together.

I will present each scenario along with its description.

Then we’ll work through how the neural network operates.

Finally, we’ll use TensorFlow to implement the neural network.

TensorFlow is an open-source machine learning library.

It expresses computations as graphs of operations.

One of its strengths is how easily it lets scientists implement neural networks.

It can run on CPUs and GPUs.

### 4.1 Scenario: Compute OR Operator Neural Network

Explanation: There are two inputs, X and Y, which produce an output Z. We are going to choose the values of the bias and of the weights of X and Y such that when either X or Y is 1, the value of Z is 1; otherwise it is 0.

We could come up with the following neural network configuration:

- One input layer with two neurons. We are going to pass the value of X to the first neuron and the value of Y to the second neuron in the input layer.
- One output layer with one neuron. It will return the value of Z.

The activation function for the output layer will be sigmoid.

As a result:

- Z = 0 if the output layer produces a value lower than 0.5
- Z = 1 if the output layer produces a value greater than or equal to 0.5

After trial and error, this is the configuration I have come up with:

- Bias = -1
- Weight of X = 2
- Weight of Y = 2

Let’s check. For instance, when X = 1 and Y = 0, the neuron will compute:

Intermediate Z = 1 × 2 + 0 × 2 + (-1) = 2 - 1 = 1

Z = Sigmoid(1) ≈ 0.73, which is at least 0.5, so Z = 1.

It produces 100% accuracy across all four combinations of X and Y.

This neural network can now simulate an OR gate.

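Here are the results:

| X | Y | Intermediate Z | Sigmoid(Intermediate Z) | Z |
|---|---|---|---|---|
| 0 | 0 | -1 | 0.27 | 0 |
| 0 | 1 | 1 | 0.73 | 1 |
| 1 | 0 | 1 | 0.73 | 1 |
| 1 | 1 | 3 | 0.95 | 1 |

It’s important to note that this logic can now be used to predict and answer business questions.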

#### Business OR Scenario

For instance, if we were working in the credit risk department of a bank, we could use a neural network to predict credit exposure events.

For example, if the counterparty’s rating decreases OR the counterparty misses a payment, the exposure will be impacted.

### 4.2 Scenario: Compute AND Operator Neural Network

Explanation: There are two inputs, X and Y, which produce an output Z. We are going to choose the values of the bias and of the weights of X and Y such that when both X and Y are 1, the value of Z is 1; otherwise it is 0.

We could come up with the following neural network configuration:

- One input layer with two neurons. We are going to pass the value of X to the first neuron and the value of Y to the second neuron.
- One output layer with one neuron. It will return the value of Z.

The activation function for the output layer will be sigmoid.

As a result:

- Z = 0 if the output layer produces a value lower than 0.5
- Z = 1 if the output layer produces a value greater than or equal to 0.5

After trial and error, this is the configuration I have come up with:

- Bias = -3
- Weight of X = 2
- Weight of Y = 2

Let’s check. For instance, when X = 1 and Y = 0, the neuron will compute:

Intermediate Z = 1 × 2 + 0 × 2 + (-3) = 2 - 3 = -1

Z = Sigmoid(-1) ≈ 0.27, which is below 0.5, so Z = 0.

It produces 100% accuracy across all four combinations of X and Y.

This neural network can now simulate an AND gate.

Note that the only difference between the OR and AND neural networks above is the value of the bias.

Bias plays an important role in predictions.
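The role of the bias in these two scenarios can be verified with one sigmoid neuron: keeping both weights at 2 and changing only the bias from -1 to -3 turns the OR gate into an AND gate. The `gate` helper is a hypothetical name for this sketch; it applies the 0.5 threshold used above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(x, y, bias, w_x=2.0, w_y=2.0):
    """Single sigmoid neuron, thresholded at 0.5 as in the scenarios above."""
    return int(sigmoid(w_x * x + w_y * y + bias) >= 0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_outputs = [gate(x, y, bias=-1.0) for x, y in inputs]
and_outputs = [gate(x, y, bias=-3.0) for x, y in inputs]
print(or_outputs)    # [0, 1, 1, 1]
print(and_outputs)   # [0, 0, 0, 1]
```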

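Have a look at the detailed results:

| X | Y | Intermediate Z | Sigmoid(Intermediate Z) | Z |
|---|---|---|---|---|
| 0 | 0 | -3 | 0.05 | 0 |
| 0 | 1 | -1 | 0.27 | 0 |
| 1 | 0 | -1 | 0.27 | 0 |
| 1 | 1 | 1 | 0.73 | 1 |

It’s important to note that this logic can now be used to predict and answer business questions.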

#### Business AND Scenario

For instance, if we were working on the trading floor of a bank, we could use a neural network to predict whether a stock should be bought.

For example, if the stock has good current profitability AND its historical profitability is good too, then we will buy the stock.

Several neural networks like these can be combined to produce a more complex neural network.
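As an example of combining neurons, XOR (Z is 1 when exactly one of X and Y is 1) cannot be computed by any single neuron of this kind, because its outputs are not linearly separable. It can, however, be built from three neurons: the OR neuron and a NAND neuron form a hidden layer, and the AND neuron combines their outputs. The NAND weights (-2, -2) and bias (3) are hypothetical values chosen for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Sigmoid neuron whose output is thresholded at 0.5 to give 0 or 1."""
    return int(sigmoid(np.dot(inputs, weights) + bias) >= 0.5)

def xor(x, y):
    h_or = neuron([x, y], [2.0, 2.0], -1.0)          # OR neuron from section 4.1
    h_nand = neuron([x, y], [-2.0, -2.0], 3.0)       # NAND neuron (hypothetical weights)
    return neuron([h_or, h_nand], [2.0, 2.0], -3.0)  # AND neuron from section 4.2

print([xor(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 1, 0]
```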

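### 4.3 Scenario: Let’s Get Familiar With Implementing A Neural Network

Build a neural network with the following configuration:

- 2 hidden layers with 32 neurons each, using the ReLU activation function
- An output layer with 5 neurons, using the softmax activation function
- The Adam optimiser with a mean squared error loss function and a learning rate of 0.01
- 5 epochs and a batch size of 100

The code below uses the TensorFlow 2 Keras API. The number of input features (20) and the random training data are placeholders, since the scenario does not specify them.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder input: 20 features per sample (not specified by the scenario).
inputs = tf.keras.Input(shape=(20,))
x = layers.Dense(32, activation='relu')(inputs)          # hidden layer 1
x = layers.Dense(32, activation='relu')(x)               # hidden layer 2
predictions = layers.Dense(5, activation='softmax')(x)   # output layer
model = tf.keras.Model(inputs=inputs, outputs=predictions)

learning_rate = 0.01
# Adam with mean squared error, as the scenario specifies
# (categorical cross-entropy is the more common choice with softmax outputs).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss='mse')

# Placeholder data: 1,000 random samples with one-hot labels for 5 classes.
data = np.random.random((1000, 20)).astype('float32')
labels = tf.keras.utils.to_categorical(np.random.randint(5, size=1000), num_classes=5)

model.fit(data, labels, epochs=5, batch_size=100)
```

## Summary

This article covered the following topics:

- A very brief introduction to neural networks
- A thorough overview of each neural network component
- Practical examples of neural networks in Python

Neural networks are a must-know concept.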