Four major activation functions have been tried to date: step, sigmoid, tanh, and ReLU.

Since the step function is non-differentiable at zero, gradient descent cannot make progress with it, and it fails at the task of updating the weights. To overcome this problem, the sigmoid function was introduced in place of the step function.

Sigmoid Function

A sigmoid function, or logistic function, is defined mathematically as

sigmoid(z) = 1 / (1 + e^(-z))

The value of the function tends to 0 as z, the independent variable, tends to negative infinity, and tends to 1 as z tends to infinity. In these saturated regions the derivative of the sigmoid becomes very small, so the gradients passed back through the network during training shrink toward zero; this is known as the vanishing gradient problem. The problem worsens with an increase in the number of layers and thus stagnates the learning of a neural network at a certain level.

Tanh Function

The tanh(z) function is a rescaled version of the sigmoid, and its output range is [-1, 1] instead of [0, 1]. [2]

For the tanh function, an input between [-1, 1] gives a derivative between [0.42, 1]. For the sigmoid function, on the other hand, an input between [0, 1] gives a derivative between [0.20, 0.25]. As one can see from the figures above, the tanh function has a larger range of derivative values than the sigmoid and therefore passes back stronger gradients, allowing faster learning. However, the problem of vanishing gradients still persists with the tanh function.

ReLU Function

The Rectified Linear Unit, defined as ReLU(z) = max(0, z), is the most commonly used activation function in deep learning models. In general practice as well, ReLU has been found to perform better than the sigmoid or tanh functions.

Neural Networks

So far we have covered neurons and activation functions, which together form the basic building blocks of any neural network. I would highly suggest revisiting neurons and activation functions if you have any doubts about them.

Before understanding a neural network, it is imperative to understand what a layer in a neural network is. For example, here is a small neural network.

The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one
node). We also say that our example neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit. [4]

Any neural network has one input layer and one output layer. The number of hidden layers, however, differs between networks depending on the complexity of the problem to be solved.

Another important point to note here is that each of the hidden layers can have a different activation function: for instance, hidden layer 1 may use a sigmoid function, hidden layer 2 a ReLU, and hidden layer 3 a tanh, all in the same neural network. The choice of activation function again depends on the problem in question and the type of data being used.

Now, for a neural network to make accurate predictions, the neurons at every layer learn certain weights. The algorithm through which they learn these weights is called backpropagation, the details of which are beyond the scope of this post.

A neural network having more than one hidden layer is generally referred to as a Deep Neural Network.

Convolutional Neural Networks (CNN)

Convolutional Neural Networks (CNNs) are a variant of neural networks used heavily in the field of Computer Vision. In a CNN, in addition to the fully connected layers described above, convolution and pooling operations are used in the network's layers. To understand CNNs in detail, one needs to understand what convolution and pooling are.
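The activation functions discussed earlier, and the derivative ranges quoted for sigmoid and tanh, can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's API; the function names are my own.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    """sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); peaks at 0.25 when z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_deriv(z):
    """tanh'(z) = 1 - tanh(z)^2; peaks at 1.0 when z = 0."""
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)

# The derivative ranges quoted in the text:
# over z in [-1, 1], tanh'    stays in roughly [0.42, 1.0]
# over z in [0, 1],  sigmoid' stays in roughly [0.20, 0.25]
```

Evaluating these derivatives near zero shows why tanh learns faster than sigmoid: its gradient can be up to four times larger, so the error signal shrinks less per layer.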
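The 3-input, 3-hidden, 1-output example network described above can be sketched as a forward pass with plain Python lists. This is a hand-rolled illustration under assumed details: random weights (which backpropagation would learn in a real network) and sigmoid activations in both layers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum of inputs plus bias, then activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

random.seed(0)  # reproducible random weights for this sketch

# 3 input units -> 3 hidden units -> 1 output unit, as in the example network.
# Real networks learn these weights via backpropagation; here they are random.
hidden_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[random.uniform(-1, 1) for _ in range(3)]]
output_b = [0.0]

x = [0.5, -0.2, 0.1]                                    # one input example
hidden = layer_forward(x, hidden_w, hidden_b, sigmoid)  # hidden layer
output = layer_forward(hidden, output_w, output_b, sigmoid)  # output layer
```

Swapping the `activation` argument per layer (sigmoid in one, ReLU in another) is exactly the per-layer activation choice described in the text.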