Third, once defined, the model still needs to be fitted: its weights must be adjusted, based on the data, to minimise some error function, just as in the case of linear regression. And that is a genuinely difficult optimisation task.

Different needs, different architectures

As we have seen in the previous paragraph, neural networks are highly adjustable templates of functions, through their parameters, that need to be optimised to fit the data. But depending on the nature of the problem and of the data to model, we may want to use different kinds of templates. These different kinds of templates are called different "architectures". For example, the basic Feedforward Neural Network (also called Multilayer Perceptron) discussed above is the first basic architecture, but several others exist. Among the best-known architectures, we can mention Recurrent Neural Networks (RNNs), which represent a recurrent function of some sequential data where the output at time t depends on the input at time t and on the previous output at time t-1, and Convolutional Neural Networks (CNNs), which apply the mathematical convolution operation to the inputs and show good properties in, for example, some image-based problems such as image recognition.

With all the ongoing research, ever more architectures are being imagined depending on the problem to model. Obviously, we can't describe all these kinds of architectures (and it would be completely out of the scope of this article), but the most important thing to keep in mind here is that a neural network architecture should always be seen as a space of possible functions, where optimising the parameters of the network is equivalent to finding the best function in this space (based on an optimality criterion). It is therefore important to choose the right architecture: if not well chosen, we will define a space in which even the best function could be far from what we expect as a result.

Why is it a good idea?

Among the reasons that make neural networks so effective and popular, we can mention the following three. First, the ever-increasing volume of data available in many areas makes the use of such highly parametrised models reasonable. Notice that the parameters of a model are also called "degrees of freedom" (this term is not specific to the machine learning field and expresses the notion of "adjustable values" or "action levers" in a model) and that a lot of degrees of freedom in a model require a lot of data to adjust/calibrate it. In the same spirit that you can't afford, in a linear system of equations, to have more unknowns than equations, you need more data points than parameters in your network (in fact, a lot more is better). Second, ever-greater compute power, coupled with smart optimisation techniques, makes it possible to train such heavily parametrised models at very large scale.
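The idea that an architecture is a space of functions, and that a concrete setting of the weights picks one function in that space, can be made concrete with a minimal sketch. This is an illustrative toy (the function name `mlp_forward`, the layer sizes and the random weights are all assumptions for the example), not a real training setup.

```python
import numpy as np

def mlp_forward(x, params):
    """Forward pass of a tiny feedforward network (Multilayer Perceptron).

    `params` is a list of (W, b) pairs, one per layer: fixing the
    architecture (the layer sizes) defines a space of functions, and
    each concrete setting of the weights picks one function in it.
    """
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # hidden layers: affine map + ReLU
    W, b = params[-1]
    return h @ W + b                    # output layer: plain affine map

# Two different weight settings = two different functions drawn from the
# same architecture (2 inputs -> 3 hidden units -> 1 output).
rng = np.random.default_rng(0)
shapes = [(2, 3), (3, 1)]
params_a = [(rng.normal(size=s), np.zeros(s[1])) for s in shapes]
params_b = [(rng.normal(size=s), np.zeros(s[1])) for s in shapes]

x = np.array([[1.0, -2.0]])
print(mlp_forward(x, params_a))  # one function in the space
print(mlp_forward(x, params_b))  # another function, same template
```

Fitting the network then amounts to searching this space for the member that best matches the data, which is exactly the optimisation problem mentioned above.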
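The fitting step described above (adjusting the weights to minimise an error function, "just as in linear regression") can be sketched with plain gradient descent on a mean squared error. The toy data, learning rate and step count here are all illustrative assumptions; the point is only to show parameters being adjusted to reduce the error, with many more data points than degrees of freedom.

```python
import numpy as np

# Toy data: many more data points (100) than parameters (2), as the
# article recommends; the true relation is y = 3x + 1 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + 0.01 * rng.normal(size=100)

w, b = 0.0, 0.0  # the model's two degrees of freedom
lr = 0.1         # learning rate for gradient descent
for _ in range(500):
    err = (w * x + b) - y
    # Gradients of the mean squared error with respect to each parameter.
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(w, b)  # should end up close to the true values (3.0, 1.0)
```

A neural network is trained in the same spirit, except that the gradients flow through many layers (via backpropagation) and the error surface is no longer the nice convex bowl of linear regression, which is what makes the optimisation genuinely hard.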