The plot is not to scale, but we can see that the error terms shrink as learning proceeds. Note that the cube function (z³) grows at a cubic (polynomial) rate, not an exponential one.

Fig 7: Performance of Cube (z³) Activation Function

The paper is thorough in its experiments, rigorously testing the model against other state-of-the-art models. To quickly compare the activation functions by performance, we observe that:

Non-Linear Cube > ReLU > Tanh > Sigmoid

To understand more about how the model runs, I have linked Chen and Manning's paper below.

A Fast and Accurate Dependency Parser using Neural Networks: https://cs.stanford.edu/~danqi/papers/emnlp2014.pdf

Conclusion

We covered the many activation functions used in ML models. Researchers compare these functions to find what works best for a given problem statement. There is no hard and fast rule for selecting a particular activation function; the choice depends on the model's architecture, the hyperparameters, and the features we are trying to capture. In practice, ReLU is a common default for base models, but we can always try the others if we are not getting good results.

As a last note, feel free to comment with non-linear activation functions you have designed yourself, along with a high-level overview and the scenarios in which they perform best.

Spread and share knowledge.
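
To make the comparison above concrete, here is a minimal sketch of the four activation functions in plain Python. It is not the paper's implementation; the function names, the `ACTIVATIONS` table, and the sample inputs are my own choices for illustration.

```python
import math


def cube(z):
    """Cube activation: g(z) = z^3 (unbounded, polynomial growth)."""
    return z ** 3


def relu(z):
    """ReLU: passes positive inputs through, clamps negatives to 0."""
    return max(0.0, z)


def tanh(z):
    """Tanh: squashes the input into the range (-1, 1)."""
    return math.tanh(z)


def sigmoid(z):
    """Sigmoid: squashes the input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical lookup table, used here only to compare the functions side by side.
ACTIVATIONS = {"cube": cube, "relu": relu, "tanh": tanh, "sigmoid": sigmoid}


def apply_all(z):
    """Evaluate every activation at the same pre-activation value z."""
    return {name: fn(z) for name, fn in ACTIVATIONS.items()}


if __name__ == "__main__":
    for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
        values = apply_all(z)
        print(z, {name: round(v, 4) for name, v in values.items()})
```

Running the script prints each function's output for a handful of inputs. Tanh and sigmoid saturate as |z| grows, while the cube keeps growing, which is one way to see how differently these functions treat large pre-activations.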