to the neural network function parameters. The entire gradient computation algorithm is presented by the authors as pseudocode in the paper. If you are interested in the mathematical details of this involved computation, please refer to the original paper, or to the original paper on the adjoint method. The authors also provide Python code to easily compute the derivatives of the ODE solver.

ODE Networks For Supervised Learning

Now on to the most interesting part of the paper: applications. The first application the authors mention is supervised learning, namely MNIST handwritten digit classification. The aim is to show that the ODESolve method can achieve performance comparable to a residual network with far fewer parameters. The network used for evaluation in the paper downsamples the input image twice and then applies six residual blocks; in total, it contains approximately 0.6M parameters. The ODE-Net replaces the six residual blocks with a single ODESolve module. In addition, the authors test an RK-Network, which is similar except that it backpropagates the error directly through a Runge-Kutta method. As mentioned above, the number of layers in a traditional neural network corresponds to the number of function evaluations in the ODE-Net. Both of these networks have 0.22M parameters. The important result is that, with roughly a third of the parameters, the RK-Network and the ODE-Net achieve roughly the same performance as the residual network.
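The adjoint-based gradient computation described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes the hypothetical dynamics dh/dt = tanh(W h) with hand-coded Jacobians and a plain Euler integrator, where the paper uses a general dynamics network and an adaptive solver.

```python
import numpy as np

def f(h, W):
    # toy dynamics dh/dt = tanh(W h); W is the parameter we differentiate for
    return np.tanh(W @ h)

def jac_h(h, W):
    # Jacobian df/dh = diag(1 - tanh(W h)^2) @ W
    s = 1.0 - np.tanh(W @ h) ** 2
    return s[:, None] * W

def vjp_W(h, W, a):
    # vector-Jacobian product a^T (df/dW), shaped like W
    s = 1.0 - np.tanh(W @ h) ** 2
    return np.outer(a * s, h)

def adjoint_gradients(h1, dLdh1, W, t0=0.0, t1=1.0, steps=400):
    """Integrate the augmented ODE backwards from t1 to t0:
        dh/dt = f(h, W)
        da/dt = -(df/dh)^T a          (adjoint state a = dL/dh)
        dL/dW = integral of a^T df/dW dt
    """
    dt = (t1 - t0) / steps
    h, a = h1.astype(float), dLdh1.astype(float)
    dLdW = np.zeros_like(W)
    for _ in range(steps):
        dLdW = dLdW + dt * vjp_W(h, W, a)   # accumulate parameter gradient
        a = a + dt * (jac_h(h, W).T @ a)    # adjoint dynamics, run in reverse
        h = h - dt * f(h, W)                # rebuild the trajectory backwards
    return a, dLdW                          # a is now dL/dh(t0)
```

Note that the hidden state h is reconstructed on the fly during the backward pass rather than stored, which is why the memory cost is constant in "depth".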
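The idea of replacing a stack of residual blocks with a single ODESolve module can also be sketched in code. The following is a hedged toy example, not the paper's architecture: the dynamics function is a hypothetical single tanh layer rather than the small convolutional network used for MNIST, and a fixed-step RK4 integrator stands in for the adaptive solver. The point it illustrates is that the solver's step count plays the role of depth while the parameters W are reused at every evaluation, which is where the parameter savings come from.

```python
import numpy as np

def dynamics(h, t, W):
    # toy dynamics f(h, t): a single tanh layer with shared parameters W
    # (hypothetical stand-in for the paper's small conv net)
    return np.tanh(W @ h)

def odesolve_rk4(f, h0, t0, t1, W, steps=10):
    # fixed-step RK4 integrator: each step plays the role of one "layer",
    # so `steps` controls the effective depth of the block
    h, t = h0.astype(float), float(t0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t, W)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt, W)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt, W)
        k4 = f(h + dt * k3, t + dt, W)
        h = h + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return h

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((8, 8))   # one parameter set, reused at every step
h0 = rng.standard_normal(8)             # features entering the ODE block
h1 = odesolve_rk4(dynamics, h0, 0.0, 1.0, W)
```

Doubling `steps` refines the same trajectory instead of adding new parameters, in contrast to a residual network, where six blocks carry six separate sets of weights.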