6: Updating the weights and bias (dark green nodes)

Also pay attention to the 'direction' of the pathways from the yellow node to the green nodes: they go from bottom to top. This is stochastic gradient descent: updating the weights using backpropagation, making use of the respective gradient values.

Let's first focus on updating b. The formula for updating b is

b' = b − η · ∂L/∂b

Eqn. 6.1: Stochastic gradient descent update for b

where

b: current value
b': value after the update
η: learning rate, set to 0.05
∂L/∂b: gradient, i.e. the partial differential of L w.r.t. b

To get the gradient, we multiply the paths leading from L to b using the chain rule:

∂L/∂b = ∂L/∂ŷ · ∂ŷ/∂b

Eqn. 6.2: Chain rule for the partial differential of L w.r.t. b

We also need the current batch values of x, y and ŷ, together with the partial differentials, so let's place them below for easy reference:

Eqn. 6.3: Partial differentials

Eqn. 6.4: Values from the current batch and the predicted ŷ

Using the stochastic gradient descent equation in Eqn. 6.1 and plugging in all the values from Eqns. 6.2–6.4 gives us the updated value of b.

That's it for updating b! Phew! We are left with updating w₁ and w₂, which we update in a similar fashion.

End of batch iteration

Congrats! That's it for dealing with the first batch! The ✔ marks the batch we have just processed:

    x1  x2    y
1)   4   1    2  ✔
2)   2   8  -14
3)   1   0    1
4)   3   2   -1
5)   1   4   -7
6)   6   7   -8

Now we need to repeat the above steps for the other 5 batches, namely examples 2 to 6.

Fig. 7: Iterating through batches 1 to 6 (apologies for the poor GIF quality!)
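As a concrete sketch of this update step, here is what one such update of b could look like in Python. The squared-error loss L = (y − ŷ)², the model ŷ = w₁x₁ + w₂x₂ + b and the current parameter values are my own illustrative assumptions; the article's Eqns. 6.3–6.4 hold the actual values.

```python
# One stochastic gradient descent update for the bias b (batch size 1).
# Assumptions (illustrative only): model ŷ = w1*x1 + w2*x2 + b,
# loss L = (y - ŷ)², and made-up current parameter values.

eta = 0.05                  # learning rate, as in Eqn. 6.1

# Hypothetical current parameter values
w1, w2, b = 0.5, -1.0, 0.2

# Example 1 from the batch table: x1=4, x2=1, y=2
x1, x2, y = 4, 1, 2

y_hat = w1 * x1 + w2 * x2 + b    # forward pass: ŷ
dL_dyhat = -2 * (y - y_hat)      # ∂L/∂ŷ for squared-error loss
dyhat_db = 1                     # ∂ŷ/∂b
dL_db = dL_dyhat * dyhat_db      # chain rule, as in Eqn. 6.2

b_new = b - eta * dL_db          # Eqn. 6.1: b' = b - η · ∂L/∂b
```

With these assumed numbers, b moves from 0.2 to roughly 0.28.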
End of epoch

We complete 1 epoch when the model has iterated through all the batches once, i.e. when our setup has seen every observation in the dataset once. But one epoch is almost never enough for the loss to converge, so in practice we train for more than 1 epoch; the exact number is manually tuned.

At the end of it all, you should get a final model, ready for inference.

Let's review the entire workflow in pseudo-code:

initialise_weights()

for i in epochs:
    for j in batches:
        # forward propagation
        feed_batch_data()
        compute_ŷ()
        compute_loss()

        # backpropagation
        compute_partial_differentials()
        update_weights()

Improve training

One epoch is almost never enough for a stochastic gradient descent optimisation problem. Remember that in Fig. 4.1, our loss stands at 4.48. If we increase the number of epochs, which simply means increasing the number of times we update the weights and biases, we can bring the loss down to a satisfactorily low value.

Below are some things you can do to improve training:

- Extend training to more than 1 epoch
- Increase the batch size
- Change the optimiser (see my post on gradient descent optimisation algorithms here)
- Adjust the learning rate (by changing its value or using learning rate schedulers)
- Hold out a train-val-test set

About

I built an interactive explorable demo on linear regression with gradient descent in JavaScript. Here are the libraries I used:

- Dagre-D3 (GraphViz + d3.js) for rendering the graphs
- MathJax for rendering mathematical notation
- ApexCharts for plotting line charts
- jQuery

Check out the interactive demo here.

You might also like to check out A Line-by-Line Layman's Guide to Linear Regression using TensorFlow, which focuses on coding linear regression using the TensorFlow library.

References

Calculus on Computational Graphs: Backpropagation — 
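The pseudo-code above can be fleshed out into a minimal runnable sketch, trained on the six examples from the batch table. The squared-error loss, the linear model ŷ = w₁x₁ + w₂x₂ + b and the zero initialisation are assumptions on my part, and I use a smaller learning rate than the article's 0.05, which (under these assumptions) overshoots on such unscaled feature values:

```python
# Minimal SGD training loop for ŷ = w1*x1 + w2*x2 + b, batch size 1.
# Assumptions: squared-error loss L = (y - ŷ)², zero initialisation,
# and a reduced learning rate so the per-example updates stay stable.

data = [  # the six examples from the batch table: (x1, x2, y)
    (4, 1, 2), (2, 8, -14), (1, 0, 1),
    (3, 2, -1), (1, 4, -7), (6, 7, -8),
]

eta = 0.005                  # smaller than the article's 0.05 (see lead-in)
w1, w2, b = 0.0, 0.0, 0.0    # initialise_weights()

for epoch in range(1000):            # more than 1 epoch, manually tuned
    for x1, x2, y in data:           # each batch holds one example
        # forward propagation
        y_hat = w1 * x1 + w2 * x2 + b
        loss = (y - y_hat) ** 2

        # backpropagation (chain rule through ŷ)
        dL_dyhat = -2 * (y - y_hat)  # ∂L/∂ŷ
        dL_dw1 = dL_dyhat * x1       # ∂ŷ/∂w1 = x1
        dL_dw2 = dL_dyhat * x2       # ∂ŷ/∂w2 = x2
        dL_db = dL_dyhat             # ∂ŷ/∂b = 1

        # stochastic gradient descent updates (Eqn. 6.1)
        w1 -= eta * dL_dw1
        w2 -= eta * dL_dw2
        b -= eta * dL_db

print(f"w1={w1:.2f}, w2={w2:.2f}, b={b:.2f}")
```

With this data the loop converges to roughly w₁ = 1, w₂ = −2, b = 0, which fit all six examples exactly.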
colah's blog (colah.github.io)

Thanks to Ren Jie and Derek for ideas, suggestions and corrections to this article.

Follow me on Twitter @remykarem for digested articles and demos on AI and Deep Learning.