(Think of the error as the line drawn vertically up or down from each data point to the regression line, NOT at a right angle to the line.) For example, if the line is far away from the data points, like the line y = 0*X + 2, the error is way too high. As we move the line closer to the data points by tuning the parameters theta-nought and theta-one, the error decreases until it reaches a minimum at some point. If we keep moving past that point, the error increases again.

That point of minimum error, which in turn gives us the best values of theta-nought and theta-one, is found using an algorithm known as gradient descent. Or you can skip the iterations and use a math equation to compute theta-nought and theta-one directly. This is called the normal equation, shown as the middle equation.

Here's the Python code. You need to know how to use numpy, pandas, and matplotlib to understand how this code works.

Hope you got some of the intuition behind Linear Regression. It's quite hard to understand without the math, so I would recommend the Stanford Machine Learning course on Coursera. Andrew Ng is a great teacher. Thanks a lot :)
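To make the two approaches above concrete, here is a minimal sketch, assuming NumPy only (no pandas or matplotlib) and a made-up toy dataset generated from y = 1 + 2x plus noise. The function names `compute_cost`, `gradient_descent`, and `normal_equation`, the learning rate `alpha`, and the iteration count are illustrative choices, not taken from the original code. The cost is the squared vertical error averaged as J(θ) = 1/(2m) Σ (θ0 + θ1·x − y)², gradient descent minimizes it step by step, and the normal equation θ = (XᵀX)⁻¹Xᵀy computes the same θ in one shot.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y  # vertical distances from the line to each point
    return (residuals @ residuals) / (2 * m)


def gradient_descent(X, y, alpha=0.1, num_iters=2000):
    """Batch gradient descent on [theta0, theta1], starting from zeros."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m  # partial derivatives of J(theta)
        theta -= alpha * gradient             # step downhill
    return theta


def normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^-1 X^T y."""
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical toy data: y = 1 + 2x plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=50)

# Prepend a column of ones so theta0 acts as the intercept.
X = np.column_stack([np.ones_like(x), x])

theta_gd = gradient_descent(X, y)
theta_ne = normal_equation(X, y)
print("gradient descent:", theta_gd)
print("normal equation: ", theta_ne)

# The far-away line y = 0*X + 2 corresponds to theta0 = 2, theta1 = 0.
print("cost of y = 0*X + 2:", compute_cost(X, y, np.array([2.0, 0.0])))
print("cost of fitted line:", compute_cost(X, y, theta_ne))
```

Both methods should print nearly the same θ. Gradient descent needs a learning rate and enough iterations to converge, while the normal equation has no tuning but must solve an n×n system, which becomes expensive when there are many features.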