# Linear Regression From Scratch With Python

That’s what the error function is for — it calculates the total error of your line.

We’ll be using an error function called the Mean Squared Error function, or MSE, represented by the letter J:

J(θ₀, θ₁) = (1/2m) · Σ (h(xᵢ) − yᵢ)²

Now while that may look complicated, what it’s doing is actually quite simple.

To find out how “wrong” the line is, we need to find out how far it is from each point.

To do this, we subtract the actual value yᵢ from the predicted value h(xᵢ).

However, we don’t want the error to be negative, so to make sure it’s positive at all times, we square this value.

Here, m is the number of points in our dataset.

We then repeat this subtraction and squaring for all m points, add the results together, and divide by m to get the mean. Finally, we divide the error by 2.

This will help us later when we are updating our parameters.

Here’s what that looks like in code:

Now that we have a value for how wrong our function is, we need to adjust the function to reduce this error.
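Here is a minimal sketch of the MSE calculation (not the article’s original code, which didn’t survive), assuming the dataset is stored in NumPy arrays `X` and `y`; the function and variable names are assumptions:

```python
import numpy as np

def cost_function(X, y, theta0, theta1):
    """Mean Squared Error: J = (1 / 2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)                         # number of points in the dataset
    predictions = theta0 + theta1 * X  # h(x_i) for every point at once
    return np.sum((predictions - y) ** 2) / (2 * m)
```

With a perfect fit the cost is zero; otherwise it grows with the squared distance of each point from the line.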

## Calculating Derivatives

Our goal with linear regression was to find the line which best fits a set of data points.

In other words, it’s the line that’s the least incorrect or has the lowest error.

If we graph our parameters against the error (i.e. graphing the cost function), we’ll find that it forms something similar to the graph below.

At the lowest point of that graph, the error is at its lowest.

Finding this point is called minimizing the cost function.

To do this, we need to consider what happens at the bottom of the graph — the gradient is zero.

So to minimize the cost function, we need to get the gradient to zero.

The gradient is given by the derivative of the function, and for our cost function J the partial derivatives are:

∂J/∂θ₀ = (1/m) · Σ (h(xᵢ) − yᵢ)

∂J/∂θ₁ = (1/m) · Σ (h(xᵢ) − yᵢ) · xᵢ

We can calculate the derivatives using the following function:

## Updating The Parameters Based On The Learning Rate

Now we need to update our parameters to reduce the gradient.
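Here is one way such a derivative function might look (a sketch rather than the article’s original code; `calculate_derivatives` and the array names are assumptions):

```python
import numpy as np

def calculate_derivatives(X, y, theta0, theta1):
    """Partial derivatives of the cost J with respect to theta0 and theta1."""
    m = len(y)
    predictions = theta0 + theta1 * X             # h(x_i) for every point
    d_theta0 = np.sum(predictions - y) / m        # dJ/dtheta0
    d_theta1 = np.sum((predictions - y) * X) / m  # dJ/dtheta1
    return d_theta0, d_theta1
```

Both derivatives are zero exactly when the line fits the data perfectly, which is the minimum we are looking for.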

To do this, we use the gradient update rule:

θ₀ := θ₀ − α · (∂J/∂θ₀)

θ₁ := θ₁ − α · (∂J/∂θ₁)

Alpha (α) is what we call the learning rate, which is a small number that allows the parameters to be updated by a small amount.

As mentioned above, we are trying to update the parameters so that the gradient gets closer to zero (the bottom).

The learning rate guides the parameters toward the lowest point on the curve in small steps.
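A single application of the update rule might be sketched like this, again assuming NumPy arrays `X` and `y` (the function name `update_parameters` is an assumption):

```python
import numpy as np

def update_parameters(X, y, theta0, theta1, alpha):
    """One gradient update step: theta_j := theta_j - alpha * dJ/dtheta_j."""
    m = len(y)
    predictions = theta0 + theta1 * X
    d_theta0 = np.sum(predictions - y) / m
    d_theta1 = np.sum((predictions - y) * X) / m
    return theta0 - alpha * d_theta0, theta1 - alpha * d_theta1
```

Each call nudges θ₀ and θ₁ slightly downhill, so the error after the step is lower than before (provided α is small enough).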

## Minimizing the Cost Function

Now we repeat these steps — checking the error, calculating the derivatives, and updating the weights — until the error is as low as possible.

This is called minimizing the cost function.

We can now tie all our code together in a function in Python:

When you run this, it randomly initializes θ₁ and θ₀, and then iterates 1000 times to update the parameters to reduce the error.
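Here is a sketch of what such a training function could look like (assumed names, not the article’s original code):

```python
import numpy as np

def train(X, y, alpha=0.01, iterations=1000):
    """Fit y ~ theta0 + theta1 * x by gradient descent."""
    rng = np.random.default_rng(0)
    theta0, theta1 = rng.random(), rng.random()  # random initialization
    m = len(y)
    for i in range(iterations):
        predictions = theta0 + theta1 * X
        d_theta0 = np.sum(predictions - y) / m
        d_theta1 = np.sum((predictions - y) * X) / m
        theta0 -= alpha * d_theta0  # gradient update rule
        theta1 -= alpha * d_theta1
        if i % 100 == 0:  # show progress every 100 iterations
            error = np.sum((theta0 + theta1 * X - y) ** 2) / (2 * m)
            print(f"iteration {i}: error = {error:.4f}, "
                  f"line: y = {theta1:.3f}x + {theta0:.3f}")
    return theta0, theta1
```

Note that α must be small enough for the updates to converge; a value that is too large makes the error grow instead of shrink.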

Every 100 iterations, it outputs what the line looks like to show us our progress.

Once your error is minimized, your line should now be the best possible fit to approximate the data!

## Conclusion

Here are some key terms we learned:

- **Simple linear regression** — Finds the relationship between two variables that are linearly correlated.

  E.g. finding the relationship between the size of a house and the price of a house.

- **Linear relationship** — When you plot the dataset on a graph, the data lies approximately in the shape of a straight line.

- **Linear equation** — y = mx + b. The standard form that represents a straight line on a graph, where m represents the gradient and b represents the y-intercept.

- **Gradient** — How steep the line is.
- **Y-intercept** — Where the line crosses the y-axis.
- **Random Initialization** — Giving parameters random values to begin with.
- **Cost function** — Calculates the total error of your line.
- **Minimizing the cost function** — Reducing the value of the cost function until the error is minimized.

- **Learning rate (alpha, α)** — A small number that allows the parameters to be updated by a tiny amount.

After reading this article, I hope you now have a better understanding of linear regression and the gradient update rule, which is foundational for other concepts in the field.

Thanks for reading,

Sarvasv

Want to chat? Find me on Twitter and LinkedIn.