In this article, we’ll cover univariate linear regression, a statistical approach for finding and describing the relationship between an independent variable x and a dependent variable y.
We can write the relationship between the independent variable and the dependent variable as y = mx + b, where m is the slope and b is the intercept.
In machine learning, we typically call ŷ our prediction, x our feature, and the parameters w our weights.
Now let’s rewrite our regression equation in machine learning notation.
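Assuming the common convention that w₀ plays the role of the intercept b and w₁ the role of the slope m, the equation becomes:

$$\hat{y} = w_0 + w_1 x$$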
Example
Now, let’s step through an example of how to calculate the estimated output ŷ of a regression where the linear equation is ŷ = 2x and our input feature is x = 4.
In this case, there is no intercept.
Instead, we just have a slope of 2, which means every one-unit increase in the input raises the expected output by 2. Because there is no intercept, the output is always double the input, so for x = 4 we get ŷ = 2 × 4 = 8.
Let’s go ahead and convert this equation into a small function in Python so in the future we can have the computer do all the heavy lifting for us.
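Here is a minimal sketch of such a function. The name `predict` and the parameter names `w1` (slope) and `w0` (intercept) are illustrative choices, not taken from the original notebook.

```python
def predict(x: float, w1: float, w0: float = 0.0) -> float:
    """Return the regression estimate y_hat = w0 + w1 * x."""
    # w1 is the slope (m) and w0 the intercept (b); both names are illustrative.
    return w0 + w1 * x


# y_hat = 2x: slope 2, no intercept (w0 defaults to 0).
y_hat = predict(4, w1=2)
print(y_hat)  # 8.0
```

Giving the intercept a default of 0 lets the same function handle both ŷ = 2x and the general ŷ = w₀ + w₁x.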
Fitting a Univariate Linear Regression Line
As you may have noticed, our formula only works if we know the slope and intercept of our model, but what if we don’t have this information? Before we can even find our slope and intercept, we need to ask ourselves an even more important question.
How can we determine whether the model accurately captures our data? In other words, is our model a good fit? Once we have a criterion for evaluating a model, we can use it to find an optimal slope and intercept.
One approach is to measure how far off each prediction is from the actual outcome and then minimise that error.
We can quantify this error by measuring the difference between our model’s prediction ŷ and the true value y for every data point and combining those differences with the sum of squared errors (SSE).
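Sum of Squared Errors
For n data points, the sum of squared errors adds up the squared difference between each true value yᵢ and its prediction ŷᵢ:

$$\text{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

Before reading further, I recommend watching Sal Khan’s video for the proof of the equation.
Now that we know how to calculate the sum of squared errors, how do we minimise it to find the best-fit line for our linear regression? Say, for example, we have a linear equation where the slope is 1 and the intercept is 0.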
How can we tell if this is a good model? We can compare the predicted values ŷ with the true values y in our recorded data set.
That is, for each point in the data set, we subtract the predicted value from the true value, square the difference, and sum these squared errors.
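As a quick worked illustration, take a tiny hypothetical data set made up for this example: x = 1, 2, 3 with true values y = 2, 3, 5. The model with slope 1 and intercept 0 predicts ŷ = 1, 2, 3, so

$$\text{SSE} = (2-1)^2 + (3-2)^2 + (5-3)^2 = 1 + 1 + 4 = 6$$

A perfect fit would give an SSE of 0, so this line leaves room for improvement.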
Let’s convert the sum of squared errors into a Python function.
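A minimal sketch, assuming the data arrive as two plain Python lists of equal length. The function name and the three-point data set are hypothetical, chosen only for illustration.

```python
def sum_squared_errors(x, y, w1, w0):
    """Sum over all points of (true y - predicted y_hat) squared."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0
    for xi, yi in zip(x, y):
        y_hat = w0 + w1 * xi         # prediction for this point
        total += (yi - y_hat) ** 2   # squared error for this point
    return total


x = [1, 2, 3]  # hypothetical feature values
y = [2, 3, 5]  # hypothetical true outcomes
print(sum_squared_errors(x, y, w1=1, w0=0))  # 6
```

The explicit length check matters because `zip` silently stops at the shorter list, which would hide a data mismatch.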
Okay, now that we have an error formula, how can we use it to find the optimal slope and intercept (our weights) that minimise our errors? We can analyse how the SSE behaves as the weights change and see whether that tells us which values to choose.
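To get a feel for that behavior, this sketch evaluates the SSE for a handful of candidate slopes on a tiny hypothetical data set, holding the intercept at 0 for simplicity (an assumption made only to keep the example one-dimensional). The error falls and then rises again, tracing a U-shaped curve with a single lowest point.

```python
def sum_squared_errors(x, y, w1, w0):
    """Sum over all points of (true y - predicted y_hat) squared."""
    return sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3]  # hypothetical feature values
y = [2, 3, 5]  # hypothetical true outcomes

# Try several slopes with the intercept fixed at 0.
slopes = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
errors = [sum_squared_errors(x, y, w1=s, w0=0.0) for s in slopes]

for s, e in zip(slopes, errors):
    print(f"slope={s:.1f}  SSE={e:.2f}")
# The SSE drops from 38.00 to 0.50 at slope 1.5, then climbs back up.
```

Testing values by hand like this only ever finds the best point on the grid we happened to pick, which is the motivation for using derivatives instead.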
Instead of manually testing various weight values, we can take the derivatives of the sum of squared errors with respect to each weight and set them to zero to find the ideal weight parameters.
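Sketching that step, with w₀ as the intercept and w₁ as the slope so that ŷᵢ = w₀ + w₁xᵢ, the SSE and its two partial derivatives are:

$$\text{SSE}(w_0, w_1) = \sum_{i=1}^{n} \left(y_i - w_0 - w_1 x_i\right)^2$$

$$\frac{\partial\,\text{SSE}}{\partial w_0} = -2\sum_{i=1}^{n} \left(y_i - w_0 - w_1 x_i\right) = 0$$

$$\frac{\partial\,\text{SSE}}{\partial w_1} = -2\sum_{i=1}^{n} x_i\left(y_i - w_0 - w_1 x_i\right) = 0$$

Dividing the first condition by −2n rearranges it to w₀ = ȳ − w₁x̄; substituting this into the second condition and solving for w₁ yields the closed-form slope estimate.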
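The following diagram gives a rough picture of what the sum of squared errors looks like in three dimensions: a bowl-shaped surface over the slope and intercept, whose lowest point is the best fit.
Taking the derivatives of the SSE and setting them to zero gives the following estimates for the slope w₁ and the intercept w₀:

$$w_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$

Here ȳ is the mean value of y, and x̄ is the mean value of x. Once again, let’s go ahead and code these formulas into Python.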
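A minimal sketch of the two estimates using only built-in Python. The helper names (`mean`, `estimate_slope`, `estimate_intercept`) and the three-point data set are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def estimate_slope(x, y):
    """w1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)."""
    x_bar, y_bar = mean(x), mean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    if denominator == 0:
        raise ValueError("x must contain at least two distinct values")
    return numerator / denominator


def estimate_intercept(x, y, w1):
    """w0 = y_bar - w1 * x_bar."""
    return mean(y) - w1 * mean(x)


x = [1, 2, 3]  # hypothetical feature values
y = [2, 3, 5]  # hypothetical true outcomes
w1 = estimate_slope(x, y)
w0 = estimate_intercept(x, y, w1)
print(f"w1 = {w1:.4f}, w0 = {w0:.4f}")  # w1 = 1.5000, w0 = 0.3333
```

The zero-denominator guard covers the degenerate case where every x value is identical, since no unique slope exists then.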
Now that we can calculate the slope and intercept, let’s build our linear regression model from scratch and use it to make a new prediction.
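One way to package everything is a small class with `fit` and `predict` methods. This is a sketch in plain Python; the class name, the method names, and the three-point data set are hypothetical and are not the API of any particular library.

```python
class UnivariateLinearRegression:
    """Least-squares fit of y_hat = w0 + w1 * x (illustrative sketch)."""

    def __init__(self):
        self.w0 = None  # intercept
        self.w1 = None  # slope

    def fit(self, x, y):
        """Estimate w1 and w0 with the closed-form least-squares formulas."""
        if len(x) != len(y) or len(x) < 2:
            raise ValueError("x and y need the same length and at least 2 points")
        n = len(x)
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:
            raise ValueError("x must contain at least two distinct values")
        self.w1 = sxy / sxx
        self.w0 = y_bar - self.w1 * x_bar
        return self  # allows chaining: Model().fit(...).predict(...)

    def predict(self, x_new):
        """Return the prediction for a single new feature value."""
        if self.w1 is None:
            raise RuntimeError("call fit() before predict()")
        return self.w0 + self.w1 * x_new


# Fit on hypothetical data, then predict for a new input x = 4.
model = UnivariateLinearRegression().fit([1, 2, 3], [2, 3, 5])
print(f"{model.predict(4):.4f}")  # 6.3333
```

Returning `self` from `fit` mirrors a common convention in Python modeling code and lets the fit and the first prediction be written on one line.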
Project Fuel Economy
Here is the Google Colab link for practice! Credit for the project: http://ai-4-all.org/