All you need is enough math to understand basic graphs.
Before entering the topic, a little brush-up.
Regression: A regression model is a statistical procedure that estimates the linear or polynomial relationship between two or more variables.
It is based mainly on how much the output variable changes for a given change in the predictor variables, which is closely tied to the correlation between the variables.
Regression can be classified as: linear regression, polynomial regression, logistic regression, ridge regression, lasso regression, and elastic net regression. There are also a few other models derived from these techniques that serve specialized requirements.
This article focuses on applying regularization to regression methods to obtain more accurate predictions.
Regularization: Regularization is a technique used to reduce overfitting, which matters most when the training data and the test data differ considerably.
It is implemented by adding a “penalty” term to the best fit derived from the training data. The penalty lowers the variance seen on the test data and restricts the influence of the predictor variables on the output variable by shrinking their coefficients.
Ridge Regression
Ridge regression adds a small amount of bias to a multiple linear regression model so that it predicts the test data more accurately, at the cost of losing some accuracy on the training data.
The general equation of the best-fit line for multiple linear regression is
y = β₀ + β₁x₁ + β₂x₂ + ··· + βₖxₖ
where y is the output variable and x₁, x₂, …, xₖ are the predictor variables.
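The penalty term for ridge regression is λ(slope)², where λ controls how far the fitted line is pulled away from the original least-squares line. It shrinks the coefficients of the predictor variables toward zero but never makes them exactly zero. With several predictors, (slope)² stands for the sum of the squared coefficients β₁² + ··· + βₖ².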
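To see the effect of the penalty in the simplest case, here is a minimal sketch for a single predictor. It assumes the fit minimizes the sum of squared residuals plus λ(slope)² and that the intercept is not penalized; under those assumptions the slope has a closed form. The function name ridge_line and the toy numbers are made up for illustration and are not the salary_data dataset.

```python
# Toy data, invented for illustration (NOT the salary_data dataset).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40.0, 45.0, 52.0, 58.0, 65.0]


def ridge_line(x, y, lam):
    """Fit y = b0 + b1*x by minimizing sum((y - b0 - b1*x)**2) + lam * b1**2.

    Assumption: the intercept b0 is left unpenalized.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)  # lam = 0 gives the ordinary least-squares slope
    intercept = y_mean - slope * x_mean
    return intercept, slope


for lam in (0.0, 10.0, 100.0):
    b0, b1 = ridge_line(years, salary, lam)
    print(f"lambda={lam:>5}: intercept={b0:.3f}, slope={b1:.3f}")
```

Larger lambda values give a flatter line, but the slope only approaches zero and does not reach it.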
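In full, ridge regression keeps the same line,
y = β₀ + β₁x₁ + β₂x₂ + ··· + βₖxₖ
but chooses the coefficients by minimizing the sum of squared residuals plus the penalty:
Σ(y − ŷ)² + λ(slope)²
where ŷ is the value predicted by the line.
Let us consider an example using the salary_data dataset. The ridge regression scatter plot using a lambda value of 100 is:
[Figure: ridge regression scatter plot, λ = 100]
Lasso Regression
Lasso regression is very similar to ridge regression and differs only in the penalty term.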
The penalty for lasso regression is λ|slope|.
Lasso regression can even eliminate variables by shrinking their coefficients to exactly zero, which tends to remove variables that are highly correlated with other predictor variables.
Lasso regression keeps the line
y = β₀ + β₁x₁ + β₂x₂ + ··· + βₖxₖ
and chooses the coefficients by minimizing
Σ(y − ŷ)² + λ|slope|
where ŷ is the value predicted by the line.
Taking the same example, the lasso regression scatter plot using a lambda value of 10000 is:
[Figure: lasso regression scatter plot, λ = 10000]
Comparing lasso and ridge with the linear regression model, we get:
[Figure: linear, ridge, and lasso regression lines on the same plot]
Note: All three lines must pass through a single point (x̄, ȳ), where x̄ is the mean of the predictor variable and ȳ is the mean of the output variable.
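The comparison and the note above can be checked numerically for one predictor. The sketch below is an illustration under stated assumptions, not the code behind the plots: the intercept is unpenalized, the lasso slope comes from soft-thresholding, and the names fit_line and soft_threshold as well as the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_line(x, y, lam=0.0, penalty="none"):
    """Fit y = b0 + b1*x with an optional penalty on the slope b1 only.

    penalty="ridge": minimize SSE + lam * b1**2
    penalty="lasso": minimize SSE + lam * abs(b1)
    anything else:   ordinary least squares
    """
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if penalty == "ridge":
        slope = sxy / (sxx + lam)
    elif penalty == "lasso":
        slope = soft_threshold(sxy, lam / 2.0) / sxx
    else:
        slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Toy data, invented for illustration (NOT the salary_data dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [40.0, 45.0, 52.0, 58.0, 65.0]
x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)

fits = {
    "linear": fit_line(x, y),
    "ridge": fit_line(x, y, lam=100.0, penalty="ridge"),
    "lasso": fit_line(x, y, lam=50.0, penalty="lasso"),
    "lasso, large lambda": fit_line(x, y, lam=10000.0, penalty="lasso"),
}
for name, (b0, b1) in fits.items():
    at_mean = b0 + b1 * x_mean  # height of the fitted line at x = x_mean
    print(f"{name:>20}: slope={b1:.3f}, line at mean x={at_mean:.3f}, mean y={y_mean:.3f}")
```

Every fitted line takes the value ȳ at x̄, and with a large enough lambda the lasso slope becomes exactly zero while the ridge slope stays positive.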
Elastic net regression is just a fancier combination of ridge and lasso regression that is capable of reducing overfitting to a greater extent.
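One way to picture that combination, sketched for a single predictor: apply both penalties at once. This assumes the objective is the sum of squared residuals plus lam_l1·|slope| plus lam_l2·(slope)² with an unpenalized intercept. The two separate weights lam_l1 and lam_l2, the function names, and the toy data are assumptions made for illustration; the article itself only speaks of a single lambda.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_line(x, y, lam_l1, lam_l2):
    """Fit y = b0 + b1*x by minimizing SSE + lam_l1 * abs(b1) + lam_l2 * b1**2.

    Assumption: the intercept b0 is left unpenalized.
    """
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Lasso part: soft-threshold the numerator. Ridge part: inflate the denominator.
    slope = soft_threshold(sxy, lam_l1 / 2.0) / (sxx + lam_l2)
    return y_mean - slope * x_mean, slope


# Toy data, invented for illustration (NOT the salary_data dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [40.0, 45.0, 52.0, 58.0, 65.0]

for lam_l1, lam_l2 in [(0.0, 0.0), (0.0, 100.0), (50.0, 0.0), (50.0, 10.0)]:
    b0, b1 = elastic_net_line(x, y, lam_l1, lam_l2)
    print(f"lam_l1={lam_l1:>5}, lam_l2={lam_l2:>5}: slope={b1:.3f}")
```

Setting lam_l1 to zero reduces this sketch to the ridge case, and setting lam_l2 to zero reduces it to the lasso case.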
With a properly chosen value of lambda, the model can be regularized and better accuracy on the test data can be achieved.
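The article does not say how lambda should be chosen. One common approach is to try several candidate values and keep the one with the lowest error on held-out data. The sketch below does this with leave-one-out cross-validation for single-predictor ridge regression; the candidate list, the function names, and the toy data are all assumptions made for illustration.

```python
def ridge_line(x, y, lam):
    """Fit y = b0 + b1*x by minimizing SSE + lam * b1**2 (intercept unpenalized)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)
    return y_mean - slope * x_mean, slope


def loo_cv_error(x, y, lam):
    """Mean squared prediction error when each point is held out in turn."""
    total = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b1 = ridge_line(x_train, y_train, lam)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total / len(x)


# Noisy toy data, invented for illustration (NOT the salary_data dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [42.0, 39.0, 55.0, 50.0, 68.0, 61.0]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]  # hypothetical grid of lambda values
errors = {lam: loo_cv_error(x, y, lam) for lam in candidates}
best_lam = min(errors, key=errors.get)

for lam in candidates:
    print(f"lambda={lam:>5}: leave-one-out MSE={errors[lam]:.3f}")
print("selected lambda:", best_lam)
```

Which value wins depends entirely on the data, so the grid of candidates should be adapted to the scale of the problem.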