Bias-Variance Tradeoff to Avoid Under/Over-fitting

A fundamental concept for understanding your model’s performance.

Maher, Apr 3

The irreducible error

I think most of us who are starting out in machine learning fall into the trap of overfitting. Exceptionally good accuracy on the training data can be a really bad sign, because the main goal of solving a machine learning problem is to produce a generalized solution that works on unseen data.

Image: Chabacano (License)

Suppose this graph represents a classification problem where, for example, the red dots represent cats and the blue dots represent dogs.

We need to classify a newly introduced dot as a cat or a dog.

The black curve in this graph is the best model you can get. Applying it to the training data still produces some error, caused by the misplaced dots (noise) in the training data. This error is the irreducible error.

This is the only kind of error we want left in our model when pushing it to production.

Try to push the training error below this level and the test error will go up.
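To make the irreducible error concrete, here is a minimal sketch in a regression setting rather than the cat/dog classification above. The function `true_f`, the noise level `sigma`, and the sample size are assumptions chosen only for illustration. The point is that even the best possible model (the true function itself) is left with a mean squared error of about `sigma ** 2`, because it cannot predict the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical "best possible model" (plays the role of the black curve).
    return np.sin(2 * np.pi * x)

sigma = 0.3  # assumed standard deviation of the noise

# Noisy observations: the signal plus random noise the model cannot know.
x = rng.uniform(0.0, 1.0, size=100_000)
y = true_f(x) + rng.normal(0.0, sigma, size=x.shape)

# Error of the true function on the noisy data: this is the irreducible part.
mse_of_true_model = float(np.mean((y - true_f(x)) ** 2))

print(f"MSE of the true function: {mse_of_true_model:.4f}")
print(f"Noise variance sigma^2:   {sigma ** 2:.4f}")
```

The two printed numbers should be close to each other. Any model that scores below this level on the training data is fitting the noise.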

Variance

Variance measures how jumpy our estimator is, that is, how much its predictions change when it is trained on a different sample of the data.

When you keep improving your model so that it captures the regularities in the training data ever more accurately, you reach a point where you are teaching it the noise as well. The model is then flexible enough to overfit the training data, and it stops generalizing.

High variance (which mainly shows up as overfitting) can cause an algorithm to model the random noise in the training data rather than the intended (correct) outputs.

Bias

Bias is the error you introduce when you lower the complexity of your model to help it generalize to unseen data, and in doing so miss some important regularities in the data.

Models with low bias are usually more complex, enabling them to represent the training set more accurately.

And models with higher bias tend to be relatively simple.

High bias (which mainly shows up as underfitting) can cause an algorithm to miss the relevant relations between features and target outputs.

Tradeoff

The first graph (on the left) shows the data points, which lie around the black curve with some noise, and 3 different models with different flexibilities.

The linear model is the least flexible: it has high bias and low variance. As you can see in the middle graph, the linear model’s MSE is high on both the training and the testing data, which is due to underfitting.

The spline model is the most flexible: it has low bias and high variance. As you can see in the middle graph, its MSE is high on the testing data and low on the training data, which is due to overfitting.

The quadratic model is balanced in terms of bias and variance. Both values are low, but neither is at its minimum, because it’s not possible to reach both minimums at once due to the unfortunate existence of errors in life.
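The comparison of the three models can be reproduced with a small experiment. This is only a sketch under stated assumptions: the data come from a made-up quadratic function `true_f` with Gaussian noise, and a degree-12 polynomial stands in for the flexible spline from the graph. The names `sample`, `mse`, and `results` are hypothetical helpers, not taken from the original figure.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def true_f(x):
    # Assumed ground truth: a quadratic, so the quadratic model can match it.
    return 1.0 + 2.0 * x - 3.0 * x ** 2

def sample(n, sigma=0.5):
    # Draw n noisy observations around the true function.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = true_f(x) + rng.normal(0.0, sigma, size=n)
    return x, y

def mse(model, x, y):
    # Mean squared error of a fitted model on the given data.
    return float(np.mean((y - model(x)) ** 2))

x_train, y_train = sample(30)      # small training set
x_test, y_test = sample(10_000)    # large test set from the same distribution

results = {}
for degree in (1, 2, 12):          # linear, quadratic, very flexible
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train),
                       mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

The training MSE can only go down as the degree grows, because each model contains the previous one as a special case. The test MSE is what tells you whether the extra flexibility helped: in runs like this one the linear fit is typically poor on both sets, while the most flexible fit has the lowest training error without a matching gain on the test set.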

So you want to balance the bias and variance of your model at the point where the test error (validation error) reaches its minimum, rather than pushing the training error as low as it will go.

We can compute the variance and bias to reach a good balance between them.
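On real data the true function is unknown, so bias and variance cannot be computed directly. On simulated data they can be estimated by training the same kind of model on many fresh training sets and looking at how the predictions behave. The sketch below assumes a made-up quadratic `true_f`, Gaussian noise, and polynomial models; `bias_variance` is a hypothetical helper written for this illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed ground truth, known here only because the data are simulated.
    return 1.0 + 2.0 * x - 3.0 * x ** 2

def bias_variance(degree, n_train=30, sigma=0.5, n_repeats=300):
    # Fixed points at which predictions are compared across training sets.
    x_grid = np.linspace(-1.0, 1.0, 50)
    preds = np.empty((n_repeats, x_grid.size))
    for i in range(n_repeats):
        # Draw a fresh training set and refit the model.
        x = rng.uniform(-1.0, 1.0, size=n_train)
        y = true_f(x) + rng.normal(0.0, sigma, size=n_train)
        preds[i] = Polynomial.fit(x, y, deg=degree)(x_grid)
    # Bias: how far the average prediction is from the truth.
    bias_sq = float(np.mean((preds.mean(axis=0) - true_f(x_grid)) ** 2))
    # Variance: how much predictions move from one training set to the next.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

estimates = {}
for degree in (1, 2, 10):
    estimates[degree] = bias_variance(degree)
    b, v = estimates[degree]
    print(f"degree {degree:2d}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

Both quantities are averaged over the grid points, so each model gets a single bias and a single variance figure. The simple model should show the larger squared bias and the flexible model the larger variance.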

For more information about the mathematics check out this article.

Mean Squared Error: $\mathrm{MSE}(x) = \mathrm{Var}(x) + \mathrm{Bias}(x)^2$

Expected Prediction Error: $\mathrm{EPE}(x) = \mathrm{Bias}(x)^2 + \mathrm{Var}(x) + \sigma^2$
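For readers who want to see where these two formulas come from, here is a short derivation sketch. The notation is an assumption, since the article does not define it: the data follow $y = f(x) + \varepsilon$ with $E[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, $\hat{f}(x)$ is the model's prediction, and expectations are taken over random training sets and over the noise, which is independent of $\hat{f}$.

```latex
\begin{align*}
\mathrm{Bias}(x) &= E[\hat{f}(x)] - f(x), \qquad
\mathrm{Var}(x) = E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] \\[4pt]
\mathrm{MSE}(x) &= E\big[(\hat{f}(x) - f(x))^2\big] \\
  &= E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]
     + \big(E[\hat{f}(x)] - f(x)\big)^2 \\
  &\quad + 2\,\big(E[\hat{f}(x)] - f(x)\big)\,
     \underbrace{E\big[\hat{f}(x) - E[\hat{f}(x)]\big]}_{=\,0} \\
  &= \mathrm{Var}(x) + \mathrm{Bias}(x)^2 \\[4pt]
\mathrm{EPE}(x) &= E\big[(y - \hat{f}(x))^2\big]
  = E\big[(f(x) - \hat{f}(x))^2\big]
    + 2\,\underbrace{E\big[\varepsilon\,(f(x) - \hat{f}(x))\big]}_{=\,0}
    + E[\varepsilon^2] \\
  &= \mathrm{Bias}(x)^2 + \mathrm{Var}(x) + \sigma^2
\end{align*}
```

The $\sigma^2$ term is the irreducible error: it does not depend on the model, so only the bias and variance terms can be traded against each other.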