What Are Overfitting and Underfitting in Machine Learning?
Anas Al-Masri, Jun 21
As you enter the realm of Machine Learning, several ambiguous terms will introduce themselves: terms like overfitting, underfitting, and the bias-variance trade-off.
These concepts lie at the core of the field of Machine Learning in general.
In this post, I explain those terms with an example.
Why should we even care?
Arguably, Machine Learning models have one sole purpose: to generalize well.
I have mentioned this in several previous posts, but it never hurts to emphasize it.
Generalization is the model’s ability to give sensible outputs for inputs that it has never seen before.
Normal programs cannot do such a thing, as they can only give outputs “robotically” to the inputs they know.
Performance of the model as well as the application as a whole relies heavily on the generalization of the model.
If the model generalizes well, it serves its purpose.
A lot of techniques to evaluate this performance have been introduced, starting with the data itself.
Building on that idea, terms like overfitting and underfitting refer to deficiencies that the model’s performance might suffer from.
This means that knowing “how off” the model’s predictions are is a matter of knowing how close it is to overfitting or underfitting.
A model that generalizes well is a model that is neither underfit nor overfit.
This may not make that much sense yet, but I need you to keep this sentence in mind throughout this post, as it’s the big picture regarding our topic.
The rest of the post will make links between whatever you learn and how it fits within this big picture.
Our Example
Let’s say we’re trying to build a Machine Learning model for the following data set.
Please take note that I believe newcomers in the field should have more hands-on experience than research.
Therefore, mathematical technicalities like the functions involved and the like will not be touched on in this post.
For now, let’s just keep in mind that the x-axis is the input value and y-axis is the output value in the data set.
If you’ve had any previous experience with Machine Learning model training, you probably know that we have a few options here.
However, for simplicity, let’s choose Univariate Linear Regression in our example.
Linear Regression allows us to map numeric inputs to numeric outputs by fitting a line to the data points.
This line-fitting process is the medium of both overfitting and underfitting.
The training stage
Training the Linear Regression model in our example is all about minimizing the total distance (i.e. cost) between the line we’re trying to fit and the actual data points.
This goes through multiple iterations until we find the relatively “optimal” configuration of our line within the data set.
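As a minimal sketch of this iterative training, the snippet below fits a line with batch gradient descent. The post does not name a cost function or an optimizer, so the mean squared error, the learning rate, the iteration count, and the toy data set (a trend of y = 2x + 1 with alternating noise of +1 and -1) are all assumptions made for illustration.

```python
def mean_squared_error(w, b, xs, ys):
    """Average squared vertical distance between the line and the points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_line(xs, ys, learning_rate=0.01, iterations=5000):
    """Fit y = w * x + b by batch gradient descent (assumed settings)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the line a small step in the direction that lowers the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data (assumed): trend y = 2x + 1 plus alternating noise of +1 / -1.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [2 * x + 1 + (1 if x % 2 == 0 else -1) for x in xs]

initial_cost = mean_squared_error(0.0, 0.0, xs, ys)
w, b = train_line(xs, ys)
final_cost = mean_squared_error(w, b, xs, ys)

print(f"slope={w:.3f}, intercept={b:.3f}")
print(f"cost before training={initial_cost:.3f}, after={final_cost:.3f}")
```

Each iteration nudges the slope and intercept so that the cost shrinks, which is the "configuration" search described above.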
This is exactly where overfitting and underfitting occur.
In Linear Regression, we would like our model to follow a line similar to the following:
Even though the overall cost is not minimal (i.e. there is a better configuration in which the line could yield a smaller distance to the data points), the line above fits within the trend very well, making the model reliable.
Let’s say we want to infer an output for an input value that is not currently in the data set.
The line above could give a very likely prediction for the new input, as, in terms of Machine Learning, the outputs are expected to follow the trend seen in the training set.
Overfitting
When we run our training algorithm on the data set, we allow the overall cost (i.e. distance from each point to the line) to become smaller with more iterations.
Letting this training algorithm run for a long time leads to a minimal overall cost.
However, this means that the line will be fit into all the points (including noise), catching secondary patterns that may not be needed for the generalizability of the model.
Referring back to our example, if we leave the learning algorithm running for a long time, it could end up fitting the line in the following manner:
This looks good, right? Yes, but is it reliable? Well, not really.
The essence of an algorithm like Linear Regression is to capture the dominant trend and fit our line within that trend.
In the figure above, the algorithm captured all trends — but not the dominant one.
If we want to test the model on inputs that are beyond the line limits we have (i.e. generalize), what would that line look like? There is really no way to tell.
Therefore, the outputs aren’t reliable.
If the model does not capture the dominant trend that we can all see (positively increasing, in our case), it can’t predict a likely output for an input that it has never seen before, which defeats the purpose of Machine Learning to begin with!
Overfitting is the case where the overall cost is really small, but the generalization of the model is unreliable.
This is due to the model learning “too much” from the training data set.
This may sound preposterous: why would we settle for a higher cost when we can just find the minimal one? Generalization.
The longer we leave the model training, the higher the chance of overfitting.
We always want to find the trend, not fit the line to all the data points.
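To make the contrast concrete, here is a small sketch that compares a least-squares line with a curve forced through every training point. The curve uses Lagrange interpolation, which is a stand-in chosen here for an overfit model and not something the post itself uses. The toy data, the noise pattern, and the helper names are likewise assumptions.

```python
def fit_line(xs, ys):
    """Closed-form least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_through_every_point(xs, ys):
    """Lagrange interpolation: a curve that hits all training points, noise included."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    weight *= (x - xm) / (xj - xm)
            total += yj * weight
        return total
    return predict


def cost(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data (assumed): trend y = 2x + 1 plus alternating noise of +1 / -1.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [2 * x + 1 + (1 if x % 2 == 0 else -1) for x in train_x]
# Unseen inputs between the training points, labelled with the noise-free trend.
new_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
new_y = [2 * x + 1 for x in new_x]

line = fit_line(train_x, train_y)
wiggly = fit_through_every_point(train_x, train_y)

for name, model in (("line", line), ("wiggly", wiggly)):
    print(name,
          "training cost:", round(cost(model, train_x, train_y), 3),
          "new-input cost:", round(cost(model, new_x, new_y), 3))
```

On this toy data the curve reaches a training cost of zero yet does far worse than the plain line on the unseen inputs, which is the pattern the overfitting definition describes.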
Overfitting (or high variance) does more harm than good.
What use is a model that has learned very well from the training data but still can’t make reliable predictions for new inputs?
Underfitting
We want the model to learn from the training data, but we don’t want it to learn too much (i.e. too many patterns).
One solution could be to stop the training earlier.
However, this could lead the model to not learn enough patterns from the training data, and possibly not even capture the dominant trend.
This case is called underfitting.
Underfitting is the case where the model has “not learned enough” from the training data, resulting in poor generalization and unreliable predictions.
As you probably expected, underfitting (i.e. high bias) is just as bad for generalization of the model as overfitting.
With high bias, the model might not have enough flexibility in terms of line fitting, resulting in a simplistic line that does not generalize well.
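A rough way to see high bias in numbers is to compare a model with no flexibility at all against an ordinary fitted line. In the sketch below, the "flat" model always predicts the average training output. It is a deliberately extreme example chosen here, and the toy data and helper names are assumptions, not part of the original example.

```python
def fit_flat(ys):
    """High-bias model: ignore the input and always predict the mean output."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Closed-form least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cost(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data (assumed): trend y = 2x + 1 plus alternating noise of +1 / -1.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [2 * x + 1 + (1 if x % 2 == 0 else -1) for x in train_x]
# Unseen inputs, labelled with the noise-free trend.
new_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
new_y = [2 * x + 1 for x in new_x]

flat = fit_flat(train_y)
line = fit_line(train_x, train_y)

for name, model in (("flat", flat), ("line", line)):
    print(name,
          "training cost:", round(cost(model, train_x, train_y), 3),
          "new-input cost:", round(cost(model, new_x, new_y), 3))
```

Unlike the overfit case, the underfit model is poor on the training data as well as on new inputs, because it never captured the upward trend in the first place.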
Bias-variance trade-off
So what is the right measure? Depending on the model at hand, a performance that lies between overfitting and underfitting is more desirable.
This trade-off is the most integral aspect of Machine Learning model training.
As we discussed, Machine Learning models fulfill their purpose when they generalize well.
Generalization is bound by the two undesirable outcomes — high bias and high variance.
Detecting whether the model suffers from either one is the sole responsibility of the model developer.
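One common rule of thumb for this detection is to compare the cost on the training data with the cost on held-out data. The helper below is only a sketch of that idea: the function name, the single `acceptable_cost` threshold, and the example numbers are all hypothetical, and a real project would pick its thresholds per problem.

```python
def diagnose(train_cost, validation_cost, acceptable_cost):
    """Rough heuristic; acceptable_cost is a hypothetical, problem-specific threshold."""
    if train_cost > acceptable_cost:
        # The model cannot even describe the data it was trained on.
        return "underfitting (high bias)"
    if validation_cost > acceptable_cost:
        # Fine on training data, poor on data it has not seen.
        return "overfitting (high variance)"
    return "reasonable fit"


# Illustrative numbers only, not measurements from a real model.
print(diagnose(train_cost=20.0, validation_cost=16.0, acceptable_cost=1.0))
print(diagnose(train_cost=0.0, validation_cost=9.9, acceptable_cost=1.0))
print(diagnose(train_cost=0.95, validation_cost=0.04, acceptable_cost=1.0))
```

The point of the comparison is that the training cost alone cannot reveal overfitting; only data the model has not seen can.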