The answer to all of these questions is almost certainly that we, as humans, generalize with incredible ease.
Machine learning, on the other hand, struggles to do any of these things; it is effective only at classifying that one specific image.
While machine learning may be able to achieve superhuman performance in a certain field, the underlying algorithm will generally not be effective in any field other than the one it was explicitly created for, because it has no ability to generalize outside of that domain.
Generalization, in that sense, refers to the abstract feature of intelligence which allows us to be effective across thousands of disciplines at once.
Two terms are used in machine learning when we talk about how well a model learns and generalizes to new data: overfitting and underfitting.
Overfitting and underfitting are the two biggest causes of poor performance in machine learning algorithms.

Goodness of fit

In statistics, the goodness of fit of a model describes how well it fits a set of observations.
Goodness-of-fit measures estimate how well the approximation of the function matches the target function.
The black line fits the data well, the green line is overfitting.
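
To make the idea of a goodness-of-fit measure concrete, here is a minimal sketch that computes two common ones, mean squared error and R², for a straight-line fit. The data is synthetic and the helper name `r_squared` is made up for this illustration; neither comes from the article.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic observations (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Fit a degree-1 polynomial (a line) by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

mse = np.mean((y - y_hat) ** 2)   # lower is better
r2 = r_squared(y, y_hat)          # closer to 1 is better
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")
```

Both numbers here are measured on the same data the line was fitted to, so they only say how well the model fits those observations, not how well it generalizes.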

Overfitting vs Underfitting

We can understand overfitting better by looking at the opposite problem, underfitting.
Underfitting occurs when a model is too simple — informed by too few features or regularized too much — which makes it inflexible in learning from the dataset.
Simple learners tend to have less variance in their predictions but more bias towards wrong outcomes.
On the other hand, complex learners tend to have more variance in their predictions.
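
The contrast between simple and complex learners can be sketched by fitting polynomials of different degrees to the same noisy data. This is only an illustration under assumptions of my own: the sine-shaped target, the noise level, and the degrees 1, 4 and 12 are arbitrary choices, not something taken from the article.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def true_fn(x):
    # Assumed ground truth for this sketch.
    return np.sin(2 * np.pi * x)

# A small noisy training set and a larger held-out set (both synthetic).
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.size)

def fit_and_score(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    poly = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = float(np.mean((poly(x_train) - y_train) ** 2))
    test_mse = float(np.mean((poly(x_test) - y_test) ** 2))
    return train_mse, test_mse

# Degree 1 is a very simple learner, degree 12 a very flexible one.
results = {d: fit_and_score(d) for d in (1, 4, 12)}
for d, (tr, te) in results.items():
    print(f"degree {d:2d}: train MSE = {tr:.3f}, test MSE = {te:.3f}")
```

The training error can only go down as the degree grows, because each larger model contains the smaller ones. The test error is the number to watch: a straight line is too rigid for this curve, while a very high degree tends to chase the noise in the 20 training points.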
Let me give you an analogy to explain overfitting and underfitting.
Overfitted models are like subject matter experts: they know a lot about one particular field.
Ask them anything about the functionality of their tool, even in detail, and they’ll probably be able to answer you quite precisely.
But when you ask them why the oil price fluctuates, they’ll probably make an informed guess and say something peculiar.
In terms of machine learning, an overfitted model focuses too much on the training set (the expert’s own field) and learns complex relations which may not be valid in general for new data (the test set).
Underfitted models are like those engineers who wanted to be cricketers but were forced by their parents to take up engineering.
They know neither engineering nor cricket very well.
They never had their heart in what they did and have insufficient knowledge of everything.
In terms of machine learning, an underfitted model focuses too little on the training set.
It is good for neither training nor testing.

How to detect underfitting?

A model underfits when it is too simple with regard to the data it is trying to model.
One way to detect such a situation is to use the bias-variance approach: your model is underfitted when it has a high bias, which shows up as a high error even on the training data.

How to avoid underfitting

More data will not generally help.
It will, in fact, likely increase the training error.
Instead, we should add more features, because that expands the hypothesis space.
This includes making new features from existing features.
In the same way, more parameters may also expand the hypothesis space.
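
As a rough sketch of the point about making new features from existing ones, the example below fits a linear model to data with a quadratic trend, first with the raw feature only and then with an added squared feature. The data-generating formula and the helper name `train_mse` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with a quadratic ground truth (assumed for illustration).
x = rng.uniform(-3, 3, 200)
y = 0.5 * x**2 - x + rng.normal(scale=0.3, size=x.size)

def train_mse(features):
    """Ordinary least squares with an intercept; returns the training MSE."""
    X = np.column_stack([np.ones(len(features)), features])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))

# Original feature only: a straight line cannot follow the curve (underfitting).
mse_linear = train_mse(x.reshape(-1, 1))

# Add a new feature built from the existing one: x squared.
mse_expanded = train_mse(np.column_stack([x, x**2]))

print(f"x only:    train MSE = {mse_linear:.3f}")
print(f"x and x^2: train MSE = {mse_expanded:.3f}")
```

The learning algorithm is unchanged; only the hypothesis space grew, which is why even the training error drops sharply once the squared feature is available.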

How to detect Overfitting?

A key challenge with overfitting, and with machine learning in general, is that we can’t know how well our model will perform on new data until we actually test it.
To address this, we can split our initial dataset into separate training and test subsets.
This method can approximate how well our model will perform on new data.
If our model does much better on the training set than on the test set, then we’re likely overfitting.
For example, it would be a big red flag if our model saw 95% accuracy on the training set but only 48% accuracy on the test set.
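
The train/test comparison described above can be sketched as follows. Everything here is an assumption for illustration: the data is synthetic with 25% of the labels flipped, the 70/30 split is arbitrary, and a 1-nearest-neighbour classifier is used simply because it memorises its training set.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic binary classification: the label depends on x0 + x1,
# and 25% of the labels are flipped to simulate noise.
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(400) < 0.25
y = np.where(flip, 1 - y, y)

# Shuffle the indices and hold out 30% of the data as a test set.
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
train_idx, test_idx = idx[:split], idx[split:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

def predict_1nn(X_query):
    """1-nearest-neighbour: copy the label of the closest training point."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

train_acc = float(np.mean(predict_1nn(X_train) == y_train))
test_acc = float(np.mean(predict_1nn(X_test) == y_test))
gap = train_acc - test_acc

print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}, gap = {gap:.2f}")
```

On the training set every point is its own nearest neighbour, so the model looks perfect there even though it has also memorised the flipped labels. A large gap between the two accuracies is the warning sign; how large is too large depends on the problem.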

How to Prevent Overfitting

Detecting overfitting is useful, but it doesn’t solve the problem.
Fortunately, you have several options to try.
Here are a few of the most popular solutions for overfitting:
Cross-Validation: A standard way to find out-of-sample prediction error is to use 5-fold cross-validation.
Early Stopping: Early stopping rules provide guidance on how many iterations can be run before the learner begins to overfit.
Pruning: Pruning is used extensively while building tree-based models.
It simply removes the nodes which add little predictive power for the problem at hand.
Regularization: It adds a cost term to the objective function for bringing in more features.
It therefore tries to push the coefficients of many variables towards zero, which reduces the cost term.
Remove features: Some algorithms have built-in feature selection.
For those that don’t, you can manually improve their generalizability by removing irrelevant input features.
An interesting way to do so is to tell a story about how each feature fits into the model.
This is like the data scientist’s spin on the software engineer’s rubber duck debugging technique, where they debug their code by explaining it, line by line, to a rubber duck.
Train with more data: It won’t work every time, but training with more data can help algorithms detect the signal better.
In the earlier example of modelling height vs. age in children, it’s clear how sampling more schools will help your model.
Of course, that’s not always the case.
If we just add more noisy data, this technique won’t help.
That’s why you should always ensure your data is clean and relevant.
Ensembling: Ensembles are machine learning methods for combining predictions from multiple separate models.
There are a few different methods for ensembling, but the two most common are bagging and boosting.
Bagging attempts to reduce the chance of overfitting complex models.
It trains a large number of “strong” learners in parallel.
A strong learner is a model that’s relatively unconstrained.
Bagging then combines all the strong learners together in order to “smooth out” their predictions.
Boosting attempts to improve the predictive flexibility of simple models.
It trains a large number of “weak” learners in sequence.
A weak learner is a constrained model (i.e. you could limit the max depth of each decision tree).
Each one in the sequence focuses on learning from the mistakes of the one before it.
Boosting then combines all the weak learners into a single strong learner.
While bagging and boosting are both ensemble methods, they approach the problem from opposite directions.
Bagging uses complex base models and tries to “smooth out” their predictions while boosting uses simple base models and tries to “boost” their aggregate complexity.
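
To tie a few of these remedies together, here is a hedged sketch that compares a single unconstrained tree, a bagged ensemble of such trees, and a boosted ensemble of depth-1 trees, each scored with 5-fold cross-validation. It assumes scikit-learn is installed; the synthetic dataset, the number of estimators and the other settings are illustrative choices of mine, not recommendations from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y), assumed for illustration.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

models = {
    # One unconstrained tree: a "strong" learner that can memorise noise.
    "single deep tree": DecisionTreeClassifier(random_state=0),
    # Bagging: many unconstrained trees, each trained on a bootstrap sample,
    # whose predictions are then combined.
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 n_estimators=50, random_state=0),
    # Boosting: many depth-1 trees ("weak" learners) fitted in sequence,
    # each one correcting the errors left by the previous ones.
    "boosting": GradientBoostingClassifier(n_estimators=100, max_depth=1,
                                           random_state=0),
}

# 5-fold cross-validation gives an out-of-sample accuracy estimate per model.
cv_scores = {name: cross_val_score(model, X, y, cv=5)
             for name, model in models.items()}

for name, scores in cv_scores.items():
    print(f"{name:>16}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Because every score comes from folds the model did not see during fitting, the comparison reflects generalization rather than memorisation. Which ensemble comes out ahead will depend on the data, so treat the printed numbers as specific to this toy setup.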
If you want to know more about the two main ensemble techniques, bagging and boosting, please go through my previous story, “Difference between Bagging and Boosting?”, on Medium.
I hope this article has given you a solid understanding of this topic.
In this post, you have learned about generalization in machine learning and the related terms overfitting and underfitting.
Do you have any questions about overfitting, underfitting, or anything else related to this post?
Leave a comment and ask your question, and I will do my best to answer it.
Thanks for reading! ❤