This would mean that everything above a certain shininess level is real gold, and everything below is not.
There’s no overlap.
Our spectrum would look something like this:

(Figure: The “perfect” scenario.)
Here, all that glitters is actually gold.
So what would the ROC curve in this case look like?

Again, let’s start with the highest possible threshold, where we only collect the shiniest objects.
These will all be gold.
As we move our threshold down to include less shiny stuff, we would still get nothing but gold (see the graph below).
Our sensitivity is increasing from 0% to 100%, and there is no fallout at all.
This would continue until we hit the threshold of shininess that separates gold from rocks, at point (2).
At this point we have collected all of the gold.
After this, if we collect any duller objects, we would only be adding rocks to our pile, so our fallout would start to increase from 0% to 100%.
This would keep going until we had collected everything at (3).
At this point we’ll have all of the gold and all of the rocks.
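To make the threshold sweep concrete, here is a minimal Python sketch. The objects and their shininess scores are made-up numbers chosen so that every nugget is shinier than every rock, and `roc_points` is a hypothetical helper name: it lowers the threshold one shininess level at a time and records (fallout, sensitivity) after each step.

```python
# Made-up (shininess, is_gold) pairs for the "perfect" scenario:
# every gold nugget is shinier than every rock.
objects = [(0.95, True), (0.9, True), (0.8, True), (0.7, True),
           (0.4, False), (0.3, False), (0.2, False), (0.1, False)]


def roc_points(objects):
    """Return (fallout, sensitivity) pairs as the threshold is lowered."""
    total_gold = sum(1 for _, is_gold in objects if is_gold)
    total_rock = len(objects) - total_gold
    points = [(0.0, 0.0)]  # threshold above everything: nothing collected yet
    # Lower the threshold through each distinct shininess level, shiniest first.
    for threshold in sorted({shine for shine, _ in objects}, reverse=True):
        picked = [is_gold for shine, is_gold in objects if shine >= threshold]
        gold_picked = sum(picked)  # True counts as 1
        rock_picked = len(picked) - gold_picked
        points.append((rock_picked / total_rock, gold_picked / total_gold))
    return points


for fallout, sensitivity in roc_points(objects):
    print(f"fallout={fallout:.0%}  sensitivity={sensitivity:.0%}")
```

Sensitivity climbs to 100% while fallout stays at 0%; only after the last nugget is collected does fallout start rising towards 100%.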
Worst or random case

Suppose we based our decision on a feature that has nothing to do with whether or not an object is gold.
Say we picked objects based on how much they look like our grandmother, or based on how close the nearest snail is.
In this case, we’re taking blind shots in the dark.
We’re not being specific at all.
It’s as if we put all the objects into a big bag, reached inside without looking and picked them out one at a time.
The chances that we’ll pick gold are about the same as the percentage of the bag that is actually gold.
If 20% of the bag is gold, then there’s a 20% chance that any object we pick is gold.
Our spectrum would look something like this:

(Figure: The “random” decision scenario.)
Gold is spread evenly across the spectrum.
Here, no matter where we put our threshold, the percentage of the gold on the left of the threshold is the same as the percentage of the worthless stuff on the left of the threshold.
Being more “picky” about our criteria (e.g. 1 m away from a snail vs 3 m away from a snail) doesn’t improve our odds of getting gold instead of rock.
In this case the sensitivity and the fallout stay roughly equal, whatever threshold we choose.
The graph in the random scenario looks like a straight line starting from the bottom left and ending at the top right.
The fallout, as a percentage, is always about the same as the sensitivity.
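We can check the “random” claim with the same kind of sweep. In this minimal sketch the feature is the distance to the nearest snail (made-up values in metres), objects are picked if their distance is at least the threshold (an arbitrary choice), and the bag is built so that exactly 20% of the objects at every distance are gold, so the feature carries no information about gold.

```python
import math

# Made-up bag: at each snail distance (metres) there is 1 gold nugget and 4 rocks,
# so 20% of the bag is gold and distance says nothing about gold.
objects = []
for distance in [1, 2, 3, 4, 5]:
    objects.append((distance, True))
    objects.extend([(distance, False)] * 4)


def roc_points(objects):
    """Return (fallout, sensitivity) pairs as the threshold is lowered."""
    total_gold = sum(1 for _, is_gold in objects if is_gold)
    total_rock = len(objects) - total_gold
    points = [(0.0, 0.0)]
    for threshold in sorted({value for value, _ in objects}, reverse=True):
        picked = [is_gold for value, is_gold in objects if value >= threshold]
        gold_picked = sum(picked)
        points.append(((len(picked) - gold_picked) / total_rock, gold_picked / total_gold))
    return points


# Every point lies on the diagonal: fallout equals sensitivity.
for fallout, sensitivity in roc_points(objects):
    assert math.isclose(fallout, sensitivity)
    print(f"fallout={fallout:.0%}  sensitivity={sensitivity:.0%}")
```

With a real random feature the points would only scatter around the diagonal rather than sit exactly on it, which is why the text says the two stay roughly equal.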
What are we trying to find?

Our ultimate goal is to find the set of features that best predict success.
We want sensitivity to increase faster than the fallout.
If we do find such a feature, our odds of picking gold should increase the more picky we are.
If it turns out we collect gold about as often as we would by pure luck, then the feature we’re basing our decisions on is not a good one.
If we look at the three graphs we’ve drawn so far, we can see a pattern.
The curves for good features tend to be closer to the left and the top of the graph.
This makes the pink area under them bigger.
In fact, the Area Under the Curve (AUC) is a useful way to see if the feature we’re basing our decisions on (shininess, or anything else) is a good one.
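Once we have the curve’s points, measuring the area is simple. This sketch uses the trapezoidal rule (a common choice, assumed here rather than taken from the original post) on two hand-made point lists: a perfect curve scores 1.0 and the random diagonal scores 0.5, which is why an AUC closer to 1 points to a better feature.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (fallout, sensitivity) points, left to right."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # width times average height
    return area


perfect = [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
random_guess = [(0.0, 0.0), (0.25, 0.25), (0.5, 0.5), (0.75, 0.75), (1.0, 1.0)]

print(area_under_curve(perfect))       # 1.0
print(area_under_curve(random_guess))  # 0.5
```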
Calculating areas is something a computer can do for us.
By testing a lot of different features (shininess, distance to a snail, etc.) automatically and going through this process, we can find the best feature or combination of features to base our decisions on.
In each case, we start from a high threshold, slowly decrease it, plot the curve, and measure the area under it.
A computer can quickly pick the best feature, or combination of features, that predicts if something is gold.
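The whole loop (sweep the threshold, measure the area, keep the winner) fits in a few lines. This is a minimal sketch under stated assumptions: the five samples and both feature names are made up, each feature is used as “pick up if the value is at least the threshold”, and the area is computed with the trapezoidal rule.

```python
def roc_points(values, labels):
    """(fallout, sensitivity) pairs, lowering a 'pick if value >= threshold' cut-off."""
    total_gold = sum(labels)
    total_rock = len(labels) - total_gold
    points = [(0.0, 0.0)]
    for threshold in sorted(set(values), reverse=True):
        picked = [is_gold for value, is_gold in zip(values, labels) if value >= threshold]
        gold_picked = sum(picked)
        points.append(((len(picked) - gold_picked) / total_rock, gold_picked / total_gold))
    return points


def area_under_curve(points):
    """Trapezoidal area under the (fallout, sensitivity) curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up rock samples; feature names are illustrative.
samples = [
    {"shininess": 0.9, "snail_distance": 3.0, "is_gold": True},
    {"shininess": 0.7, "snail_distance": 1.5, "is_gold": True},
    {"shininess": 0.8, "snail_distance": 4.0, "is_gold": False},
    {"shininess": 0.3, "snail_distance": 2.0, "is_gold": False},
    {"shininess": 0.2, "snail_distance": 1.0, "is_gold": False},
]
labels = [sample["is_gold"] for sample in samples]

# Score every feature by its AUC, then keep the best one.
scores = {
    feature: area_under_curve(roc_points([sample[feature] for sample in samples], labels))
    for feature in ("shininess", "snail_distance")
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # shininess 0.833
```

Here snail distance scores 0.5, no better than random guessing, so the loop keeps shininess. A real search would also try combinations of features, which this sketch leaves out.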
Of course we have to run a lot of experiments.
This is made easier if you have a lot of data already at hand.
ROC curves can help you make sense of data you already have: they help you find events or features that are good predictors of the thing you want to find.
If we have a database of rock samples that records their shape, weight, colour, shininess, etc, as well as whether or not they are gold, a computer could automatically figure out which of those features is the best predictor of real gold.
How is this connected to Machine Learning?

When we base our decisions on some feature, we are using a simple model.
You’ll hear the term “model” used in Machine Learning a lot.
A model is a set of criteria, or a process, we use to make decisions.
Models in Machine Learning are often more complicated than the one in our example, but they all serve the same purpose.
Playing the role of the prospector, your decision-making went something like this:

(See an object) “If the object is shinier than my threshold, pick it up. Otherwise, leave it on the floor.”

This is your simple mental model.
You may change it depending on how useful it is to you.
You might add more features, such as its weight, or its weight relative to its size (density).
All of these then become part of your model.
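Written as code, the prospector’s model is just a function. This sketch is hypothetical: the function name, the 0-to-1 shininess scale, and both thresholds are illustrative assumptions. The density cut-off of 15 g/cm³ is picked because gold is about 19.3 g/cm³, while shiny pyrite (fool’s gold) is only about 5 g/cm³.

```python
def should_pick_up(shininess, weight_g, volume_cm3,
                   shininess_threshold=0.6, density_threshold=15.0):
    """Hypothetical prospector model: keep objects that are both shiny and dense."""
    density = weight_g / volume_cm3  # weight relative to size, in g/cm^3
    return shininess > shininess_threshold and density > density_threshold


print(should_pick_up(0.9, 193.0, 10.0))  # True: shiny and as dense as gold
print(should_pick_up(0.9, 50.0, 10.0))   # False: shiny but light, like pyrite
print(should_pick_up(0.2, 193.0, 10.0))  # False: dense but dull
```

Changing a threshold or adding a feature changes the model, and an ROC curve is one way to judge whether the change helped.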
The benefit of models is that they can be used in future decisions.
If you found that shininess, or some other group of features, is a good way to sort the gold from the rocks, you might tell your close friends to use that same model to make their decisions.
You’d also use the same model yourself the next time you were out in the field.
You can see other real-life cases where having a good model would be important.
If you’re hiring someone into a company, you want to know, during the interview, if the person would be a good team member or not.
So you decide to look for some feature; for example, how well-dressed they are, or how quickly they answer a particular question.
You hope that this will be a good predictor of their productivity in the company, but the only way to know for sure is to crunch the numbers.
Updating our model to make better decisions is the “learning” part of Machine Learning.
In Machine Learning, programs automatically run through a lot of data and find relationships that can be used to build useful models.
ROC curves help us decide if the model is a good one, or if it needs to be updated.
How Machine Learning algorithms actually create these models will be the subject of another post.
Extra credit: other useful terms

The Confusion Matrix has a lot of terms in it.
Here are some of the other terms and what they mean for our prospector.
Miss rate

Miss rate is the other half of sensitivity.
Together, both terms help you understand how much of the total gold you got, and how much you left behind.

Miss rate = (gold I left behind) / (all gold)

i.e. how much of the gold did I leave behind?

Miss rate and sensitivity add up to 100% of all the gold.
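A quick worked example, using a made-up haul: 20 gold nuggets were lying in the field and 15 of them ended up in the wheelbarrow.

```python
# Made-up haul, for illustration only.
all_gold = 20
gold_in_wheelbarrow = 15
gold_left_behind = all_gold - gold_in_wheelbarrow  # 5

sensitivity = gold_in_wheelbarrow / all_gold  # 0.75
miss_rate = gold_left_behind / all_gold       # 0.25

print(f"sensitivity={sensitivity:.0%}, miss rate={miss_rate:.0%}")  # sensitivity=75%, miss rate=25%
assert abs(sensitivity + miss_rate - 1.0) < 1e-12  # together they cover all the gold
```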
False omission rate and negative predictive value

These two terms help you understand what the landscape you left behind looks like, i.e. how much of it is gold, and how much of it is worthless.

False omission rate = (gold I left behind) / (all things I left behind)

i.e. how much of the stuff left behind was a mistake?

Negative predictive value = (rocks I left behind) / (all things I left behind)

i.e. how much of the stuff left behind was I right to leave behind?

Together, false omission rate and negative predictive value add up to 100% of the stuff that was left behind.
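Continuing with a made-up haul (5 gold nuggets and 75 rocks were left on the ground), here is how the false omission rate and its complement, the negative predictive value, come out.

```python
# Made-up haul, for illustration only.
gold_left_behind = 5
rocks_left_behind = 75
all_left_behind = gold_left_behind + rocks_left_behind  # 80

false_omission_rate = gold_left_behind / all_left_behind         # 0.0625
negative_predictive_value = rocks_left_behind / all_left_behind  # 0.9375

print(f"FOR={false_omission_rate:.2%}, NPV={negative_predictive_value:.2%}")  # FOR=6.25%, NPV=93.75%
assert abs(false_omission_rate + negative_predictive_value - 1.0) < 1e-12
```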
Precision and false discovery rate

These two terms help you understand how much of the stuff in your wheelbarrow is gold, and how much is worthless.

Precision = (gold in my wheelbarrow) / (all things in my wheelbarrow)

i.e. how much of my wheelbarrow is actually gold?

False discovery rate = (worthless stuff in my wheelbarrow) / (all things in my wheelbarrow)

i.e. how much of the stuff in my wheelbarrow is worthless?

Precision and false discovery rate add up to 100% of the wheelbarrow.
(Originally published at https://mycardboarddreams.