WHAT and WHY of Log Odds
Piyush Agarwal
Jul 8

The three main categories of Data Science are Statistics, Machine Learning and Software Engineering.
To become a good Data Scientist, one needs to have a combination of all three in their quiver.
In this post, I am going to talk about Log Odds, an arrow from the Statistics category.
When I first began working in Data Science, I was so confused about Log Odds.
I had questions like: What are Log Odds? Why do we need them?
When trying to understand any concept, I like to use the Divide and Understand strategy, i.e., break it into smaller pieces, understand each piece separately, and then combine this knowledge to grasp the concept as a whole.
So here, let’s first learn what is meant by Odds and then try to work our way towards understanding Log Odds.
Figure-0: Divide and Understand Log Odds

For the purposes of this explanation, let’s consider a scenario where we play 10 games of chess against an Artificially Intelligent (AI) system and beat it 4 times (I would be impressed by my chess skills if I could actually do that in reality).
Odds and Probability

In our scenario, I beat the system 4 times, so the odds of my winning a game are 4 to 6, i.e., out of a total of 10 games, I win 4 and lose 6.
Figure-1: Odds of winning are 4 to 6

This can alternatively be written as a fraction, 4/6.

Figure-2: Odds as a fraction

Odds should NOT be confused with Probabilities

Odds are the ratio of something happening to something not happening.
In our scenario above, the odds are 4 to 6.
Probability, on the other hand, is the ratio of something happening to everything that could happen. So in our chess example, the probability of winning is 4 out of 10 (as 10 games were played in total).
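To make the distinction concrete, here is a minimal Python sketch of the chess numbers. It uses `fractions.Fraction` so that the ratios stay exact. The helper names `odds_from_probability` and `probability_from_odds` are purely illustrative. They rely on the standard relationship between the two quantities, odds = p / (1 - p).

```python
from fractions import Fraction

# Chess scenario from the text: 10 games, 4 wins, 6 losses.
wins, losses = 4, 6
games = wins + losses

# Odds: something happening vs. something not happening (4 to 6).
odds_win = Fraction(wins, losses)
# Probability: something happening vs. everything that could happen (4 out of 10).
p_win = Fraction(wins, games)
p_lose = 1 - p_win


def odds_from_probability(p):
    """Convert a probability p (0 <= p < 1) into odds: p / (1 - p)."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Convert odds back into a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


print(odds_win, float(odds_win))                 # 4/6 reduces to 2/3
print(float(p_win), float(p_lose))               # 0.4 0.6
print(odds_from_probability(p_win) == odds_win)  # True
```

The two helpers are inverses of each other, so you can move freely between odds and probability.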
Figure-3: Odds v/s Probability

Per our example:

$$\text{Odds of winning} = \frac{4}{6} \approx 0.667$$

$$\text{Probability of winning} = \frac{4}{10} = 0.40$$

$$\text{Probability of losing} = \frac{6}{10} = 0.60$$

which is also equal to 1 minus the Probability of winning:

$$1 - 0.40 = 0.60$$

Given the Probability, we can also calculate the Odds as below:

$$\text{Odds of winning} = \frac{\text{Probability of winning}}{1 - \text{Probability of winning}} = \frac{0.40}{0.60} \approx 0.667$$

Figure-4: Calculating Odds, given Probability

Log Odds

So now that we understand Odds and Probability, let’s try to understand Log Odds and why we actually need them.
Log Odds is nothing but the log of the odds, i.e., log(odds).
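In our scenario above:

Figure-5: Odds on a Number Line

The odds of winning (4/6 ≈ 0.667) lie between 0 and 1, whereas the odds of losing (6/4 = 1.5) lie between 1 and infinity. In general, odds against an outcome are squeezed into the range 0 to 1, while odds in favor of it can stretch all the way to infinity, which is a very vast scale.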
This makes the magnitude of our odds of winning look much smaller than that of losing, even though 4 to 6 and 6 to 4 describe the same match-up from opposite sides. Not fair! :(

So what can we do to make it fair? You guessed it: take the log. Taking the log of the odds makes the scale symmetrical, thereby solving our problem.
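The lopsided scale is easy to see numerically. The sketch below uses a handful of hypothetical win probabilities, not taken from the chess example, and prints the odds next to their log. As an assumption, base 10 is chosen only to match the numbers worked out in this post.

```python
import math


def odds(p):
    """Odds in favour of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def log10_odds(p):
    """Base-10 log of the odds, matching the base used in this post."""
    return math.log10(odds(p))


# Hypothetical probabilities, chosen only to illustrate the two scales.
for p in [0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9]:
    print(f"p={p:.2f}  odds={odds(p):6.3f}  log10(odds)={log10_odds(p):+.3f}")
```

Probabilities below 0.5 produce odds squeezed between 0 and 1, while those above 0.5 spread out towards infinity (0.9 already gives odds of 9). After taking the log, p and 1 - p land at mirror-image positions around 0.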
Using base-10 logarithms:

$$\text{Odds of winning} = \frac{4}{6} \approx 0.667, \qquad \log(\text{Odds of winning}) = \log(0.667) \approx -0.176$$

$$\text{Odds of losing} = \frac{6}{4} = 1.5, \qquad \log(\text{Odds of losing}) = \log(1.5) \approx 0.176$$

Figure-6: log(odds) on a Number Line

Look at that: the scale is now symmetrical, which makes for a fair comparison.
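The same two numbers can be checked in Python. The values in this post use the base-10 log. Logistic regression conventionally uses the natural log instead, which changes the magnitude of the result but not its symmetry around 0.

```python
import math

odds_winning = 4 / 6  # 4 wins to 6 losses
odds_losing = 6 / 4   # 6 losses to 4 wins

# Base-10 log, as used in the worked numbers above.
print(round(math.log10(odds_winning), 3))  # -0.176
print(round(math.log10(odds_losing), 3))   # 0.176

# Natural log: different magnitude, same symmetry around 0.
print(round(math.log(odds_winning), 3))    # -0.405
print(round(math.log(odds_losing), 3))     # 0.405
```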
So the log function makes the distance from the origin (0) the same for both odds, i.e., winning and losing.
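Why do the distances come out equal? The odds of losing are simply the reciprocal of the odds of winning, and the log of a reciprocal is the negative of the log. Writing p for the probability of winning, the short derivation below holds for a log of any base:

$$\text{Odds of losing} = \frac{1-p}{p} = \left(\frac{p}{1-p}\right)^{-1} = \frac{1}{\text{Odds of winning}}$$

$$\log(\text{Odds of losing}) = -\log(\text{Odds of winning}), \qquad \text{e.g.}\ \log\left(\tfrac{6}{4}\right) = -\log\left(\tfrac{4}{6}\right) \approx 0.176$$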
You can now see how important this can be.
But wait… did you also know that understanding all this helps us understand the basics of a very important function, the Logit Function? It is the basis for one of the most commonly used machine learning algorithms, Logistic Regression.
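For reference, the logit function is exactly the log odds of a probability p: logit(p) = log(p / (1 - p)). Logistic Regression models this quantity as a linear combination of the input features. It then maps the result back to a probability with the inverse of the logit, the logistic (sigmoid) function. The sketch below is a minimal illustration of that round trip using the natural log, not the implementation of any particular library. The coefficients `b0` and `b1` and the feature value `x` are hypothetical and are not fitted to any data.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1), using the natural log."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Logistic (sigmoid) function: maps log odds back to a probability."""
    return 1 / (1 + math.exp(-z))


p_win = 0.4                        # probability of winning in the chess example
z = logit(p_win)                   # natural log of 4/6
print(round(z, 3))                 # -0.405
print(round(inverse_logit(z), 3))  # 0.4

# Hypothetical coefficients: logistic regression models the log odds linearly.
b0, b1 = -2.0, 0.5
x = 3.0
print(inverse_logit(b0 + b1 * x))  # predicted probability for feature value x
```

Because the linear part can output any real number, working in log odds lets the model avoid the bounded 0-to-1 range of probabilities until the very last step.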
Let that sink in!

Figure-7: Logit Function

Conclusion

Hopefully this post helped you get a better understanding of Odds, Probability, Log Odds (same as log(Odds)) and the Logit Function.
Before concluding, one thing I want to point out here is the usefulness of log(odds).
Figure-8: log(odds) helps in getting a Normal distribution

You can see from the plots how taking log(odds) turns the skewed distribution on the left into a nice, bell-shaped, roughly normal distribution on the right.
This makes log(odds) very useful for solving certain problems, particularly those that involve estimating probabilities in win/lose, true/false, or fraud/non-fraud type scenarios.
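A quick simulation hints at why the log helps. It assumes, purely for illustration, that win probabilities are drawn uniformly at random; this is not the data behind the figure. Under that assumption the raw odds are heavily right-skewed, while the log odds come out symmetric and bell-shaped. Strictly speaking, the log odds of a uniform probability follow a logistic distribution, which resembles a normal distribution but has slightly heavier tails. The helper `skewness` is a hypothetical stand-in for a statistics-library function; a value near 0 indicates a symmetric distribution.

```python
import math
import random
import statistics


def skewness(xs):
    """Population skewness: the average cubed z-score (0 means symmetric)."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


random.seed(0)
# Assumption: probabilities drawn uniformly, kept away from 0 and 1.
probs = [random.uniform(0.01, 0.99) for _ in range(10_000)]
odds = [p / (1 - p) for p in probs]
log_odds = [math.log(o) for o in odds]

print(f"skewness of odds:     {skewness(odds):.2f}")
print(f"skewness of log odds: {skewness(log_odds):.2f}")
```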