Statistics is the Grammar of Data Science — Part 5/5

Statistics refresher to kick-start your Data Science journey

Semi Koen · Feb 16

This is the 5th (and last) article of the 'Statistics is the Grammar of Data Science' series, covering Conditional Probability and Bayes' Theorem and their importance.

Ready?

Revision

Bookmarks to the rest of the articles for easy access:

Article Series
Part 1: Data Types | Measures of Central Tendency | Measures of Variability
Part 2: Data Distributions
Part 3: Measures of Location | Moments
Part 4: Covariance | Correlation
Part 5: Conditional Probability | Bayes' Theorem

Conditional Probability

Conditional probability is the likelihood of an event occurring, based on the occurrence of a previous event.

The notation for conditional probability is P(A|B), read as ‘the probability of A given B’.

The formula for conditional probability is:

P(A|B) = P(A ∩ B) / P(B)

Conditional Probability of A given B

A ∩ B is the intersection of A and B in a Venn diagram, so P(A ∩ B) is the probability that both A and B occur.

As such:

The probability of A given B is equal to the probability of A and B occurring, over the probability of B alone occurring.

Venn diagram: Probabilities of events A and B.
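To make the formula concrete, here is a minimal sketch in Python; the probabilities used are made-up numbers, purely for illustration:

```python
def conditional(p_a_and_b, p_b):
    """Conditional probability: P(A|B) = P(A ∩ B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_a_and_b / p_b

# Hypothetical numbers: P(A ∩ B) = 0.1, P(B) = 0.5
print(conditional(0.1, 0.5))  # 0.2
```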

✏️ Example:

Let's suppose we are drawing two marbles, without replacement, from a bag containing three — red, green and blue.

What is the conditional probability of drawing the blue marble after already having drawn the red one?

If Event A is the event of getting a red marble first — and — Event B is the event of getting a blue marble second, we are looking to calculate P(A ∩ B):

P(A) is the probability of getting a red marble in the first turn. As it is one possible outcome out of three:

P(A) = 1/3 ≈ 33.33%

P(B|A) is the probability of getting a blue marble in the second turn, given the red one has already been drawn. As there will be two marbles remaining:

P(B|A) = 1/2 = 50%

P(A ∩ B) is the probability of drawing a red marble in the first turn and a blue one in the second turn:

P(A ∩ B) = P(A) × P(B|A) = 33.33% × 50% ≈ 16.67%

To visualise this, we can use a tree diagram — each branch has a conditional probability.

Tree Diagram for the marble bag example

Causation

We saw in Part 4 what causation means.

Conditional probability does not indicate that there is necessarily a causal relationship between the two events, nor that both events occur simultaneously.
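The marble example can be double-checked by enumerating every ordered pair of draws; a quick sketch, assuming draws without replacement:

```python
from fractions import Fraction
from itertools import permutations

marbles = ["red", "green", "blue"]
draws = list(permutations(marbles, 2))  # all 6 ordered ways to draw two marbles

# P(A): red is drawn first
p_a = Fraction(sum(d[0] == "red" for d in draws), len(draws))

# P(A ∩ B): red first AND blue second
p_a_and_b = Fraction(sum(d == ("red", "blue") for d in draws), len(draws))

# P(B|A) = P(A ∩ B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(p_a, p_b_given_a, p_a_and_b)  # 1/3 1/2 1/6
```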

Independent Events

Independent events are those whose outcome does not influence the probability of the outcome of the other event. For this reason:

P(A|B) = P(A)
P(B|A) = P(B)

Mutually Exclusive Events

Mutually exclusive events are those that cannot occur simultaneously, i.e. if one event has already occurred, the other cannot occur. Thus:

P(A|B) = 0
P(B|A) = 0

Neon sign showing the simple statement of Bayes' theorem. Courtesy: Wikipedia

Bayes' Theorem

Having just explored what conditional probability is, let's take a look at Bayes' Theorem.

It simply says:

The probability of A given B is equal to the probability of B given A, times the probability of A, over the probability of B.

P(A|B) = P(B|A) × P(A) / P(B)

Bayes' Rule

In other words, it provides a useful way of going between P(A|B) and P(B|A).

On a larger scale, it is the cornerstone of an entire field within Statistics (‘Bayesian Statistics’).

It is employed in various disciplines, with medicine and pharmacology as the most notable examples, but is also used in finance, for instance to forecast the probability of the success of an investment.

Quantity is almost as important as quality when applying conditional probabilities to Bayes' Theorem.

For example, let’s assume we would like to calculate the risk of lending money to a borrower.

If we factor in other probabilities, like the borrower's age, their credit rating and their risk appetite, the estimated risk of lending money to a specific individual can vary.

As the theory itself presupposes:

The more variables that are in play, and the more certain we become about those variables, the more confidently an accurate conclusion can be drawn using conditional probabilities!

✏️ Example:

Let's suppose we are doctors in a clinic and we know that:
— the probability of a patient having a liver disease is 20%
— the probability of being an alcoholic is 5%, and
— among those patients diagnosed with liver disease, 10% are alcoholics.

We would now like to find out:

What is the conditional probability of a patient having liver disease if they are an alcoholic?

P(A) is the probability of having a liver disease: P(A) = 20%
P(B) is the probability of being an alcoholic: P(B) = 5%
P(B|A) is the conditional probability of a patient being an alcoholic, given they have liver disease: P(B|A) = 10%
P(A|B) is the conditional probability of a patient having a liver disease, given they are an alcoholic:

P(A|B) = (10% × 20%) / 5% = 40%

Mutually Exclusive Events

A special case of Bayes' Theorem is when the event A is a binary value. In such a case, we symbolise as 'A+' that the event A has occurred and as 'A–' that it has not occurred (i.e. the events A+ and A– are mutually exclusive). The formula is expressed as follows:

P(A|B) = [ P(B|A+) × P(A+) ] / [ P(B|A+) × P(A+) + P(B|A–) × P(A–) ]

Bayes' Rule for mutually exclusive events A+ and A–

Accuracy

A true positive is an outcome where the model correctly predicts the positive class.

Similarly, a true negative is an outcome where the model correctly predicts the negative class.

A false positive is an outcome where the model incorrectly predicts the positive class.

Similarly, a false negative is an outcome where the model incorrectly predicts the negative class.

Sensitivity indicates the probability that a test will correctly identify a positive case → True positives.

Specificity indicates the probability that the test will correctly identify a negative case → True negatives.
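Both rates fall straight out of the counts of the four outcome types; a minimal sketch, using made-up confusion-matrix counts:

```python
def sensitivity(tp, fn):
    """True positive rate: the share of actual positives the test catches."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: the share of actual negatives the test clears."""
    return tn / (tn + fp)

# Hypothetical counts: 90 true positives, 10 false negatives,
# 990 true negatives, 10 false positives
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=990, fp=10))  # 0.99
```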

Sensitivity = true positives / (true positives + false negatives)
Specificity = true negatives / (true negatives + false positives)

✏️ Example:

Let's suppose a certain disease has an incidence rate of 2%.

If the false negative rate is 10% and the false positive rate is 1%:

What is the conditional probability of an individual who tests positive actually having the disease?

If Event A is the event that the individual has this disease — and — Event B is the event that the individual tests positive, we are looking to calculate P(A|B), i.e. the probability that, given the individual tests positive (B), they actually have the disease (A).
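Before working through it step by step, the numbers can be plugged straight into Bayes' rule for the binary case; a sketch:

```python
# Disease-test example: incidence 2%, false-negative rate 10%, false-positive rate 1%
p_a = 0.02                # P(A+): has the disease
p_not_a = 1 - p_a         # P(A-): does not have the disease
p_b_given_a = 1 - 0.10    # P(B|A+): true positive rate = 1 - false-negative rate
p_b_given_not_a = 0.01    # P(B|A-): false positive rate

# Total probability of testing positive, over the two mutually exclusive cases
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 3))  # 0.647
```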

P(A+) or P(A) is the probability of having the disease: P(A+) = 2%
P(A–) is the probability of not having the disease, which can be derived: P(A–) = 100% – 2% = 98%
P(B|A–) is the probability of a false positive, i.e. not having the disease (A–) but testing positive (B): P(B|A–) = 1%
P(B–|A) is the probability of a false negative, i.e. having the disease (A) but testing negative (B–): P(B–|A) = 10%
P(B|A+) or P(B|A) is the true positive rate, i.e. having the disease (A) and testing positive (B), which can be derived: P(B|A) = 100% – 10% = 90%

We now have all the info to calculate P(A|B):

P(A|B) = [ P(B|A) × P(A) ] / [ P(B|A–) × P(A–) + P(B|A+) × P(A+) ] = [ 90% × 2% ] / [ 1% × 98% + 90% × 2% ] ≈ 64.7%

We can visualise the test accuracy in a matrix:

And in a tree diagram:

Over and Out!

Today we learned that conditional probabilities reflect the influence of one event on the probability of another.

Then we drilled into Bayes' Theorem, which is fundamental in Statistics as it allows for probabilistic inference.

The End

Photo by Matt Botsford on Unsplash

This is the end of the 'Statistics is the Grammar of Data Science' series! I hope you are now equipped with the knowledge to start your Data Science journey! Please drop me a line in the comments if you would like anything else covered, and I will try to add it to one of my next articles.

Thanks for reading! I regularly write about Technology & Data on Medium — if you would like to read my future posts then please 'Follow' me!