because P(Y|X) varies directly with P(X,Y) (for a fixed X, P(Y|X) = P(X,Y)/P(X)), people feel “justified” in using P(X,Y) to approximate P(Y|X).

However, on some occasions this approximation can be costly.

Let’s investigate another example.

Suppose I am having lunch at a restaurant in a central business district.

My friend told me that the people working in financial firms are required to wear suits.

In other words, the conditional probability P(a person wearing a suit|a person working in financial firms) is high; and since we are near the financial centre, P(a person working in financial firms) is also relatively high.

Thus, the joint probability P(a person wearing a suit, a person working in financial firms) is also quite high.
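The step from the conditional probability to the joint probability is the product rule, P(x,y) = P(x|y)P(y). A minimal sketch with purely hypothetical numbers (the 0.9 and 0.4 below are assumptions for illustration, not measurements):

```python
# Product rule: P(suit, finance) = P(suit | finance) * P(finance).
# Both inputs are hypothetical values chosen only for illustration.
p_suit_given_finance = 0.9  # assumed: most finance workers wear suits
p_finance = 0.4             # assumed: many people near the financial centre work in finance

p_suit_and_finance = p_suit_given_finance * p_finance
print(f"P(suit, finance) = {p_suit_and_finance:.2f}")
```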

Then a person wearing a suit comes in, and I naturally ask: how likely is it that this person works in a financial firm? You might think quite likely, right? But the probability might not be as high as you think.

Why? Because there are also lots of lawyers, government officials and restaurant staff working nearby, and they are all required to wear suits! (Equivalently, P(a person wearing a suit, a person working in non-financial firms) = P(a person wearing a suit|a person working in non-financial firms)*P(a person working in non-financial firms) is also “high”.)

So, just observing a person wearing a suit does not help us much in determining where the person works.

If we ignore the counter case, namely that many other people also need to wear suits, we will wrongly judge that P(a person working in financial industry|a person wearing a suit) is high.

Therefore, the source of this bias is that we sometimes ignore the counter case.

Recall that P(y|x) = P(x,y)/P(x) = P(x,y)/(P(x,y)+P(x,~y)), where the denominator is P(x) split into the case y and the counter case ~y.

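If the counter case P(x,~y) is rare compared with P(x,y), we can safely ignore it and treat a high P(x,y) as a sign of a high P(y|x): the denominator is then dominated by P(x,y), so P(y|x) is close to 1.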

However, when the counter case P(x,~y) is also very common, the denominator becomes large and P(y|x) becomes “smaller”, or at least not as high as you might originally think.
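A minimal sketch of the formula above, using hypothetical joint probabilities for the suit example. The function name `conditional_from_joint` is illustrative only; it computes P(y|x) = P(x,y)/(P(x,y)+P(x,~y)) and compares a rare counter case with a common one while keeping P(x,y) fixed.

```python
def conditional_from_joint(p_x_and_y: float, p_x_and_not_y: float) -> float:
    """Return P(y|x) = P(x,y) / (P(x,y) + P(x,~y)).

    The denominator is P(x), split into the case y and the counter case ~y.
    """
    p_x = p_x_and_y + p_x_and_not_y
    if p_x <= 0:
        raise ValueError("P(x) must be positive")
    return p_x_and_y / p_x


# Hypothetical numbers: x = "wearing a suit", y = "works in a financial firm".
p_suit_and_finance = 0.36

# Rare counter case: few people outside finance wear suits.
rare = conditional_from_joint(p_suit_and_finance, 0.02)
# Common counter case: lawyers, officials and restaurant staff also wear suits.
common = conditional_from_joint(p_suit_and_finance, 0.45)

print(f"rare counter case:   P(finance | suit) = {rare:.2f}")
print(f"common counter case: P(finance | suit) = {common:.2f}")
```

With P(x,y) unchanged at 0.36, the conditional probability falls from about 0.95 to about 0.44 once the counter case becomes common.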

(Exercise: Suppose that your father has a beard (and you have not met him for a long time).

What is the probability that the person you encounter in the street is your father, given that the person has a beard?)

The Source of Human Bias — Ignoring the Prior

Let Y be the event that a person is a professor, and X the event that a person is carrying a book.

Suppose I have observed that professors almost always carry a book with them (so P(x|y) is high).

I am sitting in a restaurant.

A man carrying a book comes in. What is the probability P(y|x) that he is a professor? Well, you might think it is high, as not many people carry a book with them.
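Before answering, note that Bayes’ theorem gives P(y|x) = P(x|y)P(y)/P(x). The sketch below uses entirely hypothetical numbers: it holds P(x|y) fixed at 0.9, assumes that 5% of non-professors also carry a book, computes P(x) by the law of total probability, and varies only the prior P(y). The function name `posterior` is illustrative.

```python
def posterior(likelihood: float, prior: float, p_x_given_not_y: float) -> float:
    """Bayes' theorem: P(y|x) = P(x|y) P(y) / P(x),
    with P(x) = P(x|y) P(y) + P(x|~y) P(~y) (law of total probability)."""
    numerator = likelihood * prior
    evidence = numerator + p_x_given_not_y * (1 - prior)
    return numerator / evidence


# Hypothetical inputs: x = "carries a book", y = "is a professor".
p_book_given_prof = 0.9       # assumed: professors almost always carry a book
p_book_given_not_prof = 0.05  # assumed: few other people carry a book

for p_prof in (0.5, 0.01, 0.001):
    result = posterior(p_book_given_prof, p_prof, p_book_given_not_prof)
    print(f"P(y) = {p_prof}: P(y|x) = {result:.3f}")
```

Even though P(x|y) stays at 0.9 throughout, the posterior drops from about 0.95 to under 0.02 as the prior falls from 0.5 to 0.001.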

It might be true that few people carry a book, but the catch is that professors are very rare and I am unlikely to meet one on a normal day (so P(y) is low)! If the prior probability of observing a professor is very low, then the numerator of P(y|x) = P(x|y)P(y)/P(x), which is P(x|y)P(y), will be very low as well! (Exercise: What is the probability of having an acoustic neuroma (a rare type of tumour), given that the symptom of dizziness is observed?)

Discussion of the Data

The data collection method I used is not random sampling.

This is problematic if the population I am concerned with is everyone, but I only include people I am familiar with.

The sample is therefore not representative of the whole population.

Besides, the way I discretised the continuous data (the net worth) is subjective and could have a large influence on the result, especially if the dataset is a large one.

Things Not Discussed Here

Bayes’ theorem can be used to encode our beliefs as a prior, and we can then do inference by calculating the posterior probability.

The concepts are similar to those in the examples above, but they are not directly discussed in this article.

Only two random variables are used in the examples above, and both are discrete.

This keeps the focus on the core ideas and on the calculations of Bayes’ theorem.

However, more interesting cases arise when more random variables are involved, as is especially evident in the field of probabilistic graphical models, which I am also working on.

Probability distributions are not discussed here, for simplicity.

Bayesian statistics treats the parameters of a probability distribution as random variables; this is not covered here either.

Conclusion

This article used several simple real-life examples to go through the concepts and calculations of Bayes’ theorem and to see why some human biases arise.

I hope this article has crystallised the equation for you and given you a deeper understanding of this important theorem!

Reference

[1] Kahneman, D. (2011). Thinking, fast and slow. Allen Lane, London, UK.

[2] Blitzstein, J. K., & Hwang, J. (2014). Introduction to probability. Chapman and Hall/CRC.