Why every data scientist should read “The Book of Why” by Judea Pearl

Despite the numerous algorithms I had acquired, my puzzle remained.

Puzzle That Algorithm Itself Cannot Solve

If you are not the kind of data scientist who only cares about shaving off that last 0.01% of error, but who tries to make sense of your model, you might have asked yourself from time to time:

Should I add this variable to my model?
Why does this counter-intuitive variable show up as a predictive one?
Why does this variable suddenly become insignificant if I add another variable?
Why is the direction of the correlation opposite to what I expected?
Why is the correlation zero when I thought it would be higher?
Why does the direction of a relationship reverse when I disaggregate the data into sub-populations?

Over time, I have built up enough intuition to tackle these fundamental questions. For example, I know that a bivariate relationship can be very different from a multivariate relationship, or that the data may be subject to selection bias. But I lacked a solid framework with which I could definitively convince myself and others. More importantly, I might not notice a problem until a relationship contradicts my intuition! And by the time something contradicts it, things have already gone very wrong. Without a map, how can I be sure I am not heading to the wrong destination before I know I am lost?

Yes, Both Association and Causality Can Predict

The puzzle was completely resolved when I read “The Book of Why” by Judea Pearl. Now it is my guide for data science. Here I will briefly tell you WHY. In short, it is causality, the relationship between cause and effect. To predict something in the future, there are two ways:

I know that when I see X, I will see Y (Association)
I know that X causes Y (Causality)

Both ways can predict. Both ways might yield similar model performance. So, what is the difference? Why bother to understand causality?
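
One of the questions above, the reversal of a relationship after splitting the data into sub-populations, can be reproduced with a handful of numbers. The following is a minimal sketch, not an example from the book: the two groups and their values are invented purely for illustration, and `pearson` is a small hand-written helper rather than a library function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: within each group, y falls as x rises.
group_a = ([1, 2, 3], [3, 2, 1])
group_b = ([6, 7, 8], [8, 7, 6])

r_a = pearson(*group_a)
r_b = pearson(*group_b)

# Pooling the two groups discards the group label.
pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
r_pooled = pearson(pooled_x, pooled_y)

print(f"group A: {r_a:+.2f}")
print(f"group B: {r_b:+.2f}")
print(f"pooled:  {r_pooled:+.2f}")
```

Within each group the correlation is negative, yet the pooled correlation is positive, because group B sits higher on both x and y. The numbers alone do not say which view to trust; that depends on how the grouping variable relates to x and y, which is a question about cause and effect rather than about association.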
