From prediction to decision making

Why your predictions might be falling short (opinion)

Kristofer Rolf Söderström
Jan 23

Photo by Mika Baumeister on Unsplash

“There are a number of gaps between making a prediction and making a decision” Susan Athey

Correlation does not imply causation

This is one of the most repeated phrases in statistical testing.
It is repeated for a reason, I believe: both concepts are abstract.
The distinction between them is messy and complicated, and we need to remind ourselves of it when interpreting results.
It seems, however, that this distinction is not as marked in the fields of Machine Learning, Big Data Analytics, or Data Science, which is perhaps not too surprising.
The apparent emphasis in these fields has been on increasing prediction accuracy with more data and by fine-tuning existing models.
In my experience with online courses and tutorials, less work seems to be dedicated to the steps after the accuracy of a prediction reaches a desired level.
While doing research on related literature for my application, I could see mistakes being made, ranging from the relatively harmless confusion between the terms predict, correlate, cause and explain to the spurious claim that a variable (or set of features or covariates) explains the behaviour of a model, when it only signals an association.
This is not a problem in image recognition, natural language processing, sentiment analysis, etc., where practitioners are concerned with mapping features to each other.
The problem comes when the same techniques and philosophy are applied in the realm of decision or policy making.

Why it’s a problem

Mapping a correlation does not mean we understand the problem at hand.
To illustrate, we can take a look at a simplified example: Imagine having historical data of hotel prices and occupancy rates.
Hotel prices are set by software that increases the prices as a function of the occupancy rate (higher occupancy results in higher prices).
An off-the-shelf ML algorithm will identify a positive correlation between prices and occupancy.
If you want to predict the occupancy rate at any given price, this is the right methodology.
However, if you want to know the effect of increasing the prices on the occupancy rate, your predictive model would likely say that increasing prices will sell more rooms.
This is very unlikely to be the case, and a different set of statistical techniques is required to answer questions like these.
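
To make the hotel example concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration and not real hotel data: the latent `demand` variable, the pricing rule `50 + 100 * demand`, the value of `TRUE_PRICE_EFFECT`, and the helper names `ols_slope`, `occupancy` and `simulate`. It compares the slope a naive model would learn from historical data, where the software sets prices, with the slope from a hypothetical experiment where prices are set at random.

```python
import random

# Assumed ground truth for this toy world: raising the price by one unit
# lowers the occupancy rate by 0.002 (0.2 percentage points).
TRUE_PRICE_EFFECT = -0.002


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var


def occupancy(demand, price, rng):
    """Occupancy rises with latent demand and falls with price, plus noise."""
    return demand + TRUE_PRICE_EFFECT * price + rng.gauss(0, 0.02)


def simulate(n, randomized, seed=0):
    """Generate n (price, occupancy) pairs.

    randomized=False: historical data, the software prices higher on busy days.
    randomized=True:  hypothetical experiment, price is independent of demand.
    """
    rng = random.Random(seed)
    prices, occupancies = [], []
    for _ in range(n):
        demand = rng.uniform(0.6, 1.0)  # latent demand, unseen by the analyst
        if randomized:
            price = rng.uniform(80, 160)
        else:
            price = 50 + 100 * demand  # assumed pricing rule
        prices.append(price)
        occupancies.append(occupancy(demand, price, rng))
    return prices, occupancies


if __name__ == "__main__":
    naive = ols_slope(*simulate(5000, randomized=False))
    experimental = ols_slope(*simulate(5000, randomized=True))
    print(f"slope learned from historical data: {naive:+.4f}")
    print(f"slope from randomized prices:       {experimental:+.4f}")
    print(f"true effect built into the toy:     {TRUE_PRICE_EFFECT:+.4f}")
```

In this toy world the historical slope comes out positive, because busy days produce both high prices and high occupancy, while the randomized slope comes out negative and close to the effect that was built in. Randomizing prices is only one of the possible techniques, and it is chosen here because it is the simplest to simulate.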

What we can do

Just understanding this limitation and communicating it to decision makers would take us a long way.
Identifying and using the appropriate methodology to address the question is the desired outcome.
There is an ongoing interest in combining lessons from econometrics and machine learning to extend our understanding of prediction and causal inference.
As usual, the best we can do is keep learning and stay informed of developments in related fields.
A good place to start is here .

Athey, S., Beyond prediction: Using big data for policy problems, (2017), Science Vol. 355, Issue 6324, pp. 483–485.

Athey, S., The Impact of Machine Learning on Economics, (2017), Chapter in forthcoming NBER book The Economics of Artificial Intelligence: An Agenda, Ajay K. Agrawal, Joshua Gans, and Avi Goldfarb.

Imbens, G., & Rubin, D., Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, (2015), Cambridge University Press.