Why you should pick your hypotheses BEFORE looking at your data

Ryan Gotesman · Apr 27

While sitting in statistics class, my professor would often repeat the importance of deciding the hypotheses one wants to test before looking at the data.
He would scoff at the idea of data-mining and claim anyone who formed their hypotheses after looking at the data was doomed to failure.
He never said why.
I have seen similar thoughts echoed in guidelines for designing medical trials: before starting a randomized controlled trial to test a drug’s efficacy, it is crucial to state the hypotheses you expect beforehand.
I have wondered about the reason behind this for quite some time, and having recently figured it out, I thought I’d share it with an example.
Imagine we have a variable like height.
It is normally distributed.
We sample 3 people from a population, measure their heights and take an average.
We shall call this X1.
We sample 3 different people from the SAME population, average their heights and call this X2.
Now imagine we did a two-tailed t-test assessing whether X1 = X2 with alpha = 0.05.
The null hypothesis is H0: μ1 = μ2 and the alternative hypothesis is H1: μ1 ≠ μ2.
Since the 2 means come from the same population we expect p > 0.05 and that we will fail to reject the null.
However, there is always the rare chance that one group is sampled exclusively from one tail of the distribution while the other is sampled from the opposite tail, making the difference between X1 and X2 unexpectedly large.
In such cases we will incorrectly reject the null hypothesis, and given our alpha, the probability of this happening is 5%.
We can test this by simulation: repeatedly sample 2 groups of 3 from the same normal distribution and apply a t-test to their means.
Doing this 100,000 times and plotting a histogram of the p-values, we get, as expected, a uniform distribution, with about 5% of p-values falling between 0 and 0.05.
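A minimal sketch of this simulation in Python, assuming NumPy and SciPy are available. The group size and alpha follow the text; the mean and standard deviation of the height distribution, the seed, and the trial count (reduced from 100,000 for speed) are arbitrary choices of mine:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_TRIALS = 10_000  # the article uses 100,000; fewer here for speed
ALPHA = 0.05

false_positives = 0
for _ in range(N_TRIALS):
    # Two independent groups of 3, drawn from the SAME normal distribution
    g1 = rng.normal(loc=170, scale=10, size=3)
    g2 = rng.normal(loc=170, scale=10, size=3)
    # Two-sided t-test of H0: mu1 = mu2
    _, p = stats.ttest_ind(g1, g2)
    if p < ALPHA:
        false_positives += 1

rate = false_positives / N_TRIALS
print(f"false positive rate: {rate:.3f}")  # close to 0.05
```

Because both groups come from the same distribution, the null hypothesis is true in every trial, so any rejection is a false positive, and the rate lands near the nominal 5%.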
These are false positives, and they are worth dwelling on.
Even though these groups came from the same distribution, if we did not know this and just looked at the means there is a 5% chance we would incorrectly conclude they come from different distributions.
This error rate is inherent in statistics and we accept there is a 5% chance our conclusion of rejecting the null hypothesis is incorrect.
As long as we are confident it is 5% and stays 5% we are ok.
But now let’s look at what happens when we change our hypotheses after looking at the data.
We simulate this by again drawing 2 groups of 3 from the same distribution.
We sort each group by height, and if the middle member of the 1st group is taller than the tallest member of the 2nd group, we change our hypotheses to H0: μ1 ≤ μ2 and H1: μ1 > μ2. Note that this is a one-sided hypothesis and it can be easier to get significant results using it.
If the middle member of one group isn’t taller than everyone in the other group, we use the two-sided hypotheses as usual.
This process replicates the behaviour of a researcher who intends to use a two-sided hypothesis but, after peeking at the data, realizes he has a better chance of getting significant results using a one-sided hypothesis.
Doing another 100,000-trial simulation and plotting the p-value histogram, note what has happened to the number of p-values falling between 0 and 0.05.
It has increased dramatically.
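The peeking simulation can be sketched the same way. This is my reading of the rule described above, applied symmetrically to either group; it assumes SciPy 1.6+ for the `alternative` parameter of `ttest_ind`, and the distribution parameters and trial count are again arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_TRIALS = 10_000
ALPHA = 0.05

rejections = 0
for _ in range(N_TRIALS):
    g1 = rng.normal(loc=170, scale=10, size=3)
    g2 = rng.normal(loc=170, scale=10, size=3)
    # Peek at the data before choosing the test
    if np.median(g1) > g2.max():
        # Group 1 looks taller, so switch to the one-sided H1: mu1 > mu2
        _, p = stats.ttest_ind(g1, g2, alternative="greater")
    elif np.median(g2) > g1.max():
        # Group 2 looks taller, so switch to the one-sided H1: mu1 < mu2
        _, p = stats.ttest_ind(g1, g2, alternative="less")
    else:
        # Otherwise keep the planned two-sided test
        _, p = stats.ttest_ind(g1, g2)
    if p < ALPHA:
        rejections += 1

peeking_rate = rejections / N_TRIALS
print(f"false positive rate with peeking: {peeking_rate:.3f}")
```

The one-sided test is only chosen when the data already lean in its favour, which is exactly what inflates the rejection rate above the nominal 5% even though the null is true in every trial.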
Peeking at the data and changing our hypotheses has made it far more likely that we reject the null hypothesis when we shouldn’t.
And the worst part is we don’t even realize this is happening.
We can get far more “significant” results but they may all be false positives.
In medicine and clinical trials we want to avoid incorrect claims of drug efficacy at any cost.
Imagine the harm and expense of putting millions of people on drugs we mistakenly believe to work, all because a researcher changed their hypotheses after looking at the data and incorrectly rejected the null.
It would be a disaster.
And that is why you probably shouldn’t change your hypotheses after looking at the data.
Let me know your thoughts and if you have any questions in the comments below.