Log Book — Guide to Hypothesis Testing
This is a guide to hypothesis testing.
I have tried to cover the basics of the theory and a practical implementation with a step-by-step example.
Dip Ranjan Chatterjee, Jul 7
In any research we do, we are essentially trying to answer a question or test a hypothesis.
One method of answering such a question/hypothesis is called hypothesis testing, or significance testing.
The structure of hypothesis testing
What hypothesis testing does is provide a path we can follow to test our question/hypothesis effectively. Below are the steps typically followed when doing hypothesis testing:
1. Define the research hypothesis for the study.
2. Explain how you are going to operationalize (that is, measure or operationally define) what you are studying, and set out the variables to be studied.
3. Set out the null and alternative hypotheses (there may be more than one pair of hypotheses).
4. Set the significance level.
5. Make a one- or two-tailed prediction.
6. Select an appropriate statistical test based on the variables you have defined and on whether the distribution is normal or not.
7. Run the statistical test on your data and interpret the output.
8. Reject or fail to reject the null hypothesis.
There might be some variations to this, but in most cases this is the structure that is followed.
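As a preview of those steps, here is a minimal one-sample sketch in Python. Everything in it is illustrative: the measurements, the claimed mean of 100, and the one-tailed critical value t₀.₀₅,₉ ≈ 1.833 (from a standard t-table) are made-up or looked-up values, not part of the example worked through later.

```python
from math import sqrt
from statistics import mean, stdev

# Steps 1-2: hypothetical measurements and the claim we want to check.
data = [102, 98, 105, 103, 99, 101, 104, 100, 97, 106]
mu0 = 100.0                 # Step 3: H0: mu = 100, Ha: mu > 100

# Steps 4-5: alpha = 0.05, one-tailed; critical value for 9 df from a t-table.
t_crit = 1.833

# Steps 6-7: one-sample t-test statistic (population std. dev. unknown).
n = len(data)
t_stat = (mean(data) - mu0) / (stdev(data) / sqrt(n))

# Step 8: decision.
decision = "reject H0" if t_stat > t_crit else "fail to reject H0"
print(round(t_stat, 3), decision)   # → 1.567 fail to reject H0
```

The same skeleton carries through the car-price example below, only with real sample data.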
An example
Research hypothesis
A company needs to purchase company cars for its sales staff.
They contacted their local Ford dealership and inquired as to the price of a 2007 Ford Explorer 4X4 XLT.
The dealership stated that the average price (μ) of this vehicle in Maryland is $29,995.
Now before purchasing the company wanted to confirm if that is indeed true.
Operationalize
The company took a random sample of n = 20 Ford dealerships in Maryland and found that the mean price was $30,474.80 with a sample standard deviation of $1,972.59. The sampled prices were:

+--------+
| 27,595 |
| 30,250 |
| 32,705 |
| 28,485 |
| 31,295 |
| 30,075 |
| 30,505 |
| 30,820 |
| 25,995 |
| 32,830 |
| 31,875 |
| 32,285 |
| 32,580 |
| 28,915 |
| 29,940 |
| 31,720 |
| 30,265 |
| 32,555 |
| 31,580 |
| 27,226 |
+--------+

The null and alternative hypotheses
In order to undertake hypothesis testing we need to express our research hypothesis as a null and an alternative hypothesis.
The null hypothesis and alternative hypothesis are statements regarding the differences or effects that occur in the population.
We will use our sample to test which statement (i.e., the null hypothesis or the alternative hypothesis) is more likely (although technically, we test the evidence against the null hypothesis).
The null hypothesis is essentially the “devil’s advocate” position.
That is, it assumes that whatever we are trying to prove did not happen.
Null hypothesis (H₀): The dealership's quote is accurate; the average price of a 2007 Ford Explorer 4X4 XLT in Maryland is the quoted $29,995.
H₀: μ = 29,995
Alternative hypothesis (Hᴀ): What the company suspects; the average price is greater than the quoted price from the local Ford dealership.
Hᴀ: μ > 29,995
Significance levels
The level of statistical significance is often expressed as the so-called p-value.
Depending on the statistical test we choose later, we will calculate a probability (i.e., the p-value) of observing our sample results (or more extreme ones) given that the null hypothesis is true.
Before going further, let us first understand the meaning of p-values the easy way.
P-values
The definition from the American Statistical Association: in statistical hypothesis testing, the p-value (or probability value, or asymptotic significance) is the probability, for a given statistical model and assuming the null hypothesis is true, that a statistical summary (such as the sample mean difference between two compared groups) would be the same as or of greater magnitude than the actual observed result.
In academic literature, the p-value is defined as the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.
“At least as extreme as those observed”: what is meant by that? To answer that question, let's do an experiment with a coin.
A friend of mine gave me a coin and claimed that it was special.
I decided to test it out.
My null hypothesis was: “The coin is normal”.
Then I tossed the coin 20 times.
I got 18 heads and 2 tails.
Normally, when tossing a coin, the expectation is that it will be 50% heads and 50% tails so I expected a result of 10 heads + 10 tails to be the most frequent, followed by the results of 9 heads + 11 tails and 11 heads + 9 tails.
The result of 18 heads + 2 tails goes to the periphery of the probability curve (that is, more extreme).
The p-value is the probability of the observed outcome plus all “more extreme” outcomes (in this case: 19 heads + 1 tail and 20 heads + 0 tail), represented by the shaded “tail area”.
Often, statisticians calculate a two-tailed p-value (we will come to this shortly), so we take both extremes of the graph, i.e., 18 tails + 2 heads, 19 tails + 1 head and 20 tails + 0 heads are also included.
P-value = probability that the data would be at least as extreme as those observed = [p(18 heads + 2 tails) + p(19 heads + 1 tail) + p(20 heads + 0 tails)] + [p(18 tails + 2 heads) + p(19 tails + 1 head) + p(20 tails + 0 heads)] = 0.0004, where each bracket represents one tail.
The chance of obtaining such a result is very small if the coin were normal.
So I reject the null hypothesis and conclude that the coin is special.
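This p-value can be reproduced with a short stdlib-only calculation (a sketch; the helper name is my own):

```python
from math import comb

def binom_tail(n: int, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, 0.5), i.e. k_min or more heads."""
    return sum(comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

# One tail: 18, 19 or 20 heads out of 20 fair tosses.
one_tail = binom_tail(20, 18)

# Two-tailed p-value: by symmetry of the fair coin, double the one-tail probability.
p_value = 2 * one_tail
print(round(p_value, 4))   # → 0.0004
```

This matches the 0.0004 figure above: each tail contributes about 0.0002.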
In the majority of analyses, an α of 0.05 is used as the cutoff for significance, and we will do so in our case too, i.e., if the p-value is less than 0.05, we will reject the null hypothesis.
Type 1 & 2 ErrorsUnderstanding Type 1 errorsType 1 errors — often called ‘false positives’ — happen in hypothesis testing when the null hypothesis is true but rejected.
Simply put they happen when the tester validates a statistically significant difference even though there isn’t one.
An example of a Type 1 error would be if the coin above was actually normal, we deduced it to be special, and it was just dumb luck that we got 18 heads and 2 tails.
Type 1 errors have a probability of α, tied to the confidence level that you set.
A test with a 95% confidence level means that there is a 5% chance of getting a type 1 error.
This is the risk in our case too, the 5% chance may play against us.
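We can make that 5% figure concrete for the coin test. The sketch below (stdlib only; the rejection rule is my own formalization of "reject when the two-tailed p-value is below 0.05") computes the exact probability of rejecting a truly fair coin. Because the binomial distribution is discrete, the achievable Type 1 rate comes out slightly below the nominal 5%:

```python
from math import comb

N = 20
# Exact fair-coin probabilities for every possible heads count 0..20.
pmf = [comb(N, k) / 2 ** N for k in range(N + 1)]

def two_tailed_p(k: int) -> float:
    """Two-tailed p-value for observing k heads in N fair tosses."""
    extreme = max(k, N - k)                 # the more extreme of heads/tails count
    return min(1.0, 2 * sum(pmf[extreme:]))

# Type 1 error rate: total probability of all outcomes that would
# (wrongly) lead us to reject a genuinely fair coin at alpha = 0.05.
type1_rate = sum(pmf[k] for k in range(N + 1) if two_tailed_p(k) < 0.05)
print(round(type1_rate, 4))   # → 0.0414
```

So with 20 tosses the rejection region is 0-5 or 15-20 heads, and the true false-positive probability is about 4.1%, just under the advertised 5%.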
Understanding type 2 errorsType 2 errors are referred to as ‘false negatives’, these errors happen when the null hypothesis is false and we subsequently fail to reject it.
An example of a Type 2 error would have been if the coin above was actually special but we failed to reject the hypothesis that it was normal.
One tailed or two tailed testMore often than not we have to decide if our statistical test should be a one-tailed test or a two-tailed test (also known as “directional” and “non-directional” tests respectively).
So, what exactly is the difference between the two? First, it may be helpful to know what the term “tail” means in this context.
The tail refers to the end of the distribution of the test statistic for the particular analysis that you are conducting.
For example, a t-test uses the t distribution, and ANOVA uses the F distribution.
The distribution of the test statistic can have one or two tails depending on its shape (see the figure below).
The black-shaded areas of the distributions in the figure are the tails.
Symmetrical distributions like the t and z distributions have two tails.
Asymmetrical distributions like the F and chi-square distributions have only one tail.
This means that analyses such as ANOVA and chi-square tests do not have a “one-tailed vs.
two-tailed” option, because the distributions they are based on have only one tail.
But what if we are conducting a t-test, which has a two-tailed distribution? Now we have to decide whether a one-tailed or a two-tailed test is most appropriate for our study.
We are using a significance level of 0.05; a two-tailed test allots half of this alpha to testing statistical significance in one direction and the other half to testing it in the other direction.
This means that 0.025 is in each tail of the distribution of our test statistic.
A one-tailed test provides more power to detect an effect because the entire weight is allocated to one direction only, so there might be a temptation to use a one-tailed test whenever you have a hypothesis about the direction of an effect.
But before doing so, we must consider the consequences of missing an effect in the other direction.
When using a two-tailed test, regardless of the direction of the relationship of the hypothesis, we would be testing for the possibility of the relationship in both directions.
For example, we may wish to compare the mean of a sample to a given value x using a t-test.
Our null hypothesis is that the mean is equal to x.
A two-tailed test will test both if the mean is significantly greater than x and if the mean significantly less than x.
The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.
But in our example the alternative hypothesis states that the average cost of a 2007 Ford Explorer is more than the quoted $29,995 price from the local Ford dealership.
We are not interested in seeing whether it is less than the quoted price, which is why we will be using a one-tailed test.
Now that we have an idea of both p-values and one-/two-tailed tests, we will set the significance level of our hypothesis test to 0.05 as our risk of rejecting a true null hypothesis (a Type 1 error).
We choose 0.05 because it is the level traditionally used in consumer research.
Statistical test
Choosing a test is easy if you follow the flowchart at the bottom of this article. Since we do not know the standard deviation of the entire population (we just have the standard deviation of our sample), we have to use the t-test.
When using a test statistic for one population mean, there are two cases where you must use the t-distribution instead of the Z-distribution.
The first case is where the sample size is small (below 30 or so), and the second case is when the population standard deviation, σ is not known, and you have to estimate it using the sample standard deviation, s.
In both cases, you have less reliable information on which to base your conclusions, so you have to pay a penalty for this by using the t-distribution, which has more variability in the tails than a Z-distribution has.
A hypothesis test for a population mean that involves the t-distribution is called a t-test.
The formula for the test statistic in this case is:

t = (x̄ − μ₀) / (s / √n)

where x̄ is the sample mean, μ₀ the hypothesized mean, s the sample standard deviation, and n the sample size; the resulting value is compared against the t-distribution with n − 1 degrees of freedom (t ₙ ₋₁).
Summarizing all the data we have so far:

+===========+====================+
| 29,995.00 | hypothesized value |
+-----------+--------------------+
| 30,474.80 | sample mean        |
+-----------+--------------------+
|  1,972.59 | sample std. dev.   |
+-----------+--------------------+
|   441.085 | std. error         |
+-----------+--------------------+
|        20 | n                  |
+-----------+--------------------+
|        19 | degrees of freedom |
+-----------+--------------------+

Calculating the t-value from the above formula gives t ≈ 1.088.
The corresponding one-tailed p-value is approximately 0.145.
You can find numerous online tables and calculators to compute the p-value from the t-score.
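The whole test can also be reproduced end to end with a stdlib-only Python sketch. The price list is the sample from the table above; the helper functions are my own, and the p-value is obtained by numerically integrating the t-distribution density, so treat it as an approximation:

```python
from math import gamma, sqrt, pi
from statistics import mean, stdev

prices = [27595, 30250, 32705, 28485, 31295, 30075, 30505, 30820, 25995, 32830,
          31875, 32285, 32580, 28915, 29940, 31720, 30265, 32555, 31580, 27226]

mu0 = 29995                              # hypothesized mean (the dealer's quote)
n = len(prices)
xbar = mean(prices)                      # 30474.80
s = stdev(prices)                        # sample std. dev., ~1972.59
t_stat = (xbar - mu0) / (s / sqrt(n))    # ~1.088

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t: float, df: int, hi: float = 60.0, steps: int = 100000) -> float:
    """One-tailed p-value P(T > t), by trapezoidal integration of the density."""
    h = (hi - t) / steps
    total = 0.5 * (t_pdf(t, df) + t_pdf(hi, df))
    for i in range(1, steps):
        total += t_pdf(t + i * h, df)
    return total * h

p_value = upper_tail_p(t_stat, n - 1)    # ~0.145, well above alpha = 0.05
print(round(t_stat, 3), round(p_value, 3))
```

If SciPy is available, `scipy.stats.t.sf(t_stat, n - 1)` gives the same upper-tail probability without the hand-rolled integration.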
The p-value is greater than α = 0.05.
In this case, we fail to reject the null hypothesis.
When this happens, we say that the result is not statistically significant.
In other words, we are reasonably sure that our observed data can be explained by chance alone.
By interpreting the results of the test, we failed to find sufficient evidence against the null hypothesis; note that this does not prove the null hypothesis true.
Types of tests
The handy flowchart below describes the types of tests you can run in order to confirm or reject your null hypothesis.
For the life of me I cannot find the reference of the chart, I had it saved long back.
Please let me know if you can find the reference I will be more than happy to add it.
Depending on your data type, sampling method and the type of your hypothesis you have to choose the appropriate test.
The process for each test will be similar to the one we have shown here.
If you want details and use cases for the different types of tests mentioned above, nine of them are described in this link along with their corresponding R code.
It is really helpful.
Flowchart to determine the type of statistical test to be doneReferences:https://statistics.