# The Chi Square Statistic (p.1)

Try it.

Notice that the new x² value is 4.125, and this value exceeds the table value of 3.841 (at 1 degree of freedom and an alpha level of 0.05).

This means that p < 0.05 (it is now 0.04), and we reject the null hypothesis in favor of the alternative hypothesis: the heart rate of animals differs between the treatment groups.

When p < 0.05, we generally refer to this as a significant difference.
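For 1 degree of freedom, the p-value can be checked without a table. A chi-square variable with 1 df is the square of a standard normal variable, so its upper-tail probability is erfc(√(x/2)). The sketch below assumes only Python's standard library; the helper name `chi2_sf_df1` is hypothetical, and this shortcut is valid for 1 df only.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 1 df.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


x2 = 4.125        # chi-square value from the example above
critical = 3.841  # table value at 1 df and alpha = 0.05
p = chi2_sf_df1(x2)

print(x2 > critical)  # True
print(f"{p:.3f}")     # 0.042, i.e. p < 0.05
```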

Last but not least, we can calculate the chi-square statistic of a 2 x 2 contingency table in another way. Let's say we had a random sample of 237 pupils who were asked whether they had ever gotten into trouble at school.

The result is shown in the table below.

Table 3: Pupil situation observation

Because the number of pupils varies by gender, it is hard to compare boys and girls directly from the raw counts.

Therefore, let's standardize the joint frequencies by dividing the counts in each row by that row's total.

In addition, let's standardize the marginal frequencies by dividing each marginal frequency by the overall total (located in the bottom right corner: 237).

Table 4a: Convert observations to frequencies

Table 4b: Results in frequencies

This way, the joint frequencies become conditional probabilities (row proportions), or observed probabilities (marked in green), which take both categorical variables into account.

For instance, 0.39 is the probability that a boy is in trouble (the share of boys who are in trouble), P(Trouble | B).

Marginal frequencies become marginal probabilities, which take only one of the categorical variables into account.

For instance, 0.49 is the probability of being a boy, P(B), and 0.35 is the probability of being in trouble, P(Trouble).
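The two standardizations described above (dividing each count by its row total, and dividing each marginal total by the grand total) can be sketched in plain Python. The function names are hypothetical, and because the counts of Table 3 are not reproduced in this text, the usage lines use made-up counts purely for illustration.

```python
def row_proportions(counts):
    """Divide each count by its row total, giving P(column category | row category)."""
    return [[c / sum(row) for c in row] for row in counts]


def marginal_probabilities(counts):
    """Divide each row total and each column total by the grand total."""
    n = sum(sum(row) for row in counts)
    row_p = [sum(row) / n for row in counts]
    col_p = [sum(col) / n for col in zip(*counts)]
    return row_p, col_p


# Hypothetical 2 x 2 counts (rows: boys, girls; columns: in trouble, not in trouble).
# These are NOT the counts from Table 3.
counts = [[30, 70], [20, 80]]
print(row_proportions(counts))         # [[0.3, 0.7], [0.2, 0.8]]
print(marginal_probabilities(counts))  # ([0.5, 0.5], [0.25, 0.75])
```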

At the beginning, we outlined our hypothesis statements as follows:

- H0: X and Y are independent.
- H1: X and Y are dependent.

Probability theory states that if two events X and Y are independent, the following equation is satisfied:

$$P(X, Y) = P(X) \times P(Y)$$

The chi-square test is based on this property.

Therefore, if H0 is true, meaning that X and Y are independent, the following equation will be satisfied:

$$P(\text{gender}, \text{situation}) = P(\text{gender}) \times P(\text{situation})$$

On the left side of this equation we see the joint probability, and on the right side we see the two marginal probabilities.

First, the chi-square test uses this assumption to calculate the expected joint probabilities from the marginal probabilities.

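For instance, the expected probability of a boy being in trouble is

$$P(\text{Boy}) \times P(\text{Trouble}) = 0.49 \times 0.35 \approx 0.17$$

and the expected probability of a girl being in trouble is

$$P(\text{Girl}) \times P(\text{Trouble}) = 0.51 \times 0.35 \approx 0.18$$

and so on for the remaining cells.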

The expected probabilities are shown in brackets in the table below.

Table 5: Finding P(X,Y), P(X), P(Y)

In other words, the expected probabilities are the probabilities we should expect if H0 is true, that is, if X and Y are independent variables.

That means that, if gender and trouble status are independent variables, our expected probability of a boy not being in trouble is P(Boy) × P(No trouble) = 0.49 × 0.65 ≈ 0.32.
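The product rule for the expected probabilities can be reproduced directly from the rounded marginal probabilities quoted in the text (0.49 and 0.51 for gender, 0.35 for trouble and therefore 0.65 for no trouble). This is a minimal Python sketch; the dictionary layout is just one convenient choice.

```python
# Marginal probabilities quoted in the text (rounded to two decimals).
p_gender = {"Boy": 0.49, "Girl": 0.51}
p_situation = {"Trouble": 0.35, "No trouble": 0.65}

# Under H0 (independence): P(gender, situation) = P(gender) * P(situation).
expected = {
    (g, s): pg * ps
    for g, pg in p_gender.items()
    for s, ps in p_situation.items()
}

for (g, s), p in expected.items():
    print(f"P({g}, {s}) = {p:.2f}")
# P(Boy, Trouble) = 0.17
# P(Boy, No trouble) = 0.32
# P(Girl, Trouble) = 0.18
# P(Girl, No trouble) = 0.33
```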

Second, we measure the differences between the actual probabilities (the observed probabilities in our tables) and the expected probabilities we have just calculated.

If the difference between our actual probabilities and the probabilities we would expect if the two categorical variables were independent is large, the variables are most likely not independent.

Similarly, if the difference between the actual probabilities in our table and the probabilities expected under independence is small, the data are consistent with the two variables being independent.

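The difference between these two sets of probabilities is summarized by the x² (chi-square) statistic, which we calculate with the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is an observed value, E is the corresponding expected value, and the sum runs over all cells of the table. We then plug in the values from the table above.

The last step is to compare the x² value with the critical value from the chi-square distribution table to decide whether to reject H0.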
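The formula can be sketched in Python for a table of counts, where the expected count of each cell under independence is row total × column total / N (that is, N × P(X) × P(Y)). Note that the table values are calibrated for the statistic computed on counts: if you compute the sum from probabilities instead, multiply it by the total N (237 here) to put it on the same scale. The function name and the example counts are hypothetical.

```python
def chi_square_statistic(counts):
    """Pearson chi-square statistic: sum of (O - E)**2 / E over all cells.

    Expected counts under independence: E_ij = row_total_i * col_total_j / n.
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]

    x2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2


# Hypothetical counts, used only to illustrate the arithmetic.
print(round(chi_square_statistic([[10, 20], [30, 40]]), 3))  # 0.794
```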

The following decision rule applies:

- x² > table value: reject H0 (accept H1), meaning that you have enough statistical evidence to conclude that the two variables are dependent.
- x² ≤ table value: fail to reject H0 (reject H1), meaning that you do not have enough statistical evidence to conclude that the two variables are dependent.
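The decision rule is a single comparison. This sketch phrases it in terms of H0 (rejecting H0 is the same as accepting H1) and applies it to the two statistics reported in this article; the function name `decide` is hypothetical.

```python
def decide(x2: float, critical: float) -> str:
    """Compare a chi-square statistic with the table (critical) value."""
    if x2 > critical:
        return "reject H0: evidence that the variables are dependent"
    return "fail to reject H0: no evidence of dependence"


# Values taken from the examples in this article.
print(decide(4.125, 3.841))   # heart-rate example -> reject H0
print(decide(1.034, 3.8415))  # gender vs. trouble example -> fail to reject H0
```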

To get the proper value from the table, we need to know two things: the significance level and the degrees of freedom.

In the table, the significance level (α) runs along the top, and the degrees of freedom (υ) run down the left side.

The significance level is something that you choose yourself.

Let's use a significance level of 5%, so α = 0.05.

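The degrees of freedom for a chi-square test of independence are calculated as:

$$\text{df} = (r - 1)(c - 1)$$

where r is the number of rows and c is the number of columns in the contingency table. For our 2 x 2 table, df = (2 − 1)(2 − 1) = 1.

With a significance level of 0.05 and 1 degree of freedom, the table value is 3.8415 (first row, third column).

Now we have both the x² and table values and can compare the two to make a decision.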
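Both the degrees of freedom and the 1-df table value can be reproduced in Python. The critical value is found by bisection on the upper-tail probability erfc(√(x/2)), which holds only for 1 degree of freedom; for other df you would need the general chi-square distribution (for example from a statistics library). The helper names are hypothetical.

```python
import math


def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def critical_value_df1(alpha: float) -> float:
    """Find x with P(X >= x) = alpha by bisection (the tail probability falls as x grows)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail probability still above alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


df = degrees_of_freedom(2, 2)    # 2 x 2 table -> 1
crit = critical_value_df1(0.05)  # about 3.8415
print(df, round(crit, 4))        # 1 3.8415
```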

The x² estimated in the calculation above is 1.034, which is below the table value of 3.8415. In other words, the difference between our actual probabilities and the probabilities we would expect if the two variables were independent is too small to be statistically significant, so the data are consistent with the two variables being independent, with no relationship between them.

As our x² < table value, we cannot reject H0 (we reject H1): the data give no evidence that gender and trouble status are associated with each other.

## Conclusion

So, here are the steps to perform a chi-square test:

1. Add marginal frequencies to a contingency table.
2. Translate joint and marginal frequencies into probabilities.
3. Estimate the expected probability for each cell.
4. Calculate x².
5. Compare x² with the table value and make a decision:
   - x² > table value: reject H0 (accept H1), so the variables are dependent.
   - x² ≤ table value: fail to reject H0 (reject H1), so there is no evidence of dependence.
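The five steps can be strung together in one function for a 2 x 2 table. This is a sketch under a few stated assumptions: step 2 uses joint probabilities (each count divided by the grand total), which is what the product rule P(X, Y) = P(X) × P(Y) compares, rather than the row proportions of Table 4b; the sum is multiplied by N so it matches the scale of the table value; and the default critical value 3.8415 corresponds to α = 0.05 and 1 degree of freedom. The function name and the example counts are hypothetical.

```python
def chi_square_independence_2x2(counts, critical=3.8415):
    """Follow the five steps above for a 2 x 2 table of counts.

    `critical` defaults to the table value for alpha = 0.05 and 1 degree of freedom.
    Returns (x2, dependent).
    """
    # Step 1: marginal frequencies (row and column totals) and the grand total.
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]

    # Step 2: joint and marginal frequencies as probabilities.
    p_joint = [[c / n for c in row] for row in counts]
    p_row = [t / n for t in row_totals]
    p_col = [t / n for t in col_totals]

    # Step 3: expected probability for each cell under independence.
    p_expected = [[pr * pc for pc in p_col] for pr in p_row]

    # Step 4: x2, scaled by n so it is on the same scale as the table value.
    x2 = n * sum(
        (p_joint[i][j] - p_expected[i][j]) ** 2 / p_expected[i][j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )

    # Step 5: compare with the table value.
    return x2, x2 > critical


# Hypothetical counts, used only to illustrate the steps.
x2, dependent = chi_square_independence_2x2([[10, 20], [30, 40]])
print(round(x2, 3), dependent)  # 0.794 False
```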