Two-way Tables - Chisquare test for Independence

Contents of this page:
Assumptions
Hypothesis Test

Method

Chisquare test for Independence

Assumptions

All expected cell counts are at least 5.

Example Recall the dataset on Breaking Cocaine Addiction.
Here for each subject we have two variables, "Drug" with values "Desipramine", "Lithium" and "Placebo", and "Relapsed" with values "Yes" and "No". Both variables are discrete. The obvious question here is: is there a difference between the treatments?

What we need is a hypothesis test for the following problem: we have two discrete variables x and y, and we want to test

H0: x and y are independent vs. Ha: x and y are dependent

The most commonly used method for this problem is called the Chisquare test for Independence. The mathematical details of this test are not trivial, so we will only use the p-value approach here. The test is done by the commands Stat > Tables > Cross Tabulation and Chisquare.
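For readers without MINITAB, the same p-value approach can be sketched in Python. This is only an illustration, assuming NumPy and SciPy are installed: the counts below are hypothetical and are NOT the course dataset, and `observed` and `alpha` are names chosen for this sketch.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts for illustration only (NOT the course dataset).
# Rows: three treatments; columns: relapsed "Yes" / "No".
observed = np.array([
    [12, 18],
    [20, 10],
    [22,  8],
])

# chi2_contingency returns the test statistic, the p-value, the degrees
# of freedom and the table of expected counts under independence.
stat, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05
print(f"chi-square = {stat:.3f}, df = {dof}, p-value = {p_value:.4f}")

# Assumption of the test: every expected cell count is at least 5.
print("all expected counts >= 5:", bool((expected >= 5).all()))

# Decision by the p-value approach.
print("reject H0" if p_value < alpha else "fail to reject H0")
```

As in the MINITAB output, the table of expected counts comes back together with the p-value, so the assumption can be checked in the same step.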

Example Back to the cocaine addiction example:
Graph:

Contingency table:


Test
1) Parameter: measure of association
2) Method: Chisquare test for Independence
3) Assumptions: all expected cell counts ≥5. (see output of MINITAB)
4) α=0.05
5) H0: "Drug" and "Relapse" are independent (no difference in relapse rates for different drugs)
6) Ha: "Drug" and "Relapse" are dependent (some differences in relapse rates for different drugs)
7) p=0.005 (Stat > Tables > Cross Tabulation and Chisquare, Rows: Drug, Columns: Relapse, Chisquare > check chi-square analysis)
8) p=0.005 < 0.05, so we reject the null hypothesis
9) There are some statistically significant differences between the treatments.

Example Consider the data on Drownings in Los Angeles. Analyze this data and decide whether there is a difference between men and women.
We have two discrete variables, Method and Gender.
Graph
The graph has to be based on percentages, but the graphs MINITAB does on its own are no good. Here is what you need to do:
Calc > Calculator, Store in %Male, Expression: ROUND('Male' / SUM('Male') * 100,1)
Calc > Calculator, Store in %Female, Expression: ROUND('Female' / SUM('Female') * 100,1)
Graph > Barchart, Values from a table, Cluster, ok
Graph variables %Male %Female, Row labels: Methods
Bar Chart options > Decreasing Y
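The two Calculator expressions above turn each column of counts into percentages of its column total, rounded to one decimal. A minimal Python sketch of that same calculation, assuming NumPy is available; the counts are hypothetical (NOT the drownings data), and `column_percent` is a helper name made up for this illustration:

```python
import numpy as np

# Hypothetical counts for illustration only (NOT the drownings data).
male = np.array([40, 24, 10, 6])
female = np.array([10, 6, 3, 1])

def column_percent(counts):
    """Percentage of the column total in each row, rounded to one decimal."""
    return np.round(counts / counts.sum() * 100, 1)

pct_male = column_percent(male)
pct_female = column_percent(female)

print("%Male:  ", pct_male)
print("%Female:", pct_female)
```

Because each column is divided by its own total, the two groups can be compared directly even when they contain very different numbers of subjects.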

Contingency table: the data is already a contingency table.

1) Parameter: measure of association
2) Method: Chisquare test for Independence
3) Assumptions: all expected cell counts ≥5. (see output of MINITAB of Stat > Tables > Chisquare Test (Table in Worksheet), Columns containing the table: Men Women)

Notice that at the bottom of the MINITAB output there is a line that reads:

1 cells with expected counts less than 5.

so there is in fact a problem with the assumption for the test.

The "expected cell counts" are the numbers below the "observed cell counts" in the MINITAB output. In our case we see that the expected cell count for Method 8 (Pails, basins, toilets) and Female is 3.17, which yields the warning. One way to deal with this is to join similar groups. So for example here we can join the group "Pails, basins, toilets" to the "Other" group to get a new "Other" group with counts of 47 and 16. Then we rerun the test to get:
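The expected count of a cell under independence is (row total × column total) / grand total, which is why a row with few observations can produce an expected count below 5. The following Python sketch, assuming NumPy is available, counts the problem cells before and after joining a sparse row to "Other". The counts and row labels are hypothetical (NOT the drownings data), and `expected_counts` and `small_expected_cells` are helper names made up for this illustration.

```python
import numpy as np

# Hypothetical counts for illustration only (NOT the drownings data).
# Columns: Male, Female. Rows: "A", "B", "Rare", "Other".
observed = np.array([
    [50, 20],
    [40, 15],
    [ 9,  1],
    [30, 10],
])

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = table.sum(axis=1)
    col_totals = table.sum(axis=0)
    return np.outer(row_totals, col_totals) / table.sum()

def small_expected_cells(table):
    """Number of cells whose expected count is below 5."""
    return int((expected_counts(table) < 5).sum())

print("cells with expected count < 5:", small_expected_cells(observed))

# Join the sparse "Rare" row (index 2) to the "Other" row (index 3).
merged = np.vstack([observed[:2], observed[2] + observed[3]])
print("merged table:")
print(merged)
print("cells with expected count < 5 after merging:", small_expected_cells(merged))
```

Joining rows keeps the grand total unchanged and only pools the sparse category, so the test can be rerun on the merged table.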


1) Parameter: measure of association
2) Method: Chisquare test for Independence
3) Assumptions: all expected cell counts ≥5. (see output of MINITAB)
4) α=0.05
5) H0: "Gender" and "Method" are independent (no difference between men and women)
6) Ha: "Gender" and "Method" are dependent (some differences between men and women)
7) p=0.000 (Stat > Tables > Chisquare Test (Table in Worksheet), Columns containing the table: Men Women)
8) p=0.000 < 0.05, so we reject the null hypothesis
9) There are some statistically significant differences between the methods of drowning of men and women.