Exercises - Categorical Data

Problem 1

Cervical Cancer and Smoking (Nischan et al., 1988; Pagano & Gauvreau, 1993, p. 359)
Data from a study of cervical cancer and smoking are shown below.
                    Cancer   No cancer
Does smoke             108         163
Does not smoke         117         268

a) Based on these data, is there a (statistically significant) relationship between smoking and cervical cancer?
b) In a further analysis of this data we include information on the number of sexual partners, either at most one or more than one:
Group 1: At most one partner
                    Cancer   No cancer
Does smoke              12          21
Does not smoke          25         118

Group 2: More than one partner
                    Cancer   No cancer
Does smoke              96         142
Does not smoke          92         150

Are there (statistically significant) relationships between smoking and cervical cancer within these groups?

Problem 2

During a game that uses a die you get the feeling that you are not getting enough sixes (which would be good in this game). So for some time you keep track of the rolls of the die, and you observe:
Die shows a      One     Two   Three    Four    Five     Six
Frequencies:      11      12       7       8      16       5

Does this data show that the die is not fair? Test at the 5% level.

Solutions

Problem 1


a) Method: Chisquare test for independence
α=0.05
H0: There is no relationship between cervical cancer and smoking
H1: There is some relationship between cervical cancer and smoking
p-value = 0.012 < 0.05
We reject the null hypothesis
Conclusion: there is some relationship between cervical cancer and smoking.
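
The p-value in part a) can be checked with a short script. This is a minimal sketch, assuming the test was run without a continuity correction (an assumption that is consistent with the quoted p-value of 0.012). The function name `chi2_independence` is made up for this illustration, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom, which is what a 2x2 table has.

```python
import math

def chi2_independence(table):
    """Chi-square test of independence for a 2x2 table, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: does smoke / does not smoke. Columns: cancer / no cancer.
stat, p_value = chi2_independence([[108, 163], [117, 268]])
print(f"chisquare statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

If SciPy is installed, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic and p-value.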

b) Group 1. Method: Chisquare test for independence
α=0.05
H0: There is no relationship between cervical cancer and smoking for people with at most one partner
H1: There is some relationship between cervical cancer and smoking for people with at most one partner
p-value = 0.016 < 0.05
We reject the null hypothesis
Conclusion: there is some relationship between cervical cancer and smoking for people with at most one partner.
Group 2. Method: Chisquare test for independence
α=0.05
H0: There is no relationship between cervical cancer and smoking for people with two or more partners
H1: There is some relationship between cervical cancer and smoking for people with two or more partners
p-value = 0.603 > 0.05
We fail to reject the null hypothesis
Conclusion: there is no evidence of a relationship between cervical cancer and smoking for people with two or more partners.
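
Both group results can be reproduced with the shortcut formula for a 2x2 table with cells a, b (first row) and c, d (second row): T = n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). The sketch below assumes no continuity correction, as the quoted p-values suggest. The helper name `chi2_2x2` and the dictionary keys are made up for this illustration.

```python
import math

def chi2_2x2(a, b, c, d):
    """Chisquare statistic and p-value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Cell order: smokers with cancer, smokers without cancer,
#             non-smokers with cancer, non-smokers without cancer.
groups = {
    "at most one partner": (12, 21, 25, 118),
    "more than one partner": (96, 142, 92, 150),
}

results = {name: chi2_2x2(*cells) for name, cells in groups.items()}
for name, (stat, p_value) in results.items():
    print(f"{name}: chisquare statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

Adding the two group tables cell by cell gives back the combined table from part a), so the same helper can be used to check that result as well.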


Problem 2

Method: chisquare goodness of fit test
Theory to be tested: die is fair.
If that is true, about 1/6 of the rolls should be Ones, 1/6 should be Twos, and so on. There were 59 rolls during the experiment, so we should see about 59/6 = 9.83 Ones, Twos and so on:
Die shows a     One     Two   Three    Four    Five     Six
Observed         11      12       7       8      16       5
Expected       9.83    9.83    9.83    9.83    9.83    9.83

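Another way of seeing this is as follows: if the die is fair, then all the outcomes should be equally likely (after all, this is what fair means), so all the "Expected" numbers should be the same. They have to add up to 59, so we get "Expected"*6 = 59 or "Expected" = 59/6.
Using these numbers we find T = sum of (Observed - Expected)^2/Expected = 8.02. With 6 categories the test has 6 - 1 = 5 degrees of freedom, and for a chisquare distribution with 5 degrees of freedom P(X <= 8.02) = 0.8447. So: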
α=0.05
H0: The die is fair (=theory is true)
H1: The die is not fair
p-value = 1 - 0.8447 = 0.1553 > 0.05
We fail to reject the null hypothesis
Conclusion: there is not enough evidence to conclude that the die is not fair.
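
The numbers in this solution can be checked with a few lines of code. This is only a sketch: the helper `chi2_sf_df5` is a made-up name, and the closed-form tail probability it uses holds specifically for a chisquare distribution with 5 degrees of freedom (6 categories minus 1), not for other degrees of freedom.

```python
import math

def chi2_sf_df5(x):
    """P(X > x) for a chisquare distribution with exactly 5 degrees of freedom."""
    tail_normal = math.erfc(math.sqrt(x / 2))
    correction = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)
    return tail_normal + correction

observed = [11, 12, 7, 8, 16, 5]

# A fair die gives the same expected count for every face: 59/6.
expected = sum(observed) / len(observed)

# T = sum of (Observed - Expected)^2 / Expected over the six faces.
t_stat = sum((o - expected) ** 2 / expected for o in observed)
p_value = chi2_sf_df5(t_stat)

print(f"expected = {expected:.2f}, T = {t_stat:.2f}, p-value = {p_value:.4f}")
```

If SciPy is installed, `scipy.stats.chisquare(observed)` tests against equal expected frequencies by default and should give the same T and p-value.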

A little bit more, not part of what you would be expected to do:
It is a little strange that there should be so few sixes (only 5, the fewest of any number) when we were already suspecting that there were not enough sixes. But the chisquare goodness of fit test does not know the meaning of the word "six" or its special significance here, so it treats it just like the other categories.
Sometimes the chisquare test (just like any other method) can give obviously stupid answers. Consider this: say in the 59 rolls above we find:
Die shows a     One     Two   Three    Four    Five     Six
Observed         13      12      10       9       8       7

With this we find T = 2.72881 and p-value = 1 - 0.2583 = 0.7417 > 0.05, so the chisquare test says that there is no evidence that this is not a fair die! But for a fair die, the probability that the six counts come out nicely ordered from largest to smallest (or the other way around) is only about 2/6! = 2/720 = 0.0028, so such an arrangement is highly unlikely.
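
The test statistic and the ordering probability for this second data set can be checked as follows. The sketch assumes that the six counts are all different, so that by symmetry each of the 6! orderings of the faces is equally likely for a fair die; the variable names are made up for this illustration.

```python
import math

observed = [13, 12, 10, 9, 8, 7]
expected = sum(observed) / len(observed)  # 59/6 for a fair die

# Same goodness of fit statistic as before.
t_stat = sum((o - expected) ** 2 / expected for o in observed)

# Two of the 6! orderings are monotone: strictly decreasing or strictly increasing.
p_sorted = 2 / math.factorial(len(observed))

print(f"T = {t_stat:.5f}, P(counts sorted) = {p_sorted:.4f}")
```

The test only measures how far the counts are from 59/6. It does not look at the order in which the deviations occur, which is why it misses the pattern.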