| Contents of this page: |
| Assumptions |
| Confidence Interval |
| Hypothesis Test |
Remember the seven questions we discussed on the first day of class. Here is one of them: Do the salaries of men and women differ? How would one try to answer this question? We could take a stratified sample, that is, a simple random sample of men and a simple random sample of women, and calculate the mean income of each group. If the two means are very different, we can conclude that the salaries are different. Of course, as always in Statistics, what we really want to check is whether the salaries are statistically significantly different.
The best way to think of this type of problem is as follows: we imagine the data to come from two distinct groups, say group 1 and group 2. We are interested in the means of those groups, let's call them μ1 and μ2.
As always in Statistics we should start with a graph. The standard one for this type of problem is the boxplot by groups.
Next we can make a short table of the summary statistics. Because the methods described here only work for normal data anyway, that means a table of the sample size, the sample mean and the sample standard deviation of each group.
In fact, as we shall see, this table contains all the information we need.
Equal variance Assumption
The formulas for the confidence interval and the hypothesis test are fairly simple if we can assume that the population standard deviations of the two groups are the same. As a rule of thumb this is ok if the larger sample standard deviation is not more than three times the smaller one. Then we begin by calculating the pooled standard deviation sp. It is just what it says: a single estimate of the common standard deviation, obtained by pooling the information from both groups. It is defined by
sp = √[ ((n1-1)s1² + (n2-1)s2²) / (n1+n2-2) ]
Note that sp always falls between s1 and s2.
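As a small illustration, here is a Python sketch of the pooled standard deviation and of the rule of thumb. It assumes the usual definition sp² = ((n1-1)s1² + (n2-1)s2²)/(n1+n2-2); the function names and the numbers are hypothetical and chosen only for illustration.

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation from sample sizes and sample standard deviations."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def equal_variance_ok(s1, s2, max_ratio=3.0):
    """Rule of thumb: the larger sd is at most three times the smaller one."""
    return max(s1, s2) <= max_ratio * min(s1, s2)

# Hypothetical values, not taken from any data set in these notes
n1, s1 = 10, 4.0
n2, s2 = 15, 6.0

sp = pooled_sd(n1, s1, n2, s2)
print(f"sp = {sp:.3f}")
print("sp between s1 and s2:", min(s1, s2) <= sp <= max(s1, s2))
print("rule of thumb satisfied:", equal_variance_ok(s1, s2))
```

Because the pooled variance is a weighted average of s1² and s2², with weights n1-1 and n2-1, sp is pulled towards the standard deviation of the larger sample.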
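If we have σ1=σ2, then a 100(1-α)% confidence interval for the difference in means μ1-μ2 is given by
(x̄1 - x̄2) ± t_{n1+n2-2, α/2} · sp · √(1/n1 + 1/n2)
where x̄1 and x̄2 are the sample means and t_{n1+n2-2, α/2} is the upper α/2 critical value of the t distribution with n1+n2-2 degrees of freedom.
Of course we don't know either σ1 or σ2, so how do we know whether this formula is ok to use? What you do is draw the multiple boxplot (which you should do anyway to check for normality). If the longer box is not more than 3 times as long as the shorter box, this formula is usually ok.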
Example Consider the data on the Study Habits of College Students. Find a 90% confidence interval for the difference of the mean study habits of men and women.
We start with some graphs:

The boxplot shows that there seems to be a difference, with the women scoring somewhat higher than the men. There are no outliers in the boxplots, so the normal assumption appears to be satisfied. This is also shown in the probability plots.
Here is the table of summary statistics:
| Men | Women |
|---|---|
| n1=20 | n2=18 |
| x̄1=121.25 | x̄2=141.06 |
| s1=32.85 | s2=26.44 |
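The 90% interval can be worked out from this table alone. The following Python sketch assumes the pooled (equal variance) formula and takes the critical value 1.688 from a t table for 36 degrees of freedom; the variable names are just for illustration, and because the summary statistics are rounded the result can differ from computer output in the second decimal.

```python
import math

# Summary statistics from the table (group 1 = men, group 2 = women)
n1, xbar1, s1 = 20, 121.25, 32.85
n2, xbar2, s2 = 18, 141.06, 26.44

# Assumption: upper 5% critical value of the t distribution with
# n1 + n2 - 2 = 36 degrees of freedom, read from a t table
T_CRIT_36 = 1.688

# Pooled standard deviation and standard error of the difference
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
std_err = sp * math.sqrt(1 / n1 + 1 / n2)

diff = xbar1 - xbar2
lower = diff - T_CRIT_36 * std_err
upper = diff + T_CRIT_36 * std_err
print(f"sp = {sp:.2f}, standard error = {std_err:.2f}")
print(f"90% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

The whole interval lies below 0, which agrees with the impression from the boxplot that the women score higher.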


If the equal variance assumption is not satisfied you can still do the problem using the computer. In MINITAB the default is to assume unequal variances, but you can change that. The advantage of the equal variance assumption is that it often gives a somewhat shorter confidence interval; in this example the two intervals are almost identical (the equal-variance one is in fact marginally wider):
90% confidence interval with equal variance assumption: (-36.2577, -3.3534) (Stat > Basic Statistics > 2-Sample t, Samples: Men Women, check box: Assume equal variance, Options > Confidence level: 90)
90% confidence interval without equal variance assumption: (-36.0807, -3.5304) (Stat > Basic Statistics > 2-Sample t, Samples: Men Women, uncheck box: Assume equal variance, Options > Confidence level: 90)
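For the curious, the interval without the equal variance assumption can be approximated from the summary statistics as well. This Python sketch assumes the computer uses the usual Welch method (separate variances, with approximate degrees of freedom rounded down); that is an assumption about the software, not something stated in these notes. The critical value 1.690 is read from a t table for 35 degrees of freedom.

```python
import math

# Summary statistics for the study habits data (1 = men, 2 = women)
n1, xbar1, s1 = 20, 121.25, 32.85
n2, xbar2, s2 = 18, 141.06, 26.44

# Each group contributes its own variance of the sample mean
v1, v2 = s1**2 / n1, s2**2 / n2
std_err = math.sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Assumption: df rounded down to 35, upper 5% critical value from a t table
T_CRIT_35 = 1.690

diff = xbar1 - xbar2
lower = diff - T_CRIT_35 * std_err
upper = diff + T_CRIT_35 * std_err
print(f"df = {df:.1f}, standard error = {std_err:.2f}")
print(f"90% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

The two standard deviations are similar here, so it is no surprise that this interval is very close to the pooled one.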
Null Hypothesis: H0: μ1 = μ2
Alternative Hypothesis: Choose one of the following
a) Ha: μ1 > μ2
b) Ha: μ1 < μ2
c) Ha: μ1 ≠ μ2
Test Statistic: T = (x̄1 - x̄2) / (sp · √(1/n1 + 1/n2)), where sp is the pooled standard deviation
Rejection Region:
If your alternative is a) Ha: μ1 > μ2, then reject H0 if T > t_{n1+n2-2, α}
If your alternative is b) Ha: μ1 < μ2, then reject H0 if T < -t_{n1+n2-2, α}
If your alternative is c) Ha: μ1 ≠ μ2, then reject H0 if |T| > t_{n1+n2-2, α/2}
Remark: we can also test H0: μ1 - μ2 = d, but those tests are much rarer and won't be discussed here.
Remark: again this test requires equal variance. If that assumption is not justified you can use the computer to do the calculations for you.
Example Analyse the data concerning the failure of companies. Using the computer, test at the 5% level whether healthy companies have higher ratios of assets to liabilities than failed companies.
The summary statistics are as follows:
| Healthy | Failed |
|---|---|
| n1=68 | n2=33 |
| x̄1=1.73 | x̄2=0.82 |
| s1=0.64 | s2=0.48 |
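As a rough check by hand, the test can be sketched in Python from the summary table. It assumes the equal variance test is appropriate (the rule of thumb holds, since 0.64 is less than three times 0.48) and uses the alternative Ha: μ1 > μ2 with healthy companies as group 1. The critical value 1.660 is taken from a t table for 99 degrees of freedom, and because the table values are rounded the statistic will not match computer output exactly.

```python
import math

# Summary statistics from the table (group 1 = healthy, group 2 = failed)
n1, xbar1, s1 = 68, 1.73, 0.64
n2, xbar2, s2 = 33, 0.82, 0.48

# Assumption: upper 5% critical value of the t distribution with
# n1 + n2 - 2 = 99 degrees of freedom, read from a t table
T_CRIT_99 = 1.660

# Pooled standard deviation and test statistic
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_stat = (xbar1 - xbar2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Alternative a) Ha: mu1 > mu2, so reject H0 when T is large
reject_h0 = t_stat > T_CRIT_99
print(f"sp = {sp:.3f}, T = {t_stat:.2f}")
print("reject H0 at the 5% level:", reject_h0)
```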

Be very careful here: MINITAB orders the groups alphabetically, so group 1 is Failed and group 2 is Healthy. Therefore we either need the alternative "less than", or we have to ensure the correct ordering.
Example Let's consider again the data on Cocaine Use of Mothers and the Length of the Babies. Test at the 5% level whether the babies of "drug-free" mothers are longer than the babies of "first trimester" mothers.
1) Parameters: two means μ1 and μ2
2) Method: 2-sample t
3) Assumptions: The boxplots and the normal plots show that the data appears to be reasonably normal and that the equal variance assumption is justified.
4) α = 0.05
5) H0: μ1 = μ2 (mean lengths are the same)
6) Ha: μ1 > μ2 (babies of drug free mothers are longer than first trimester babies)
7) p = 0.012 (Stat > Basic Statistics > 2-Sample t, Samples in different columns, First: Drug-Free, Second: First Trimester, check box: Assume equal variance, Options > Alternative=greater than)
8) 0.012<0.05, so we reject the null hypothesis
9) Babies of drug free mothers are statistically significantly longer than first trimester babies.
Of course it would be nice to compare all three groups simultaneously, but if you want to know how to do that you have to come to my ESMA3102 course.
For more on the two-sample problem see chapter 10 of the textbook.