There are two ways to look at ANOVA problems:
a) Traditional View: The data consists of measurements taken from several groups.
Example: Measurements of the length of babies of mothers, with the mother belonging to one of three groups (Drug Free, 1st Trimester, Throughout)
b) Modern View: We have one continuous response variable and one (or more) discrete factors (predictor variables)
Example: Response=Length of Baby, Factor=Drug Use
MINITAB can handle the data in either format, but to keep things simple in this class we will assume that it is in the modern view. If it is not, use the Data > Stack > Columns command to convert it.
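To see what stacking does, here is a minimal Python sketch (not MINITAB code). The baby lengths are invented for illustration, and the function name `stack` is just a label chosen here. It turns one column per group (traditional view) into one response column and one factor column (modern view).

```python
# Hypothetical unstacked data: one list of baby lengths (cm) per group.
# The numbers are invented for illustration only.
unstacked = {
    "Drug Free": [51.2, 50.8, 52.1],
    "1st Trimester": [49.5, 50.1],
    "Throughout": [48.0, 48.9, 47.6],
}


def stack(columns):
    """Turn {level: [values]} into parallel response and factor lists."""
    response, factor = [], []
    for level, values in columns.items():
        for value in values:
            response.append(value)  # the continuous response
            factor.append(level)    # the level this observation belongs to
    return response, factor


length, drug_use = stack(unstacked)
for y, g in zip(length, drug_use):
    print(f"{y:6.1f}  {g}")
```

Each printed row is one baby: the response (length) next to the level of the factor (drug use).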
Notation: In ANOVA we use the word factor instead of the word predictor, but they mean the same.
The values of a factor (and there are only a few because factors are discrete variables) are called the levels.
Example: In Mothers and Cocaine Use, "Drug Status" is a factor, and it has the three levels Drug Free, 1st Trimester and Throughout.
If there is more than one factor, each combination of one level from each factor is called a factor-level combination.
Basic Question
a) Traditional View: Is there a difference in the (population) means of the groups?
This means we are testing
H0: μ1 = ... = μk vs.
Ha: μi ≠ μj for some i ≠ j
b) Modern View: Is there a relationship between the factor(s) and the response?
The hypotheses are the same as in the traditional view.
The test is done by the Stat > ANOVA > Oneway command
The basic question that ANOVA tries to answer is whether there is a difference of the population means of the different groups. So why is it called Analysis of Variance?

In other words we can study the differences in group means by studying the variances.
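One way to make this concrete is the standard textbook F statistic: the variance between the group means divided by the variance within the groups. The sketch below is plain Python, not MINITAB output, and the three small samples are invented. A large F means the group means vary more than the scatter inside the groups can explain.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: how far the group means are from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)     # variance between groups
    ms_within = ss_within / (n - k)       # variance within groups
    return ms_between / ms_within, k - 1, n - k


# Invented data for illustration only.
f_stat, df1, df2 = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f} on {df1} and {df2} degrees of freedom")
```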
Check these assumptions just as you did in regression problems.
An example where ANOVA doesn't work:
If there are outliers:
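If the ANOVA rejects H0 for three groups, there are four possibilities:
1) a1=a2≠a3
2) a1≠a2=a3
3) a1=a3≠a2
4) a1≠a2, a1≠a3 and a2≠a3 (all three differ)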
Each of these possibilities has different consequences!
To see which of them is most likely true, run a multiple comparison method such as Tukey's test.
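The printout of Tukey's method compares the levels of the factor with each other, one pair at a time. For each pair it gives a confidence interval for the difference of the two means. It starts with
Drug Free subtracted from:
| | Lower | Center | Upper |
|---|---|---|---|
| First Trimester | -3.880 | -1.800 | 0.280 |
The endpoints "-3.880" and "0.280" have different signs, so the interval contains 0 and "Drug Free" and "First Trimester" are not stat. signif. different.
For the others:
Drug Free - Throughout: "-4.818" and "-1.382" have the same sign, so the interval does not contain 0 and "Drug Free" and "Throughout" are stat. signif. different
First Trimester - Throughout: "-3.408" and "0.808" have different signs, so "First Trimester" and "Throughout" are not stat. signif. different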
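The rule used here (two levels are stat. signif. different exactly when the interval for their difference does not contain 0) can be written as a short Python check. The function name is chosen here for illustration. The intervals are the ones from the printout.

```python
def significantly_different(lower, upper):
    """True if the confidence interval (lower, upper) for the difference excludes 0."""
    return lower > 0 or upper < 0


# Interval endpoints taken from the Tukey printout in these notes.
intervals = {
    ("Drug Free", "First Trimester"): (-3.880, 0.280),
    ("Drug Free", "Throughout"): (-4.818, -1.382),
    ("First Trimester", "Throughout"): (-3.408, 0.808),
}

for (a, b), (lo, hi) in intervals.items():
    verdict = "different" if significantly_different(lo, hi) else "not different"
    print(f"{a} vs {b}: ({lo}, {hi}) -> {verdict}")
```

Only the pair Drug Free and Throughout has an interval that lies entirely on one side of 0.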
How to present the results of a multiple comparison:
1) Order the groups by the sample means
2) Underline together those groups which have not been found to be stat. signif. different.
3) If possible, simplify
Example:
| Throughout | 1stTrimester | Drug Free |
|---|---|---|
| __________ | __________ | |
| | __________ | __________ |
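The underlining can also be produced mechanically. The following Python sketch is one possible way to do it, not a MINITAB feature. It takes the groups ordered by sample mean and the pairs that were not found to be different, and returns the longest runs of neighbouring groups in which every pair is not different. The function name `underline_runs` is made up for this illustration.

```python
def underline_runs(ordered, same):
    """Return maximal (start, end) index runs whose members are pairwise not different."""
    same = {frozenset(pair) for pair in same}
    runs = []
    for i in range(len(ordered)):
        j = i
        # Extend the run while the next group is "not different" from every group in it.
        while j + 1 < len(ordered) and all(
            frozenset((ordered[k], ordered[j + 1])) in same for k in range(i, j + 1)
        ):
            j += 1
        # Keep runs that join at least two groups and are not inside an earlier run.
        if j > i and not any(s <= i and j <= e for s, e in runs):
            runs.append((i, j))
    return runs


# Groups ordered by sample mean, and the pairs Tukey did not separate.
ordered = ["Throughout", "1st Trimester", "Drug Free"]
not_different = [("Throughout", "1st Trimester"), ("1st Trimester", "Drug Free")]

width = max(len(name) for name in ordered) + 2
print("".join(name.ljust(width) for name in ordered))
for start, end in underline_runs(ordered, not_different):
    print(" " * (width * start) + "_" * (width * (end - start + 1) - 2))
```

Step 3 (simplify) corresponds to dropping any underline that lies completely inside a longer one, which the function does when it skips contained runs.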
Data Set 1: Data on the prices of T-shirts and where they were bought. Is there a difference in the prices depending on the store location? ANOVA: p-value=0.015.
Data Set 2: Data on the prices of T-shirts and their sizes. Is there a difference in the prices depending on the size? ANOVA: p-value=0.015.
But in data set 2 there is a natural ordering of the factor "size", and the data is not consistent with this ordering, so here we should be much more cautious with any conclusions than in the case of data set 1.
1) Try a transformation of the response variable
a) If the problem is "small", try the square root transformation (SQRT)
b) If the problem is "large", try the log transformation (LOGT, the base-10 logarithm)
2) If the transformations don't work, use a nonparametric method (In the case of a oneway ANOVA use Kruskal-Wallis)
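For readers who want to see what Kruskal-Wallis computes, here is a minimal Python sketch of the textbook H statistic. It replaces the observations by their ranks in the pooled sample and compares the rank sums of the groups. This is an illustration under simplifying assumptions: it uses average ranks for ties but no tie correction, it does not compute a p-value, and the data are invented. For larger samples H is usually compared with a chi-square distribution with k-1 degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)

    # Average rank of each distinct value; ranks start at 1.
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of the ranks i+1, ..., j
        i = j

    n = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Invented data for illustration only.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```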
Also, in either case you should use the median instead of the mean and IQR/1.35 instead of the standard deviation in your summary table.
Example: Capacity of Wells:
| Rock Type | N | Median | IQR/1.35 |
|---|---|---|---|
| Dolomite | 50 | 1.72 | 6.92 |
| Limestone | 50 | 0.45 | 1.45 |
| Siliclastic | 50 | 0.46 | 0.96 |
| Metamorphic | 50 | 0.30 | 0.79 |
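A summary row like these can be computed as in the Python sketch below. The divisor 1.35 is used because for normally distributed data the IQR is about 1.35 standard deviations, so IQR/1.35 estimates the standard deviation without being thrown off by outliers. The data are invented, and quartile conventions differ between programs, so the numbers need not match MINITAB exactly.

```python
from statistics import median, quantiles


def robust_summary(values):
    """Return (n, median, IQR/1.35) for one group."""
    # quantiles(..., n=4) returns the three quartiles Q1, Q2, Q3
    # (Python's default "exclusive" method).
    q1, _, q3 = quantiles(values, n=4)
    return len(values), median(values), (q3 - q1) / 1.35


# Invented data for illustration only.
n, med, spread = robust_summary([1, 2, 3, 4, 5, 6, 7])
print(f"N = {n}, Median = {med}, IQR/1.35 = {spread:.2f}")
```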
How to do a Multiple Comparison in a Two-way ANOVA with Interaction
For the Film Coatings dataset we found a significant interaction between Temperature and Pressure. In the Interaction plot we see that the combination Temp=High and Pressure=Mid results in the thinnest film. But is this combination stat. signif. better than all the others?
To be able to run a multiple comparison we need to turn this into a one-way ANOVA problem. We can do this by combining the two factors Temperature and Pressure into one new factor, call it TP, with levels Low Low, Low Mid, etc. This is done with the Data > Concatenate command. Now we run the one-way ANOVA with Tukey's multiple comparison. The result is:
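The idea of combining two factors into one can be sketched in Python as follows. This is an illustration, not MINITAB code. The five observations are invented, and `combine` is a name chosen here.

```python
# Hypothetical factor columns, one entry per observation (invented for illustration).
temperature = ["Low", "Low", "Mid", "High", "High"]
pressure = ["Low", "Mid", "Mid", "Mid", "High"]


def combine(factor_a, factor_b, sep=" "):
    """Join two factor columns into one factor with one level per combination."""
    return [a + sep + b for a, b in zip(factor_a, factor_b)]


tp = combine(temperature, pressure)
print(tp)
print(sorted(set(tp)))  # the levels of the new factor TP
```

Each level of TP is one factor-level combination, so a one-way ANOVA on TP compares all combinations at once.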
| HM | ML | HL | MM | MH | HH | LL | LM | LH | |
|---|---|---|---|---|---|---|---|---|---|
| ____________________ | |||||||||
| _________________________ | |||||||||
| _______________ | |||||||||
| _______________ | |||||||||
In this problem we are interested in the stat. signif. best combination of temperature and pressure. If we want to do a multiple comparison of just one of the factors, we can use the General Linear Model command:
Stat > ANOVA > General Linear Model, Responses: Thickness, Model: Temperature Pressure Temperature*Pressure, Comparisons > Terms: Temperature
Note that you need to include the interaction term Temperature*Pressure in the model. Otherwise a model without interaction is fit, which we know is wrong.
The MINITAB output for the pairwise comparisons is different from the two-way command. It gives the p-values for the pairwise tests of no difference:
μLow = μMid, p-value=0.00, they are different
μLow = μHigh, p-value=0.00, they are different
μMid = μHigh, p-value=0.88, they are not different
So we find:
Temperature:
| Low | Mid | High |
|---|---|---|
| | __________ | __________ |
When to use Twoway or General Linear Model
Use General Linear Model if
a) you want to do a multiple comparison
b) you have an unbalanced design
c) you have a more complicated design
otherwise use Twoway