Analysis of Oxygen Concentration and Fermentation

Ethanol is a continuous response, and Sugar and Oxygen are discrete factors, so this is a two-way ANOVA problem. More specifically, it is a factorial design, because Sugar and Oxygen are equally important.

Step 1: Graphs
Graph > Boxplot > with Groups, Graph variable=Ethanol, Categorical variable=Sugar

Graph > Boxplot > with Groups, Graph variable=Ethanol, Categorical variable=Oxygen

There is a hint of unequal variance here. We fix this with a square root transform:
Calc > Calculator, Store in Sqrt(Ethanol), Expression SQRT('Ethanol')
Now redraw the graphs:
Graph > Boxplot > with Groups, Graph variable=Sqrt(Ethanol), Categorical variable=Sugar

Graph > Boxplot > with Groups, Graph variable=Sqrt(Ethanol), Categorical variable=Oxygen

This looks much better: the spreads of the groups are now more similar.
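The boxplots can be backed up by a number: the ratio of the largest to the smallest group standard deviation. A common rule of thumb treats ratios below about 2 as acceptable for ANOVA. The Python sketch below (Python is an assumption here; the analysis itself uses Minitab) computes that ratio before and after a square root transform. The function names and the data are hypothetical, not the fermentation data; the values were chosen so that the spread grows with the level, which is the pattern a square root transform is meant to remove.

```python
import math
import statistics


def sd_ratio(groups):
    """Largest group SD divided by smallest group SD (1.0 = equal spreads)."""
    sds = [statistics.stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


def sqrt_transform(groups):
    """Apply the square root transform to every observation, group by group."""
    return {name: [math.sqrt(v) for v in values] for name, values in groups.items()}


# Hypothetical data: the spread grows with the level of the group.
groups = {"low": [1.0, 4.0, 9.0], "high": [16.0, 25.0, 36.0]}

print(round(sd_ratio(groups), 2))                  # 2.48 on the raw scale
print(round(sd_ratio(sqrt_transform(groups)), 2))  # 1.0 on the square root scale
```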

Step 2: Summary Statistics
Because we used a transformation, we summarize the groups with the median and IQR/1.35 rather than the mean and standard deviation.
Stat > Basic Statistics > Display Descriptive Statistics, Variable=Ethanol, By variables=Sugar, Statistics > check IQR
Note: this uses Ethanol, not Sqrt(Ethanol)!
Sugar
Group       n   Median   IQR/1.35
Galactose   8   0.225    0.194
Glucose     8   0.025    0.083

Stat > Basic Statistics > Display Descriptive Statistics, Variable=Ethanol, By variables=Oxygen, Statistics > check IQR
Oxygen
Group   n   Median   IQR/1.35
0       4   0.275    0.321
46      4   0.155    0.243
92      4   0.145    0.156
138     4   0.065    0.093
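The two tables above can be reproduced outside Minitab with a short Python sketch. For normally distributed data the IQR is about 1.35 standard deviations, which is why IQR/1.35 serves as a robust stand-in for the standard deviation. One assumption to flag: quartile rules differ between programs, and this sketch uses the p(n+1) rule with linear interpolation (`method="exclusive"` in Python's `statistics.quantiles`), which is assumed, not confirmed here, to match Minitab. The function names and the example values are hypothetical.

```python
import statistics


def robust_summary(values):
    """Return (n, median, IQR/1.35) for one group of raw Ethanol values."""
    # Quartiles at positions p*(n+1) with linear interpolation; assumed to
    # match Minitab's rule. Other rules can give slightly different IQRs.
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    # For normal data the IQR is about 1.35 SDs, so IQR/1.35 estimates the SD.
    return len(values), statistics.median(values), (q3 - q1) / 1.35


def summarize_by(values, labels):
    """Group the values by label and summarize each group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: robust_summary(vals) for label, vals in groups.items()}


# Hypothetical values, not the fermentation data.
print(summarize_by([1, 2, 3, 4, 5, 6, 7, 8], ["A"] * 8))
```

As in the Minitab steps, the summaries are computed on the raw Ethanol values, not on Sqrt(Ethanol).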

Step 3: Interaction
Stat > ANOVA > Interaction Plot, Responses=Sqrt(Ethanol), Factors = Sugar Oxygen

There does not seem to be any interaction. To confirm this, we test for it:
Stat > ANOVA > Two-Way, Response=Sqrt(Ethanol), Row factor=Sugar, Column factor=Oxygen

1) α = 0.05
2) H0: g11 = g12 = ... = g24 = 0 (no interaction)
3) Ha: gij ≠ 0 for some i, j (some interaction)
4) p-value = 0.995 > α
5) We fail to reject H0; there is no evidence of an interaction
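The interaction test comes from splitting the total variation into row, column, interaction and error sums of squares. A minimal, dependency-free Python sketch for a balanced design is below; the function name and the 2 x 2 example data are hypothetical, and p-values are left out because they need the F distribution (for example `scipy.stats.f.sf`, not called here). For the fermentation layout (Sugar as the 2-level row factor, Oxygen as the 4-level column factor, and assuming 2 replicates per cell, which is consistent with the group sizes above) the interaction has (2-1)(4-1) = 3 df and the error has 2·4·(2-1) = 8 df.

```python
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way ANOVA with an interaction term.

    cells maps (row_level, col_level) -> list of replicate responses.
    Every combination must be present with the same number of replicates
    r >= 2. Returns {source: (SS, df, F)}.
    """
    rows = sorted({i for i, _ in cells})
    cols = sorted({j for _, j in cells})
    a, b = len(rows), len(cols)
    r = len(next(iter(cells.values())))
    if len(cells) != a * b or r < 2 or any(len(v) != r for v in cells.values()):
        raise ValueError("need a complete, balanced design with r >= 2")

    cell = {k: fmean(v) for k, v in cells.items()}
    # Balanced design: row/column means are simply means of the cell means.
    row = {i: fmean([cell[i, j] for j in cols]) for i in rows}
    col = {j: fmean([cell[i, j] for i in rows]) for j in cols}
    grand = fmean(cell.values())

    ss_a = b * r * sum((row[i] - grand) ** 2 for i in rows)
    ss_b = a * r * sum((col[j] - grand) ** 2 for j in cols)
    # Interaction: what is left of each cell mean after row and column effects.
    ss_ab = r * sum((cell[i, j] - row[i] - col[j] + grand) ** 2
                    for i in rows for j in cols)
    ss_e = sum((y - cell[k]) ** 2 for k, v in cells.items() for y in v)

    df_e = a * b * (r - 1)
    ms_e = ss_e / df_e
    table = {}
    for source, ss, df in (("A", ss_a, a - 1), ("B", ss_b, b - 1),
                           ("A*B", ss_ab, (a - 1) * (b - 1))):
        table[source] = (ss, df, (ss / df) / ms_e)
    table["Error"] = (ss_e, df_e, None)
    return table


# Hypothetical 2 x 2 data with 2 replicates; the cell means are exactly additive.
cells = {(0, 0): [1, 3], (0, 1): [3, 5], (1, 0): [5, 7], (1, 1): [7, 9]}
for source, (ss, df, f) in two_way_anova(cells).items():
    print(source, ss, df, f)
```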

Therefore we now fit an additive model, without the interaction term.

Step 4: Hypothesis Test
Stat > ANOVA > Two-Way, Response=Sqrt(Ethanol), Row factor=Sugar, Column factor=Oxygen, check box Fit additive model, Graphs > Residual vs. Fits Plot and Normal Plot


Both plots look OK (no obvious pattern in the residuals vs. fits, and the normal plot is roughly a straight line).

Test for Sugar:
1) α = 0.05
2) H0: a1 = a2 = 0 (no difference in the mean ethanol for different sugars)
3) Ha: ai ≠ 0 for some i (some differences in the mean ethanol for different sugars)
4) p-value = 0.000 < α (Minitab prints 0.000 for p-values below 0.0005)
5) We reject H0; there are stat. signif. differences in the mean ethanol for different sugars

Test for Oxygen:
1) α = 0.05
2) H0: b1 = ... = b4 = 0 (no difference in the mean ethanol for different oxygen levels)
3) Ha: bi ≠ 0 for some i (some differences in the mean ethanol for different oxygen levels)
4) p-value = 0.035 < α
5) We reject H0; there are some stat. signif. differences in the mean ethanol for different oxygen levels
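Dropping the interaction term moves its sum of squares and degrees of freedom into the error term, and the fitted value of each observation becomes grand mean + row effect + column effect. The Python sketch below computes those fits and residuals (the inputs to the residuals vs. fits plot and the normal plot) and the F statistics for both factors in a balanced design. The function name and the 2 x 2 data are hypothetical. Under the same 2-replicate assumption as before, the fermentation layout would give 16 - 2 - 4 + 1 = 11 error df.

```python
from statistics import fmean


def additive_fit(cells):
    """Additive (no-interaction) two-way model for a balanced design.

    cells maps (row_level, col_level) -> list of replicate responses.
    Returns fitted values, residuals, error df and F statistics for the
    row factor "A" and the column factor "B".
    """
    rows = sorted({i for i, _ in cells})
    cols = sorted({j for _, j in cells})
    a, b = len(rows), len(cols)
    r = len(next(iter(cells.values())))
    if len(cells) != a * b or any(len(v) != r for v in cells.values()):
        raise ValueError("need a complete, balanced design")
    n = a * b * r

    grand = fmean(y for v in cells.values() for y in v)
    row = {i: fmean(y for j in cols for y in cells[i, j]) for i in rows}
    col = {j: fmean(y for i in rows for y in cells[i, j]) for j in cols}

    fits, residuals = [], []
    for (i, j), values in cells.items():
        fit = row[i] + col[j] - grand  # grand mean + row effect + column effect
        fits.extend([fit] * len(values))
        residuals.extend(y - fit for y in values)

    # The interaction SS and df are pooled into the error.
    df_e = n - a - b + 1
    ms_e = sum(e ** 2 for e in residuals) / df_e
    f_a = b * r * sum((row[i] - grand) ** 2 for i in rows) / (a - 1) / ms_e
    f_b = a * r * sum((col[j] - grand) ** 2 for j in cols) / (b - 1) / ms_e
    return {"fits": fits, "residuals": residuals, "df_error": df_e,
            "F": {"A": f_a, "B": f_b}}


# Hypothetical 2 x 2 data with 2 replicates (not the fermentation data).
cells = {(0, 0): [1, 3], (0, 1): [3, 5], (1, 0): [5, 7], (1, 1): [7, 9]}
result = additive_fit(cells)
print(result["df_error"], result["F"])
```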

Step 5: Multiple Comparison
1) Sugar
Sugar is stat. signif., but it has only two levels, so the two sugars differ from each other. No multiple comparison is needed here.

2) Oxygen
Oxygen is stat. signif. However, Oxygen is really a continuous variable that this experiment happens to measure at discrete levels. In this situation one does not usually run a multiple comparison; instead the trend is modeled with regression, as in the Regression Analysis below.

Warning: If we had not done the transformation, the results would have been quite different. For example, Oxygen would not have been stat. significant (p-value = 0.065).

Regression Analysis

Oxygen might also be considered a continuous variable. In fact, with a different machine Oxygen might be "23.5" or really any other number. For this reason one might analyze this dataset using regression:

The boxplot of Ethanol by Sugar and the scatterplot of Ethanol by Oxygen show a hint of unequal variance. A square root transform fixes these problems.

Now code Sugar as SugarCode with Galactose=0 and Glucose=1

The scatterplot of Sqrt(Ethanol) by Oxygen shows a moderate negative linear relationship (r = -0.503, p = 0.047) with no apparent problems.
The boxplot of Sqrt(Ethanol) by SugarCode shows a strong relationship (r = -0.722, p = 0.002) with no apparent problems.
Oxygen and SugarCode are uncorrelated (r=0)
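The correlation of exactly 0 is what a balanced factorial layout produces: every oxygen level appears equally often with each SugarCode value. The Python sketch below builds the design columns (no Ethanol values) and checks this. The layout of 2 sugars x 4 oxygen levels x 2 replicates is an assumption inferred from the group sizes, and the names are hypothetical.

```python
from itertools import product
from statistics import fmean

# Assumed layout (inferred from the group sizes, not stated in the text):
# 2 sugars x 4 oxygen levels x 2 replicates = 16 runs.
OXYGEN_LEVELS = [0, 46, 92, 138]
SUGAR_CODES = {"Galactose": 0, "Glucose": 1}
REPLICATES = 2


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


runs = [(oxygen, code)
        for oxygen, code, _ in product(OXYGEN_LEVELS, SUGAR_CODES.values(), range(REPLICATES))]
oxygen = [o for o, _ in runs]
sugar_code = [c for _, c in runs]
oxygen_x_sugar = [o * c for o, c in runs]  # the Oxygen*SugarCode column

print(len(runs), pearson_r(oxygen, sugar_code))  # 16 0.0
```

Note that r = 0 holds only between the two main-effect columns; the Oxygen*SugarCode column is positively correlated with SugarCode.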

Create a new variable Oxygen*SugarCode for the interaction.
The Best Subsets regression shows that the best model is based on Oxygen and SugarCode (Mallows' Cp = 2.0).
The model is
Sqrt(Ethanol) = 0.654 - 0.00213 Oxygen - 0.315 SugarCode
The diagnostic plots look good.
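Mallows' Cp compares a candidate model's error sum of squares with the error mean square of the full model: Cp = SSE_p / MSE_full - (n - 2p), where p counts the candidate's parameters including the intercept. A model with little bias has Cp close to p. When the candidate drops one term from a full model with k parameters, the formula reduces to Cp = (k - 2) + F, where F is the partial F statistic of the dropped term. Here the full model has k = 4 (intercept, Oxygen, SugarCode, Oxygen*SugarCode), so Cp = 2.0 for the Oxygen + SugarCode model means F is close to 0 for the interaction term, in line with the interaction test. The Python sketch uses hypothetical SSE values, not the fermentation fit.

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' Cp = SSE_p / MSE_full - (n - 2p).

    sse_subset: error sum of squares of the candidate (subset) model
    mse_full:   error mean square of the model with every candidate predictor
    n:          number of observations
    p:          parameters in the candidate model, intercept included
    """
    return sse_subset / mse_full - (n - 2 * p)


# Hypothetical numbers: n = 16 and a 4-parameter full model with SSE = 1.2,
# so MSE_full = 1.2 / (16 - 4) = 0.1.
print(round(mallows_cp(1.2, 0.1, n=16, p=4), 6))        # 4.0: the full model gives Cp = p
print(round(mallows_cp(1.2 + 0.5, 0.1, n=16, p=3), 6))  # 7.0: dropped term has F = 5
```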

Which analysis is better? That depends on what we want to achieve:
• If we only want to know whether there is a relationship between Ethanol, Oxygen and Sugar, the ANOVA works well.
• If we want to know what the relationship between Ethanol, Oxygen and Sugar is, for example to predict the ethanol for a certain oxygen level and sugar type, we need the regression analysis.
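Prediction with the regression model is a matter of plugging into the fitted equation and undoing the square root. The Python sketch below uses the coefficients of the fitted model, Sqrt(Ethanol) = 0.654 - 0.00213 Oxygen - 0.315 SugarCode; the function name is hypothetical. Two caveats are built in as assumptions: the sketch refuses oxygen values outside the studied range 0 to 138 rather than extrapolate, and squaring a prediction made on the square root scale estimates a typical (median-like, if the errors are symmetric on that scale) ethanol value rather than the mean.

```python
# Coefficients of the fitted model in the text:
# Sqrt(Ethanol) = 0.654 - 0.00213 Oxygen - 0.315 SugarCode
INTERCEPT, B_OXYGEN, B_SUGAR = 0.654, -0.00213, -0.315
SUGAR_CODE = {"Galactose": 0, "Glucose": 1}


def predict_ethanol(oxygen, sugar):
    """Predicted Ethanol (original scale) for one oxygen level and sugar type."""
    if not 0 <= oxygen <= 138:
        # Outside the oxygen range of the experiment: would be extrapolation.
        raise ValueError("oxygen outside the studied range 0-138")
    sqrt_pred = INTERCEPT + B_OXYGEN * oxygen + B_SUGAR * SUGAR_CODE[sugar]
    return sqrt_pred ** 2  # undo the square root transform


# Oxygen need not be one of the experimental levels, e.g. 23.5:
print(round(predict_ethanol(23.5, "Galactose"), 3))  # 0.365
```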

Note that both analyses have a lot in common:

• Both need a transform of Ethanol.
• Both say Oxygen and Sugar matter.
• Both say there is no interaction (Oxygen*SugarCode is not part of the final regression model).
• Both have the same R² (77.4%).