Water Quality and Mining

Step 1: Graphs
Graph > Boxplot > with Groups, Graph variable=Iron, Categorical variable=Rock

Graph > Boxplot > with Groups, Graph variable=Iron, Categorical variable=Mine

There is a clear problem with the normality assumption, so use the log transform:
Calc > Calculator > Store in log(Iron), Expression LOGT('Iron')
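
Outside Minitab, the same transform can be sketched in Python. LOGT is Minitab's base-10 logarithm, so math.log10 is the natural counterpart. The iron values below are made-up placeholders for illustration, not the study data.

```python
import math

# Hypothetical iron concentrations (placeholders, not the study data).
iron = [0.2, 0.5, 1.3, 4.0, 25.0]

# Base-10 log, matching Minitab's LOGT; all values must be strictly positive.
log_iron = [math.log10(x) for x in iron]

print([round(v, 3) for v in log_iron])
```

The transform pulls in the large values much more than the small ones, which is why it helps with right-skewed data.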
Redo the graphs:
Graph > Boxplot > with Groups, Graph variable=log(Iron), Categorical variable=Rock

Graph > Boxplot > with Groups, Graph variable=log(Iron), Categorical variable=Mine

The transformation has solved the problem, so the analysis will be based on log(Iron).

Step 2: Summary Statistics
Because we use a transformation, the tables are based on the median and IQR/1.35 rather than the mean and standard deviation.
Stat > Basic Statistics > Display Descriptive Statistics, Variable=Iron, By variables=Rock, Statistics > check IQR
Note that this uses Iron, not log(Iron).
Rock
Groups      n    Median   IQR/1.35
Limestone   39   1.3      3.6
Sandstone   39   0.41     1.2
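
For normal data the IQR is about 1.35 standard deviations, so IQR/1.35 acts as an outlier-resistant stand-in for the standard deviation. A minimal Python sketch of how the table entries could be computed follows. The data are hypothetical, and the "exclusive" quartile method is assumed, not verified, to match Minitab's quartile definition.

```python
import statistics

def robust_summary(values):
    """Return (n, median, IQR/1.35) for a list of numbers."""
    # Default method="exclusive"; assumed to correspond to Minitab's quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return len(values), statistics.median(values), (q3 - q1) / 1.35

# Hypothetical iron values with one large outlier (not the study data).
iron = [0.1, 0.3, 0.4, 0.5, 0.9, 1.2, 30.0]
n, med, spread = robust_summary(iron)

print("n =", n, "median =", med, "IQR/1.35 =", round(spread, 3))
print("ordinary standard deviation =", round(statistics.stdev(iron), 3))
```

In this toy sample the single outlier inflates the ordinary standard deviation far more than IQR/1.35, which is the reason for preferring the robust summary on the untransformed scale.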

Stat > Basic Statistics > Display Descriptive Statistics, Variable=Iron, By variables=Mine, Statistics > check IQR
Mine
Groups      n    Median   IQR/1.35
Unmined     26   0.515    2.64
Reclaimed   26   0.685    0.73
Abandoned   26   1.65     8.4

Note that the IQR/1.35 values are very different (0.73 vs. 8.4). This is because the data set has many outliers, which still affect the IQR. It would be better to use a table based on log(Iron) with the mean and standard deviation instead. It would look like this:
Stat > Basic Statistics > Display Descriptive Statistics, Variable=log(Iron), By variables=Mine
log(Iron) by Mine
Groups      n    Mean      Std
Unmined     26   -0.140    0.775
Reclaimed   26   -0.1745   0.4723
Abandoned   26   0.543     0.879

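However, reading this table requires some knowledge of logarithms: the entries are base-10 logs, so, for example, a mean of 0.543 corresponds to about 10^0.543 ≈ 3.5 on the original Iron scale.

Step 3: Interaction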
Stat > ANOVA > Interaction Plot, Responses=log(Iron), Factors = Rock Mine

There seems to be some interaction. To confirm this, test for it:
Stat > ANOVA > Twoway, Response=log(Iron), Row Factor=Rock, Column factor=Mine

1) a=0.05
2) H0: g11=g12=...=g23=0 (no interaction)
3) Ha: gij≠0 for some i,j (some interaction)
4) p-value=0.000 < a
5) We reject H0, there is interaction
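
To make the test above less of a black box, here is a rough Python sketch of the F statistics behind a two-way ANOVA with interaction. It assumes a balanced layout (the same number of observations in every Rock x Mine cell), uses made-up numbers instead of the study data, and stops at the F statistics. Minitab additionally converts each F into the p-value used in step 4. The function name twoway_anova_f is hypothetical.

```python
from statistics import mean

def twoway_anova_f(cells):
    """F statistics for a balanced two-way layout with interaction.

    cells[i][j] is the list of responses for row level i and column
    level j; every cell must hold the same number (>1) of values.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(y for row in cells for cell in row for y in cell)
    row_m = [mean(y for cell in row for y in cell) for row in cells]
    col_m = [mean(y for row in cells for y in row[j]) for j in range(b)]
    cell_m = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for the two main effects, interaction, and error.
    ss_row = b * n * sum((m - grand) ** 2 for m in row_m)
    ss_col = a * n * sum((m - grand) ** 2 for m in col_m)
    ss_int = n * sum(
        (cell_m[i][j] - row_m[i] - col_m[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (y - cell_m[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    ms_err = ss_err / (a * b * (n - 1))
    return {
        "row": (ss_row / (a - 1)) / ms_err,
        "column": (ss_col / (b - 1)) / ms_err,
        "interaction": (ss_int / ((a - 1) * (b - 1))) / ms_err,
    }

# Hypothetical log(Iron) values: 2 rock types x 3 mine types, 2 per cell.
cells = [
    [[0.0, 0.2], [0.1, 0.3], [1.0, 1.2]],
    [[-0.4, -0.2], [-0.3, -0.1], [0.0, 0.2]],
]
f = twoway_anova_f(cells)
for name, value in f.items():
    print(name, round(value, 2))
```

A large interaction F (relative to the F distribution with the matching degrees of freedom) corresponds to the small p-value seen in step 4 above.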

Step 4: Hypothesis Test
Stat > ANOVA > Twoway, Response=log(Iron), Row Factor=Rock, Column factor=Mine, Graphs > Residual vs. Fits Plot and Normal Plot


Both plots look OK.

Test for Rock:
1) a=0.05
2) H0: a1 = a2 = 0 (no difference in the mean iron content for different types of rock)
3) Ha: ai≠0 for some i (some differences in the mean iron content for different types of rock)
4) p-value=0.035 < a
5) We reject H0, there are some differences in the mean iron content for different types of rock

Test for Mine:
1) a=0.05
2) H0: b1 = b2 = b3 = 0 (no difference in the mean iron content for different types of mines)
3) Ha: bi≠0 for some i (some differences in the mean iron content for different types of mines)
4) p-value=0.000 < a
5) We reject H0, there are some stat. signif. differences in the mean iron content for different types of mines

Step 5: Multiple Comparison
1) Rock has only two levels, so there is no need for a multiple comparison
2) Mine. For this use the General Linear Model:
Stat > ANOVA > General Linear Model, Responses: 'log(Iron)', Model: Rock Mine Rock*Mine, Comparisons > Terms: Mine
Note that you need to include the interaction term Rock*Mine in the model!
Unmined Reclaimed Abandoned
_________________

Interpretation: There is a stat. signif. difference between the mean iron content of abandoned mines and that of the others. The difference between unmined and reclaimed mines is not stat. signif., at least not at these sample sizes.
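
Because the comparison is made on the log scale, a difference of two mean log(Iron) values translates into a ratio on the original scale. The short Python sketch below illustrates this with the means from the log(Iron) by Mine table. It assumes base-10 logs (Minitab's LOGT) and only shows the size of the differences. It does not reproduce Minitab's comparison intervals or their significance.

```python
# Mean log(Iron) by mine type, copied from the summary table in Step 2.
mean_log_iron = {"Unmined": -0.140, "Reclaimed": -0.1745, "Abandoned": 0.543}

def ratio(group_a, group_b):
    """Ratio of geometric means implied by a difference of base-10 log means."""
    return 10 ** (mean_log_iron[group_a] - mean_log_iron[group_b])

pairs = [
    ("Abandoned", "Unmined"),
    ("Abandoned", "Reclaimed"),
    ("Unmined", "Reclaimed"),
]
for a, b in pairs:
    print(f"{a} vs {b}: ratio = {ratio(a, b):.2f}")
```

The two ratios involving abandoned mines come out around 5, while the unmined vs. reclaimed ratio is close to 1, which matches the grouping found by the multiple comparison.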