Hypothesis Testing

Again, this was a topic in 3101, and you should read up on it here. Some items to remind you of:

Parts of a Hypothesis Test (HT)

1) Parameter of interest
2) Method of analysis
3) Assumptions of Method
4) Type I error probability α
5) Null hypothesis H0 (in plain language and in terms of a parameter, if appropriate)
6) Alternative hypothesis Ha
7) Find the p-value
8) Decision and Conclusion, in plain language.

We will decide whether or not to reject the null hypothesis based on the p-value. The p-value p is computed by the software (Minitab) and is then compared to the pre-chosen level of significance α (usually α = 0.05, or 5%) as follows:

p < α: reject H0
p > α: fail to reject H0
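The decision rule can be written as a tiny Python function. This is only an illustration. The name `decide` is hypothetical, and the rule above does not say what to do when p equals α exactly, so this sketch assumes that case means "fail to reject H0".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule at significance level alpha.

    Hypothetical helper: p == alpha is treated as "fail to reject H0",
    an assumption, since the rule in the notes leaves that case open.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Reject only when the p-value is strictly smaller than alpha.
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))        # reject H0
print(decide(0.03, 0.01))  # fail to reject H0
```

The same p-value can lead to different decisions, depending on the α chosen before looking at the data.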

Example: Let's have a look at the 1970 military draft.
Real question: Did the 1970 draft work the way it was supposed to?

1) Parameter of interest: the population correlation coefficient ρ, estimated by Pearson's correlation coefficient r
2) Method of analysis: test based on normal theory
3) Assumptions of Method: the data come from a bivariate normal distribution.

The marginal plot shows no outliers and no problems with the normality assumption.

4) α=0.05
5) H0: ρ=0 (no relationship between "Day of Year" and "Draft Number")
6) Ha: ρ≠0 (some relationship between "Day of Year" and "Draft Number")
7) p = 0.000, that is, p < 0.0005 after Minitab rounds it to three decimals (use Stat > Basic Statistics > Correlation)
8) p < α, so we reject H0: there is some relationship between "Day of Year" and "Draft Number", so something went wrong in the 1970 draft.

More on the p-value

Let's do a little simulation to understand the p-value.

Example: We will do the following:

• generate 50 observations from a normal distribution with mean μ and standard deviation σ = 1.0
• find the p-value of the test H0:μ=10.0 vs Ha: μ≠10.0
• repeat 1000 times
• draw the histogram of the 1000 p-values and find the percentage of p-values<0.05

This is done in the macro pvalue; run it with

CTRL-L, %k:\3102\pvalue 10.0

If μ = 10.0, the null hypothesis is true. As we can see, in that case any number between 0 and 1 is equally likely to be our p-value (the p-values are uniformly distributed), and so about 5% of them are < 0.05.
As μ gets farther away from the hypothesized value of 10.0 (the null hypothesis is "more" wrong), the p-values start to "bunch up" around 0, so we correctly reject the null hypothesis more and more often.
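The steps of the pvalue macro can be imitated in Python. We have not seen the macro's code, so this is a sketch under assumptions. σ = 1.0 is treated as known, so each sample is tested with a two-sided one-sample z-test; the macro may use a t-test instead, which also gives uniformly distributed p-values when H0 is true. The histogram is printed as counts in ten bins of width 0.1. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of the z-test of H0: mu = mu0.

    Assumption: sigma is known, so a z-test is used.
    """
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def simulate_p_values(mu, n=50, sigma=1.0, mu0=10.0, reps=1000, seed=None):
    """Draw reps samples of size n from N(mu, sigma) and test each one."""
    rng = random.Random(seed)
    return [z_test_p_value([rng.gauss(mu, sigma) for _ in range(n)], mu0, sigma)
            for _ in range(reps)]


def decile_counts(p_values):
    """Counts of p-values in the bins [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]."""
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    return counts


for mu in (10.0, 10.2, 10.5):
    ps = simulate_p_values(mu, seed=1)
    share = sum(p < 0.05 for p in ps) / len(ps)
    print(f"mu = {mu}: bins {decile_counts(ps)}, share < 0.05 = {share:.1%}")
```

For μ = 10.0 the ten bin counts should all be roughly 100. As μ moves away from 10.0, the first bin fills up at the expense of the others.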

What you can conclude from the outcome of a hypothesis test

After carrying out a hypothesis test, what can you conclude? There are always the following possibilities:

• If we rejected the null hypothesis:
--- we rejected H0 because H0 is false, or
--- we committed the type I error (but we know the probability of doing so: α)

• If we failed to reject the null hypothesis:
--- we failed to reject H0 because H0 is true, or
--- we committed the type II error, or
--- we failed to reject H0 because our sample size was too small!

Example: In the draft test above we rejected the null hypothesis. So we conclude that either something went wrong in the draft or we committed the type I error, but since the p-value is 0.000, the latter is very unlikely.

Example: Let's illustrate the above with a little simulation. For this we will generate some data and carry out a hypothesis test as follows:

Calc > Random Data > Normal: Generate 20 rows of data, Store in c1, Mean: 10.0, Standard Deviation: 3.0

Now we carry out the following hypothesis test:

1) α=0.05
2) H0: μ=10.0
3) Ha: μ≠10.0

But we generated the data, so we know that μ=10.0. Therefore we know that the null hypothesis is true, and so if we commit an error it will be the type I error.

Now if we keep doing the above many times, what will happen? According to the theory we should fail to reject H0 (which is the correct conclusion) 95% of the time, and we should reject H0 (commit the type I error) 5% of the time.

This is done in the MINITAB macro test. Run it with this command (the arguments are the sample size n, the true mean μ, the standard deviation σ and α):

CTRL-l, %k:\3102\test 20 10.0 3 0.05

Now let's change things a bit. Instead of generating data from a normal distribution with mean 10.0, we generate it from one with mean 12.0, but still carry out the test above of H0: μ = 10.0. Now we know that H0 is false, so we should reject it; if we don't, we commit the type II error. How often does this happen? Again run the macro:

CTRL-l, %k:\3102\test 20 12.0 3 0.05

As we see, the test correctly rejects H0 about 81% of the time and commits the type II error about 19% of the time.

Here are some interesting cases:

Effect of the true mean

Macro command                             Type II error
CTRL-l, %k:\3102\test 20 10.5 3 0.05      89%
CTRL-l, %k:\3102\test 20 11.0 3 0.05      71%
CTRL-l, %k:\3102\test 20 11.5 3 0.05      43%
CTRL-l, %k:\3102\test 20 12.0 3 0.05      19%
CTRL-l, %k:\3102\test 20 12.5 3 0.05      6%

So the farther the true mean is from the value specified in H0, the less likely we are to commit the type II error; in other words,
the more wrong the null hypothesis is, the more likely we are to make the right decision.
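The trend in the table above can be checked against a formula. As a simplifying assumption, suppose σ is known, so the test is the two-sided one-sample z-test. The macro's own numbers appear to come from a t-test, which estimates σ from the data, so they run somewhat larger. With Z = (x̄ − μ0)/(σ/√n), a true mean μ makes Z normal with mean δ and variance 1, and H0 is not rejected when |Z| < z_{1−α/2}:

```latex
\delta = \frac{(\mu - \mu_0)\sqrt{n}}{\sigma}, \qquad
\beta = P\left(|Z| < z_{1-\alpha/2}\right)
      = \Phi\left(z_{1-\alpha/2} - \delta\right) - \Phi\left(-z_{1-\alpha/2} - \delta\right)

% Example: \mu_0 = 10, \mu = 12, \sigma = 3, n = 20, \alpha = 0.05, z_{0.975} \approx 1.96
\delta = \frac{2\sqrt{20}}{3} \approx 2.98, \qquad
\beta \approx \Phi(-1.02) - \Phi(-4.94) \approx 0.15
```

As |μ − μ0| grows, δ moves away from 0 and β shrinks, which is the pattern in the table.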

Effect of the standard deviation

Macro command                             Type II error
CTRL-l, %k:\3102\test 20 10.5 3 0.05      89%
CTRL-l, %k:\3102\test 20 10.5 2.5 0.05    87%
CTRL-l, %k:\3102\test 20 10.5 2.0 0.05    82%
CTRL-l, %k:\3102\test 20 10.5 1.5 0.05    70%
CTRL-l, %k:\3102\test 20 10.5 1.0 0.05    44%
CTRL-l, %k:\3102\test 20 10.5 0.5 0.05    1%

So the smaller the standard deviation, the less likely we are to commit the type II error; in other words,
the less spread out the data are, the easier it is to detect a small difference between the true and the hypothesized mean.
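The σ table above can be compared with the exact type II error probability of a two-sided one-sample z-test. This assumes σ is known, which is a simplification; the macro appears to use a t-test, so its percentages come out a little higher. The function name `type_ii_probability` is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_probability(n, mu, sigma, alpha, mu0=10.0):
    """beta of the two-sided z-test of H0: mu = mu0 (sigma assumed known)."""
    crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    delta = (mu - mu0) * math.sqrt(n) / sigma     # standardized shift
    # H0 is kept when -crit < Z < crit, where Z ~ N(delta, 1) under the true mean
    return STD_NORMAL.cdf(crit - delta) - STD_NORMAL.cdf(-crit - delta)


for sigma in (3.0, 2.5, 2.0, 1.5, 1.0, 0.5):
    beta = type_ii_probability(20, 10.5, sigma, 0.05)
    print(f"sigma = {sigma}: beta = {beta:.0%}")
```

Shrinking σ has the same effect as moving μ away from μ0: both increase δ = (μ − μ0)√n/σ.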

Effect of α

Macro command                             Type II error
CTRL-l, %k:\3102\test 20 11.5 3 0.1       31%
CTRL-l, %k:\3102\test 20 11.5 3 0.05      43%
CTRL-l, %k:\3102\test 20 11.5 3 0.01      70%
CTRL-l, %k:\3102\test 20 11.5 3 0.005     79%
CTRL-l, %k:\3102\test 20 11.5 3 0.001     91%

So the smaller α (the probability of a type I error), the larger β (the probability of a type II error); in other words,
the less likely it is that we commit one error, the more likely it is that we commit the other.

Effect of Sample Size n

Macro command                             Type II error
CTRL-l, %k:\3102\test 20 10.5 3 0.05      89%
CTRL-l, %k:\3102\test 50 10.5 3 0.05      80%
CTRL-l, %k:\3102\test 100 10.5 3 0.05     63%
CTRL-l, %k:\3102\test 200 10.5 3 0.05     35%
CTRL-l, %k:\3102\test 300 10.5 3 0.05     18%
CTRL-l, %k:\3102\test 400 10.5 3 0.05     9%
CTRL-l, %k:\3102\test 500 10.5 3 0.05     4%

So the larger the sample size, the smaller β; in other words,
the more information (data) we have, the better a job we can do.
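The same reasoning can be run backwards to ask how large n must be for a target β. This sketch uses a common approximation for the two-sided z-test with known σ, which is an assumption here: n ≈ ((z_{1−α/2} + z_{1−β})·σ/|μ − μ0|)². The approximation ignores the negligible chance of rejecting in the "wrong" tail. The function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(difference, sigma, alpha=0.05, beta=0.2):
    """Approximate n for which a two-sided z-test (sigma assumed known) has
    type II error probability about beta when the true mean differs from
    mu0 by `difference`. Ignores rejections in the 'wrong' tail."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(1.0 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / abs(difference)) ** 2)


# True mean 10.5 vs hypothesized 10.0, sigma = 3, alpha = 0.05 (as in the table)
for beta in (0.35, 0.2, 0.1, 0.05):
    print(f"beta = {beta}: n = {required_sample_size(0.5, 3.0, beta=beta)}")
```

Because n grows with (σ/|μ − μ0|)², halving the difference to be detected roughly quadruples the sample size needed.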

So remember: the outcome of a hypothesis test depends not only on whether the null hypothesis is true or false, but always also on the sample size. If the sample size is small, even a large difference might not be statistically significant, even if it is of practical importance. If the sample size is very large, even a small difference might be statistically significant, even if it makes no practical difference.
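To put numbers on this, the sketch below computes the two-sided z-test p-value for a given difference between the sample mean and μ0 at several sample sizes. It assumes σ = 3 is known. The differences and sample sizes are made up for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def p_value_for_difference(difference, sigma, n):
    """Two-sided z-test p-value (sigma assumed known) when the sample mean
    differs from mu0 by `difference`."""
    z = difference / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# A large difference (2.0) with tiny samples, a tiny difference (0.05) with huge ones
for difference, n in ((2.0, 4), (2.0, 9), (0.05, 100), (0.05, 1_000_000)):
    p = p_value_for_difference(difference, sigma=3.0, n=n)
    print(f"difference = {difference}, n = {n}: p = {p:.4f}")
```

The difference of 2.0 is not significant at the 5% level with n = 4, but it is with n = 9. The difference of 0.05 is nowhere near significant with n = 100, yet highly significant with a million observations.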

For more on hypothesis testing, go here or see Chapter 8 of the textbook.