Some Standard Hypothesis Tests
In general we use three types of hypothesis tests in statistics: so-called parametric tests, which assume a specific distribution for the data (most often normality); non-parametric tests, which make a more general assumption such as symmetry of the distribution; and simulation-based tests. Simulation-based tests can be parametric or non-parametric.
Testing for the Mean
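Say we have $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$ and we wish to test $H_0: \mu = \mu_0$.
The 1-sample t-test is based on the test statistic

$$T = \sqrt{n}\,\frac{\bar{X} - \mu_0}{S}$$

where $\bar{X}$ is the sample mean and $S$ is the sample standard deviation. Under $H_0$, $T$ has a $t_{n-1}$ distribution.
This is the likelihood ratio test (LRT) for this problem.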
Because of the central limit theorem this test can also be used if the data does not come from a normal distribution, as long as the sample is large enough. How large is large enough depends on how far the distribution is from normal.
Alternatives to this test, if the normal assumption is not true and the central limit theorem does not apply, are tests based on the median such as the Wilcoxon signed-rank test, and tests based on simulation.
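As a small illustration of the 1-sample t-test described above, the following sketch computes the test statistic by hand and compares it with R's built-in t.test. The data are simulated and the values of n, mu0 and the seed are arbitrary choices made for this example only.

```r
# One-sample t-test: statistic by hand versus t.test().
# The data below are simulated for illustration only.
set.seed(1)
x   <- rnorm(25, mean = 10.5, sd = 2)
mu0 <- 10                                    # hypothesized mean (assumed)

n      <- length(x)
t_stat <- sqrt(n) * (mean(x) - mu0) / sd(x)  # T = sqrt(n) (xbar - mu0) / S
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value from t_{n-1}

fit <- t.test(x, mu = mu0)                   # built-in version
print(c(by_hand = t_stat, t.test = unname(fit$statistic)))
print(c(by_hand = p_val,  t.test = fit$p.value))

# Rank-based alternative if normality is doubtful
print(wilcox.test(x, mu = mu0))
```

The two versions should agree up to floating point error; wilcox.test with a single sample runs the Wilcoxon signed-rank test.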
Tests for the Equality of Two Means
Say we have $X_1, \ldots, X_n$ iid $N(\mu_1, \sigma^2)$ and $Y_1, \ldots, Y_m$ iid $N(\mu_2, \sigma^2)$, and we wish to test $H_0: \mu_1 = \mu_2$. The 2-sample t-test is based on the test statistic

$$T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{1/n + 1/m}}, \qquad S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}$$

which under $H_0$ has a $t_{n+m-2}$ distribution.
This is the likelihood ratio test (LRT) for this problem.
This test assumes that the two samples come from populations with the same variance (the equal variance assumption, or homoscedasticity). If that assumption does not hold, there is a version of this test with a different test statistic (Welch's t-test).
Because of the central limit theorem this test can also be used if the data does not come from a normal distribution, as long as both samples are large enough.
Alternatives to this test, if the normal assumption is not true and the central limit theorem does not apply, are tests based on differences of the medians, the Mann-Whitney U test, and tests based on simulation.
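The pooled 2-sample t statistic described above can be sketched in R as follows. The data are simulated for illustration and the sample sizes and means are arbitrary assumptions; t.test with var.equal = TRUE is the built-in equal-variance version.

```r
# Two-sample t-test with pooled variance, by hand and via t.test().
# Simulated data; all numbers here are illustrative assumptions.
set.seed(2)
x <- rnorm(20, mean = 5.0, sd = 1.5)
y <- rnorm(30, mean = 5.8, sd = 1.5)
n <- length(x)
m <- length(y)

sp2    <- ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2)  # pooled variance
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))
p_val  <- 2 * pt(-abs(t_stat), df = n + m - 2)

fit_pooled <- t.test(x, y, var.equal = TRUE)  # equal variance assumption
fit_welch  <- t.test(x, y)                    # default: no equal variance assumption
print(c(by_hand = t_stat, pooled = unname(fit_pooled$statistic),
        welch = unname(fit_welch$statistic)))

# Mann-Whitney U test (called the Wilcoxon rank sum test in R)
print(wilcox.test(x, y))
```

Note that t.test does not assume equal variances unless var.equal = TRUE is given.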
Tests for the Equality of More than Two Means
This goes under the heading of Analysis of Variance (ANOVA) and is a whole field of its own.
Tests for the Equality of Two Variances
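Say we have $X_1, \ldots, X_n$ iid $N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_m$ iid $N(\mu_2, \sigma_2^2)$, and we wish to test $H_0: \sigma_1 = \sigma_2$. The F test is based on the test statistic

$$F = \frac{S_X^2}{S_Y^2}$$

where $S_X^2$ and $S_Y^2$ are the two sample variances. Under $H_0$, $F$ has an $F(n-1, m-1)$ distribution.
This is the likelihood ratio test (LRT) for this problem.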
Non-parametric tests for this problem are the Mood test, the Freund-Ansari-Bradley-David-Barton test, the Siegel-Tukey test and others.
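A minimal sketch of the F test in R, on simulated data with arbitrarily chosen sample sizes and standard deviations. The statistic is computed by hand and compared with var.test; mood.test and ansari.test are the non-parametric tests for scale available in base R.

```r
# F test for equality of two variances, by hand and via var.test().
# Simulated data; sample sizes and standard deviations are assumptions.
set.seed(3)
x <- rnorm(15, sd = 1.0)
y <- rnorm(20, sd = 1.5)
n <- length(x)
m <- length(y)

f_stat <- var(x) / var(y)                  # ratio of sample variances
p_low  <- pf(f_stat, n - 1, m - 1)         # P(F <= f) under H0
p_val  <- 2 * min(p_low, 1 - p_low)        # two-sided p-value

fit <- var.test(x, y)
print(c(by_hand = f_stat, var.test = unname(fit$statistic)))
print(c(by_hand = p_val,  var.test = fit$p.value))

# Non-parametric alternatives
print(mood.test(x, y))
print(ansari.test(x, y))
```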
Tests for the Association (Correlation) of Two Random Variables
Continuous Random Variables
Say we have pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ with $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_i \sim N(\mu_2, \sigma_2^2)$, and we wish to test $H_0: X \perp Y$ (X and Y are independent).
If the assumed association is linear, a test can be based on Pearson's correlation coefficient r. Alternative measures of association are Spearman's rank correlation coefficient and Kendall's tau coefficient.
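All three measures of association are available in R through cor.test. The sketch below uses simulated paired data (the slope 0.5 and n = 40 are arbitrary choices) and also recomputes the t statistic of the Pearson test, $t = r\sqrt{(n-2)/(1-r^2)}$ with n-2 degrees of freedom.

```r
# Tests for association based on Pearson, Spearman and Kendall.
# Simulated paired data; the dependence structure is an assumption.
set.seed(4)
n <- 40
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

fit_pearson  <- cor.test(x, y, method = "pearson")
fit_spearman <- cor.test(x, y, method = "spearman")
fit_kendall  <- cor.test(x, y, method = "kendall")

# Pearson test statistic by hand
r      <- cor(x, y)
t_stat <- r * sqrt((n - 2) / (1 - r^2))

print(c(pearson  = unname(fit_pearson$estimate),
        spearman = unname(fit_spearman$estimate),
        kendall  = unname(fit_kendall$estimate)))
print(c(pearson  = fit_pearson$p.value,
        spearman = fit_spearman$p.value,
        kendall  = fit_kendall$p.value))
```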
Discrete Random Variables
The chi-square test for independence is based on the $\chi^2$ statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where "O" stands for observed and "E" for expected, and the sum is over all cells of the table.
Example: We have data on the hair color and the gender of the people in a certain population. The data is as follows:
| Gender \ Hair Color | Black | Brown | Blond | Red | Total |
|---|---|---|---|---|---|
| Male | 32 | 43 | 16 | 9 | 100 |
| Female | 55 | 65 | 64 | 16 | 200 |
| Total | 87 | 108 | 80 | 25 | 300 |
We wish to test H0: "Hair Color" is independent of "Gender". Let's assume for a moment that the above is not a sample but the actual population. Then under H0 we have

$$P(\text{Male and Black}) = P(\text{Male}) \cdot P(\text{Black}) = \frac{100}{300} \cdot \frac{87}{300} = 0.0967$$

and so under H0 the "expected" number of males with black hair is 0.0967*300 = 29.0, whereas the "observed" number of males with black hair is 32.
Notice we found the probability of a male with black hair by multiplying the marginals. Doing this for all combinations we find
| Gender \ Hair Color | Black | Brown | Blond | Red | Total |
|---|---|---|---|---|---|
| Male | 32 (29.0) | 43 (36.0) | 16 (26.7) | 9 (8.3) | 100 |
| Female | 55 (58.0) | 65 (72.0) | 64 (53.3) | 16 (16.7) | 200 |
| Total | 87 | 108 | 80 | 25 | 300 |
where the expected numbers are in parentheses. With this we find the $\chi^2$ statistic to be $\chi^2 = (32-29.0)^2/29.0 + (43-36.0)^2/36.0 + \ldots + (16-16.7)^2/16.7 = 8.987$. Under the null hypothesis this has a $\chi^2$ distribution with $(r-1)(c-1) = (2-1)(4-1) = 3$ degrees of freedom, where r and c are the number of rows and columns of the table. The test has a p-value of 0.029, and so we would reject the null hypothesis at the 5% level.
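The calculation for the hair color example can be reproduced in R. The sketch below builds the table of counts from the example, finds the expected counts from the marginals, and compares the hand calculation with the built-in chisq.test; the variable names are chosen for this illustration.

```r
# Chi-square test for independence on the hair color / gender table.
counts <- matrix(c(32, 43, 16,  9,
                   55, 65, 64, 16),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Gender = c("Male", "Female"),
                                 Hair   = c("Black", "Brown", "Blond", "Red")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

chi2  <- sum((counts - expected)^2 / expected)
df    <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_val <- pchisq(chi2, df, lower.tail = FALSE)

print(round(expected, 1))
print(c(chi2 = chi2, df = df, p.value = p_val))   # chi2 is about 8.987 on 3 d.f.

# Built-in version (no continuity correction is applied beyond 2x2 tables)
fit <- chisq.test(counts)
print(fit)
```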
This is a large sample test with the requirement that none of the expected numbers be too small. Often one sees the requirement E > 5, but this is actually stricter than is really needed.
One of the reasons this test is very useful is that it allows us to combine the results of different experiments in one test.
We have rejected H0 above if $\chi^2$ is too large. There is also an application of this test if $\chi^2$ is too small, indicating an agreement between observed and expected that is closer than should happen by random fluctuation. A famous example is R. A. Fisher's application of Pearson's chi-square test to Gregor Mendel's genetic experiments.
Alternatives to the $\chi^2$ test include Fisher's exact test and Friedman's two-way analysis of variance by ranks.
Goodness of Fit Tests
Often in Statistics we assume that the data was generated by a specific distribution, for example the normal. If we are not sure that such an assumption is justified, we would like to test for it.
Example: Say we have $X_1, \ldots, X_n$ iid $F$, and we wish to test $H_0: F = N(0,1)$.
First notice that here the alternative hypothesis is $H_a: F \neq N(0,1)$, or it is even simply left out. Either way it is a HUGE set, made up of all possible distributions other than N(0,1). This makes assessing the power of a test very difficult.
Chisquare Goodness-of-fit Test
This test uses the same test statistic as the chi-square test for independence, namely

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

but now the expected numbers are calculated under the null hypothesis, that is, assuming that the distributional assumption is correct.
Example: A famous data set in statistics is the number of deaths from horsekicks in the Prussian army from 1875-1894. It has been hypothesized that this data follows a Poisson distribution. Let's carry out a hypothesis test for this.
First of all, a Poisson distribution has a parameter, $\lambda$. Clearly, even if the assumption of a Poisson distribution is correct, it will be correct only for some value of $\lambda$. The MLE of $\lambda$ is the sample mean, here 9.8, and in some sense if any Poisson distribution fits the data, the Poisson with mean 9.8 should. So we will test specifically $H_0: F = \text{Pois}(9.8)$.
The chi-square goodness-of-fit test has the same requirement as the chi-square test for independence, namely that none of the expected numbers be too small. For example we find E(0) = 20*P(X=0) = 20*dpois(0,9.8) = 0.0. We deal with this by combining some categories. With this we find the following table:
| Number of Horsekicks | 0-6 | 7-9 | 10-12 | 13 or more | Total |
|---|---|---|---|---|---|
| Observed | 6 | 4 | 5 | 5 | 20 |
| Expected | 2.8 | 6.8 | 6.5 | 3.9 | 20 |

and we find $\chi^2 = 5.32$.
Under the null hypothesis the $\chi^2$ statistic has a $\chi^2$ distribution with $m-k-1$ degrees of freedom, where m is the number of classes and k is the number of parameters estimated from the data. So here we have $m-k-1 = 4-1-1 = 2$ d.f., and we find a p-value of 0.07, indicating that the data might well come from a Poisson distribution.
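The statistic and p-value for the binned horsekick data can be recomputed in R from the observed counts in the table and the Poisson probabilities of the four classes. This is a sketch using only the binned counts and the estimate 9.8 given in the text; the variable names are chosen for this illustration.

```r
# Chi-square goodness-of-fit test for Pois(9.8) using the binned counts.
observed <- c("0-6" = 6, "7-9" = 4, "10-12" = 5, "13+" = 5)
n        <- sum(observed)
lambda   <- 9.8                         # MLE of lambda quoted in the text

# Probabilities of the four classes under Pois(lambda)
probs <- c(ppois(6, lambda),
           ppois(9, lambda)  - ppois(6, lambda),
           ppois(12, lambda) - ppois(9, lambda),
           ppois(12, lambda, lower.tail = FALSE))
expected <- n * probs

chi2  <- sum((observed - expected)^2 / expected)
df    <- length(observed) - 1 - 1       # m - k - 1 with k = 1 estimated parameter
p_val <- pchisq(chi2, df, lower.tail = FALSE)

print(c(chi2 = chi2, df = df, p.value = p_val))
```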
In the binning we have used, some of the expected numbers are a bit small. We could of course combine even more classes, but then we would lose even more information. Instead we can use simulation to find the p-value. The routine horsekicks.fun carries out the calculations.
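The idea of finding the p-value by simulation can be sketched as follows. This is not the routine horsekicks.fun, whose code is not shown here; it is a hypothetical parametric bootstrap that assumes the observed statistic 5.32, the sample size 20, the estimate 9.8 and the four classes from the table. The helper name chi2_binned and the number of simulation runs are choices made for this example.

```r
# Simulated p-value for the binned chi-square statistic (a sketch).
set.seed(5)
breaks <- c(-Inf, 6, 9, 12, Inf)        # classes 0-6, 7-9, 10-12, 13 or more

# Hypothetical helper: chi-square statistic for one sample,
# with lambda re-estimated from that sample
chi2_binned <- function(x) {
  lambda_hat <- mean(x)
  observed   <- as.vector(table(cut(x, breaks)))
  probs      <- diff(c(0, ppois(c(6, 9, 12), lambda_hat), 1))
  expected   <- length(x) * probs
  sum((observed - expected)^2 / expected)
}

chi2_obs <- 5.32                         # observed statistic from the text
sims     <- replicate(10000, chi2_binned(rpois(20, 9.8)))
p_sim    <- mean(sims >= chi2_obs)       # proportion of simulated values at least as large
print(p_sim)
```

Because the null distribution of the statistic is simulated directly, this approach does not rely on the expected numbers being large.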
Example: Say we have a data set and we want to test whether it comes from a normal distribution. In order to use the $\chi^2$ test we first need to bin the data. There are two basic strategies:
a) Use equal size bins (with the exception of the first and the last)
b) Use adaptive bins chosen so that each bin has roughly the same number of observations.
The routine normal.chisq carries out both versions.
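The two binning strategies can be sketched in R as below. This is not the routine normal.chisq itself; chisq_normal is a hypothetical helper written for this illustration, the data are simulated, and the number of bins (8) and the range of the equal-width bins are arbitrary assumptions. The degrees of freedom follow the m-k-1 rule with k = 2 estimated parameters.

```r
# Chi-square test for normality with two binning strategies (a sketch).
set.seed(6)
x <- rnorm(200, mean = 10, sd = 3)       # simulated data for illustration

# Hypothetical helper: inner_breaks are the cut points between bins;
# the first and last bins extend to -Inf and Inf
chisq_normal <- function(x, inner_breaks) {
  breaks   <- c(-Inf, unname(inner_breaks), Inf)
  observed <- as.vector(table(cut(x, breaks)))
  probs    <- diff(pnorm(breaks, mean(x), sd(x)))   # parameters estimated from x
  expected <- length(x) * probs
  chi2     <- sum((observed - expected)^2 / expected)
  df       <- length(observed) - 2 - 1              # m - k - 1 with k = 2
  c(chi2 = chi2, df = df, p.value = pchisq(chi2, df, lower.tail = FALSE))
}

k <- 8                                              # number of bins (assumed)
# a) equal size bins, except for the first and the last
equal_width <- seq(quantile(x, 0.05), quantile(x, 0.95), length.out = k - 1)
# b) adaptive bins with roughly the same number of observations each
equal_count <- quantile(x, probs = (1:(k - 1)) / k)

res_a <- chisq_normal(x, equal_width)
res_b <- chisq_normal(x, equal_count)
print(rbind(equal_width = res_a, equal_count = res_b))
```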
Testing for normality is a very important problem, although because of simulation it is not quite as important today as it used to be. There are a number of tests available for this problem, most of them much better (that is, with higher power) than the chi-square test. Look for example at the Shapiro-Wilk test and the Anderson-Darling test.
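Of the tests just mentioned, the Shapiro-Wilk test is part of base R as shapiro.test. A short illustration on two simulated samples, one normal and one clearly skewed (both chosen only for this example):

```r
# Shapiro-Wilk test for normality on simulated data.
set.seed(7)
x_norm <- rnorm(100, mean = 10, sd = 3)  # normal sample
x_skew <- rexp(100, rate = 1)            # strongly skewed sample

sw_norm <- shapiro.test(x_norm)
sw_skew <- shapiro.test(x_skew)
print(c(normal = sw_norm$p.value, skewed = sw_skew$p.value))
```

A small p-value is evidence against normality, so the skewed sample should be rejected clearly.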
A very good way to assess the distribution of a sample (such as normality) is to draw a graph specifically designed for this purpose, the probability plot. It plots the sample quantiles vs. the quantiles of the hypothesized distribution. If the data follows that distribution the resulting plot should be linear. In R we have the routine qqplot, and for the normal distribution specifically we have qqnorm. qqline adds a reference line through the first and third quartiles to help with reading the graph. One important use of this graph is that it also provides some insight into what the true distribution looks like. qq.ill does some examples.
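A few probability plots can be drawn as follows. This is a sketch on simulated data and is not the routine qq.ill; the sample sizes and distributions are arbitrary choices for this illustration.

```r
# Probability plots for a normal and a skewed sample.
set.seed(8)
x_norm <- rnorm(100)
x_skew <- rexp(100)

op <- par(mfrow = c(1, 3))
qqnorm(x_norm, main = "Normal data")       # points close to a straight line
qqline(x_norm)
qqnorm(x_skew, main = "Exponential data")  # curved: not normal
qqline(x_skew)
# Same skewed data against exponential quantiles, using qqplot
qqplot(qexp(ppoints(length(x_skew))), x_skew,
       xlab = "Exponential quantiles", ylab = "Sample quantiles",
       main = "Exponential data vs. exponential")
par(op)
```

The curvature in the middle plot shows a long right tail, which is the kind of insight into the true distribution mentioned above.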
Kolmogorov-Smirnov Goodness-of-Fit Test
Say we have $X_1, \ldots, X_n$, which are continuous and independent random variables, and we wish to test $H_0: X_i \sim F$ for all $i$.
This is done by the Kolmogorov-Smirnov goodness-of-fit test, which works as follows. First we need to define the empirical distribution function $F_n(x)$ as the proportion of observations that are $\le x$. The empirical distribution function is very important in Statistics. It is found with the routine empcdf, and in emp.ill we illustrate it on several examples.
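Base R also provides the empirical distribution function through ecdf, which is used here as a stand-in for the routine empcdf. A small example with five made-up observations:

```r
# Empirical distribution function with base R's ecdf().
x  <- c(2.1, 3.5, 3.5, 4.0, 7.2)   # made-up data for illustration
Fn <- ecdf(x)                      # Fn is itself a function

print(Fn(3.5))                     # 3 of the 5 observations are <= 3.5
print(mean(x <= 3.5))              # same value, straight from the definition
print(Fn(c(0, 4, 10)))             # step function, from 0 up to 1

plot(Fn, main = "Empirical distribution function")
```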
Now if the null hypothesis is true, then $F_n$ should be close to $F$. We will test this using the test statistic $D = \max\{|F_n(x) - F(x)| : x \in \mathbb{R}\}$. This is called the Kolmogorov-Smirnov statistic.
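At first glance it appears that computing D is hard: it requires finding a maximum of a function which is not differentiable. But inspection of the graphs (and a little calculation) shows that the maximum has to occur at one of the jump points, which in turn happen at the observations. Because $F_n$ jumps there, we have to compare $F$ with the value of $F_n$ both at and just before each observation. With $X_{(1)} \le \ldots \le X_{(n)}$ the ordered observations, all we need to do is find $D = \max_i \max\{\, i/n - F(X_{(i)}),\; F(X_{(i)}) - (i-1)/n \,\}$.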
The method is implemented in R in the routine ks.test(x, y, ...), where x is the data set and y specifies the null hypothesis; for example y="pnorm" tests for the normal distribution. Parameters can be given as well. For example ks.test(x,"pnorm",5,2) tests whether X~N(5,4), that is, normal with mean 5 and standard deviation 2.
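The following sketch computes the Kolmogorov-Smirnov statistic from the ordered observations, comparing F with the empirical distribution function at and just before each jump, and checks the result against ks.test. The data are simulated from N(5,4) purely for illustration.

```r
# Kolmogorov-Smirnov statistic by hand and via ks.test().
set.seed(9)
x <- rnorm(50, mean = 5, sd = 2)   # simulated data
n <- length(x)

Fx      <- pnorm(sort(x), mean = 5, sd = 2)  # F at the ordered observations
d_plus  <- max((1:n) / n - Fx)               # Fn at the jump minus F
d_minus <- max(Fx - (0:(n - 1)) / n)         # F minus Fn just before the jump
D       <- max(d_plus, d_minus)

fit <- ks.test(x, "pnorm", 5, 2)
print(c(by_hand = D, ks.test = unname(fit$statistic)))
print(fit$p.value)
```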
Note that this implementation does not allow us to estimate parameters from the data. Versions of this test which allow such estimation are known for some of the standard distributions, but they are not part of base R. We can of course use simulation to implement such tests.
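One way to implement such a simulation-based test for normality with estimated mean and standard deviation is sketched below. The helper ks_stat_est, the simulated data and the number of simulation runs are assumptions made for this illustration. Under the null hypothesis the distribution of the statistic with estimated location and scale does not depend on the true parameter values, so the simulation can draw from N(0,1).

```r
# Kolmogorov-Smirnov test for normality with estimated parameters,
# p-value found by simulation (a sketch).
set.seed(10)
x <- rnorm(40, mean = 5, sd = 2)   # simulated data for illustration
n <- length(x)

# Hypothetical helper: KS statistic against the normal distribution
# whose parameters are estimated from the same sample
ks_stat_est <- function(x) {
  n  <- length(x)
  Fx <- pnorm(sort(x), mean(x), sd(x))
  max((1:n) / n - Fx, Fx - (0:(n - 1)) / n)
}

d_obs <- ks_stat_est(x)
d_sim <- replicate(2000, ks_stat_est(rnorm(n)))  # null distribution of D
p_sim <- mean(d_sim >= d_obs)

# For comparison: plugging the estimates into ks.test ignores the estimation
p_naive <- ks.test(x, "pnorm", mean(x), sd(x))$p.value
print(c(simulated = p_sim, naive = p_naive))
```

The naive p-value is usually larger than the simulated one, because estimating the parameters from the data pulls the fitted distribution closer to the sample.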
It is generally recognized that the Kolmogorov-Smirnov test is much better than the Chisquare test.