is the mle of q, so a test based on
seems reasonable.
will indicate that the alternative is more likely to be true than the null hypothesis, so a reasonable rejection region is {
>cv}. To find cv we need to solve the equation a = P(
>cv | q=q0) = 1 - P( X1+..+Xn < n*cv | q=q0), and so cv = qpois(1-a,nq0)/n
What if we want to test H0: q=q0 vs. H1: q
q0? Now the critical region should be {
< cv1 or
> >cv2}. It turns out, though, that the choice of cv1 and cv2 is not unique. For example, cv1 = 0 and cv2 = qpois(1-a,nq0)/n would work. One popular strategy is the following
cv1 = qpois(a/2,nq0)/n and cv2 = qpois(1-a/2,nq0)/n.
This test is implemented in poistest1. It also draws the power curve.
Q0 vs. H1: q
Q0c. Then the likelihood ratio test statistic is defined by
The constant c here is not important, it will be found once we deside on the type I error probability a. It may better to think of this as "We reject H0 if l(x) is small".
Note that the supremum in the denominator is found over the whole parameter space, so this is just like finding the mle, and then finding the corresponding value of the likelihood function.
Note that we in the numerator we find the supremum over a subset of the one used in the denominator, so we always have l(x) ≤ 1
The logic of the LRT is this: In the denominator we have the likelihood of observing the data we did observe, given the most favourable paramaters (the mle) possible. In the numerator we find the likelihood assuming the null hypothesis is true. If their ratio is much smaller than 1, then there are parameters outside the null hypothesis which are much more likely than any in the null hypothesis, and we would reject the null hypothesis.
Example: Let X1, .., Xn be a sample from N(q,1). Consider testing H0: q=q0 vs. H1: q
q0. Here Q0={q0} and so the numerator of l(x) is L(q0|x). For the denominator we have to find sup{L(q|x),q 
}:
and so
Now an LRT test rejects the null hypothesis if l(x) < c for some constant c. But
In other words the LRT test rejects the null hypothesis if l(x) is small, which is equivalent to |
-q0| being large.
What is the constant c3? It depends on a, namely
for example, say n=10 and we want a=0.05, then c3=qnorm(1-0.05/2)/sqrt(10) = 0.62
Example: Let X1, .., Xn be a sample from a population witha pdf f(x|q) = e-(x-q) if x>q and 0 otherwise. (This is an exponential r.v with rate 1, shifted by q). The likelihood function is given by
Consider testing H0: q≤q0 vs. H1: q>q0 for some specified value q0. The function L(q|X) is positive and increasing on -
< q ≤ X(1), and then drops to 0. Thus the denominator of l(X), the unrestricted maximum of L(q|X) is L(X(1)|X). If X(1) ≤ q0, the numerator of l(X) is also L(X(1)|X). But since we maximize over q≤q0, the numerator is L(q0|X)if X(1) > q0. Therefore the likelihood ratio statistic is given by
The likelihood ratio function is drawn in lrtexp
An LRT rejects the null hypothesis if l(X) ≤ c, which is clearly equivalent to a test which rejects the null hypothesis if X(1) ≥ c'. To determine the value of c' for some specific n and a we would now need the distribution of X(1). This is not impossible (actually, not even very hard, we have a formula for the 1st order statistic) but we can also do this via simulation. The calculations are also in lrtexp.
Q0 vs. H1: q
Q0c. Then under some regularity conditions the distribution of -2logl(X) converges to the distribution of a c2(p). Here p is difference of the number of free parameters in Q and the number of free parameters in Q0.
Note that if we let
be the mle of L, that is L(
|X) = sup{L(q|X):q
Q} and let
be the numerator of the LRT, that is L(
|X) = sup{L(q|X):q
Q0}, then
Example: Let's return to the example of the mixture of two normals, and let's say we want to test H0: a=0.5 vs. H1: a
0.5. For the denominator of the LRT we can use the routine mixmle2. For the numerator we could redo the calculations leading to mixmle2, but now assuming a=0.5. Alternatively (and a whole lot less work, but considerably slower) we can use the R routine optim, general purpose minimizer. The test is run in the routine mixtest1
How good a test is this? That is, does it achieve its nominal type I error, and what is its power? We study this on one example in mixtest1power where we generate 200a N(0,1)'s and 200(1-a) N(3,0.52) for a from 0.5 (meaning the null hypothesis is true) to 0.8 (meaning it is quite wrong). To compute the power we run 1000 simulations for each of 20 values of a. Even on a fast desktop computer this takes about 20 minutes! And a complete study should include the effects of n and of different values of m1, s1, m2 and s2! The result of this study is in mixpoweres[[1]]
Example: Again consider the mixture of normals example, and let's say we wish to test H0: m1-m2=2.5, s1=s2 vs H1: either one or the other is not true.
Here under H0 we have three parameters: a, m1 (and once we have m1, m2 is fixed) and s1 (again then s2 is fixed).
The test is run in the routine mixtest2
Now the power of the test is a function of two variables, the true difference in means, and the true difference in standard deviations. In mixtestpower2 we compute the power for some examples. The result of two cases is in mixpoweres[[2]] and mixpoweres[[3]]
A test based on Zn = (Tn-q)/Sn is often called a Wald test.
Example: Let X1, .., Xn be iid Ber(p). Consider testing H0: p = p0 vs. H1: p
p0. The MLE of p is
=
. Since
is the sample mean the CLT applies and states that for any p
Of course we don't know p, but again we can estimate the p's in the denominator by
and so we get a test with the test statistic
and we reject the null hypothesis is |Zn| > za/2.
Instead of replacing the p's in the denominator by
we could also have used p0. Then another test is based on

which rejects the null hypothesis if |Z'n| > za/2.
Which of these tests is better? Well, that depends on the power function. In powerbertest we draw the power function for both tests, for testing p=0.2. As we see the powerfunctions cross, so it depends on the true value of p which test is better.
In the case of Z' we can actually compute the power directly from a binomial cdf, see the black curve.
Example: Let X1, .., Xn be iid N(m,s2) and let the prior distribution on m be N(m0,t2), where s2, m0 and t2 are known. Say we wish to test H0: m ≤ m0 vs. H1: m > m0. It can be shown that then the posterior distribution p(m|X) is N(nt2
+s2m)/(nt2+s2),s2t2/(nt2+s2)).
Say we decide to accept H0 if P(H0 is true |X) > P(H1 is true |X) and reject H0 otherwise. So we reject H0 if
P(m ≤ m0|X) ≥ 1/2.
Since p(m|X) is symmetric, this is true iff the mean of p(m|X) is less than or equal to m0. Therefore H0 is accepted if
≤ m0 + s2(m0-m)/(nt2)
Example: Say X1, .., Xn are iid P(l). We wish to test H0: l=l0 vs. H1: l
l0.
We know that the sample mean
is the mle of l, and so we can base a test on
and we reject H0 if
is to far from l. For this we compute B simulated data sets X'1, .., X'n iid P(l0) and find the a/2 and the 1-a/2 quantiles. This will give us the critical values cv1 and cv2, and we reject H0 if either
< cv1 or if
> cv2
The calculations are in poistest and a power calculation is in powerpoistest
There are also some tests already built on simulation. Say we have X1, .., Xn from some distribution f(x) with mean m1 and Y1, .., Ym from the same distribution f(x) with mean m2. We wish to test H0: m1=m2 vs. H1: m1
m2.
Now under the null hypothesis the two samples come from the exact same distribution with the same mean. So their order is completely random, and if we find the test statistic
it should have a mean of 0. We could now proceed as above, generating data from f etc., but here is another idea: let's but the X's and the Y's in one vector of length n+m, let's find a random permutation of this vector, and then split it up into the first n and the last m numbers, calling them X' and Y'. Because under H0 any rearrangement of the data is just as likely as any other, this new data set is just as good as the original one. So we can now find T' from these observations. Because we have complete mixed the X's and the Y's we have ET'=0 for sure. Generating many T's we get an idea of "likey" values of T, and can compare the one in the real data set to them.
This is called a permutation test.
Example: Say we have X1, .., Xn iid N(m1,s2) and Y1, .., Ym iid N(m2,s2). Now the LRT test for this is based on

which has a t distribution with n+m-2 degrees of freedom, and we reject H0 if |T|>qt(1-a/2). Let's compare this test (which is known to be optimal) to a permution test.
The 2-sample t test is already implemented in t.test, and we can find its power either via simulation or analytically. The permutation test ist implemented in perm.test. Its power has to be found via simulation, but because each permutation already includes a loop over 1000 runs, and now a simulation of a single point on the power curve will need to run perm.test 1000 times as well, and we will want say 50 points on the power curve, this mean we will run the inner loop of perm.test about 50 million times. On a desktop this takes about 2 hours. One example (for n=10 and m=15) was already run, and the result is in powertperm.plot. As we can see, the permutation test is just about as good as the t test, at least for these parameter values.