Example Say we have X1, .., Xn ~ Pois(θ) and we want to test H0: λ=λ0 vs. H1: λ > λ0
We know that
is the mle of λ, so a test based on
seems reasonable.
Clearly large values of
will indicate that the alternative is more likely to be true than the null hypothesis, so a reasonable rejection region is {
>cv}. To find cv we need to solve the equation
α = P(
>cv | λ=λ0) = 1 - P( X1+..+Xn < n·cv | λ=λ0),
but under the null hypothesis Xi~P(λ), so X1+..+Xn ~P(nλ) so cv = qpois(1-α,nλ0)/n
The p-value of the test is p=P(Y> x1+..+xn | λ=λ0) where Y~P(nλ0)
What if we want to test H0: λ=λ0 vs. H1: λ≠λ0? Now the critical region could be {
<cv1 or
>cv2}. It turns out, though, that the choice of cv1 and cv2 is not unique. For example , cv1 = 0 and cv2 = qpois(1-α,nλ0)/n would work. One popular strategy is the following
cv1 = qpois(α/2,nλ0)/n and cv2 = qpois(1-α/2,nλ0)/n.
The p-value now is found by
This test is implemented in poistest.
The type II error probability for the one and two-sided tests are

poistest also draws the power curves.
θ0 vs. H1: θ
θ0c. Then the likelihood ratio test statistic is defined by
The constant c here is not important, it will be found once we decide on the type I error probability α. It may better to think of this as "reject H0 if λ(x) is small".
Note that the supremum in the denominator is found over the whole parameter space, so this is just like finding the mle, and then finding the corresponding value of the likelihood function.
Note that in the numerator we find the supremum over a subset of the one used in the denominator, so we always have λ(x) ≤ 1
The logic of the LRT is this: In the denominator we have the likelihood of observing the data we did observe, given the most favourable paramaters (the mle) possible. In the numerator we find the likelihood assuming the null hypothesis is true. If their ratio is much smaller than 1, then there are parameters outside the null hypothesis which are much more likely than any in the null hypothesis, and we would reject the null hypothesis.
Notice the connection to the Neyman-Pearson theory: in a test of H0: θ=θ0 vs Ha: θ=θ1 a test with rejection region of the form f(x|θ1)>kf(x|θ0) is optimal, and
f(x|θ1)>kf(x|θ0) iff f(x|θ0)/f(x|θ1)<1/k
but here λ(x)=f(x|θ0)/f(x|θ1) is the likelihood ratio. So while we no longer have a theorem that guarantees optimality if the hypotheses are composite, it is still reasonable to expect this type of test to be quite good.
Example : Let X1, .., Xn be a sample from N(θ,1). Consider testing H0: θ=θ0 vs. H1: θ≠θ0. Here Θ0={θ0} and so the numerator of λ(x) is L(θ0|x). For the denominator we have to find sup{L(θ|x),θ 
}:
and so
Now an LRT test rejects the null hypothesis if λ(X) < c for some constant c. c depends on the choice of α. Again ot is best to think of the test as rejecting H0 if "λ(X) is small". But
λ(X)=exp{-n/2(
-θ0)2} is small iff
-n/2(
-θ0)2 is small iff
(
-θ0)2 is large iff
|
-θ0| is large, say |
-θ0|>c
In other words the LRT test rejects the null hypothesis if λ(X) is small, which is equivalent to |
-θ0| being large.
What is the constant c? It depends on α, namely
For example , say n=10 and we want α=0.05, then c=qnorm(1-0.05/2)/√10 = 0.62
Example : Let X1, .., Xn be a sample from a population with pdf f(x|θ) = e-(x-θ) if x>θ and 0 otherwise. (This is an exponential r.v with rate 1, shifted by θ). The likelihood function is given by
Consider testing H0: θ≤θ0 vs. H1: θ>θ0 for some specified value θ0. The function L(θ|X) is positive and increasing on -∞ < θ ≤ X(1), and then drops to 0. Thus the denominator of λ(X), the likelihood function evaluated at the unrestricted maximum of L(θ|X) is L(X(1)|X). If X(1) ≤ θ0, the numerator of λ(X) is also L(X(1)|X). But since we maximize over θ≤θ0, the numerator is L(θ0|X) if X(1) > θ0. Therefore the likelihood ratio statistic is given by
The likelihood ratio function is drawn in lrtexp
An LRT rejects the null hypothesis if λ(X) ≤ c, which is clearly equivalent to a test which rejects the null hypothesis if X(1) ≥ c'. To determine the value of c' for some specific n and α we would now need the distribution of X(1):
so now
This is implemented in lrtexp(2). The type II error probability is given by
Θ0 vs. H1: θ
Θ0c. Then under some regularity conditions the distribution of -2logλ(X) converges to the distribution of a c2(p). Here p is difference of the number of free parameters in Θ and the number of free parameters in Θ0.
Example We flip a coin 1000 times and find 545 heads. Test at the 5% whether this is a fair coin.
In general we have X1, .., Xn~Ber(p) and we want to test H0:p=p0 vs Ha:p≠p0. Let's find the LRT test for this problem. First we have

lrt.ber draws the LRT statistic, and it is clear that
λ(x)<c iff k much smaller or much larger than np0 iff |k-np0|>cv
Now k=∑Xi~Bin(n,p0), so
α=P(|k-np0|>cv) = 1-P(|k-np0|<cv) = 1- P(-cv<k-np0<cv) = 1- P(-cv+np0<k<cv+np0)
For our test n=1000, p0=0.5 and using α=0.05 we find cv=31 (lrt.ber(2)), and so we reject the null hypothesis because |k-np0|=|545-1000·0.5|=45>31. We conclude that the coin is not fair.
How about using the chisquare approximation? In that case T=-2logλ(x)~χ2(1), so
and again we reject H0, now because T=8.11>cv=3.8
Why do we have this approximation? If H0 is true k~Bin(n,p0), so E[k]=np0, so k~np0. If we find the Taylor series expansion of log(x+1) at x=0 we get
Example : say X1, .., Xn~Beta(α,β) and we want to test H0:α=β vs. Ha:α≠β . Now

For the numerator of the LRT statistic we assume the null hypothesis is true: α=β, and estimate α using Newton's method. We also make use of some functions built into R: lgamma, digamma and trigamma compute the log of the gamma function and it's first and second derivative. So let g(x)=log(gamma(x)), then

For the denominator we have

After computing the suprema we can find the test statistic. All of this is implemented in lrt.beta. In lrt.beta1 we do a power study for the case α=2.0, the result is shown here:
A test based on Zn = (Tn-θ)/Sn is often called a Wald test.
Example : Let X1, .., Xn be iid Ber(p). Consider testing H0: p=p0 vs. Ha: p≠p0. The MLE of p is
=
,so the CLT applies and states that for any p
Of course we don't know p, but again we can estimate the p's in the denominator by
and so we get a test with the test statistic
and we reject the null hypothesis is |Zn| > za/2.
Instead of replacing the p's in the denominator by
we could also have used p0. Then another test is based on

which rejects the null hypothesis if |Z'n| > za/2.
Which of these tests is better? Well, that depends on the power function. In ber1.pow we draw the power function for both tests, for testing p=0.2. The Z test is in blue and the Z' test in red. The Z curve is found via simulation, whereas the Z' power curve can be calculated directly:

As we see the power curves cross, so it depends on the true value of p which test is better.
If we suspect that p>0.2 we might prefer the Z' test, otherwise the Z test.
Example : Let X1, .., Xn be iid N(μ,σ) and let the prior distribution on μ be N(μ0,t), where σ, μ0 and t are known. Say we wish to test H0: μ ≤ μ0 vs. H1: μ > μ0. It can be shown that then the posterior distribution π(μ|X) is
N(nt2
+σ2μ)/(nt2+σ2),σt/√(nt2+σ2)).
Say we decide to accept H0 if P(H0 is true |X) > P(H1 is true |X) and reject H0 otherwise. So we reject H0 if
P(μ ≤ μ0|X) ≥ 1/2.
Since π(μ|X) is symmetric, this is true iff the mean of π(μ|X) is less than or equal to μ0. Therefore H0 is accepted if
≤ μ0 + σ2(μ0-μ)/(nt2)
Example : Say X1, .., Xn are iid P(λ). We wish to test H0: λ=λ0 vs. H1: λ≠λ0.
We know that the sample mean
is the mle of λ, and so we can base a test on
and we reject H0 if
is to far from λ. For this we compute B simulated data sets X'1, .., X'n iid P(λ0), compute their sample means and find the α/2 and the 1-α/2 quantiles. This will give us the critical values cv1 and cv2, and we reject H0 if either
< cv1 or if
> cv2
The calculations are in poissim.test, which also draws the power curve, together with the curve of the test we discussed previously.
There are also some tests already built on simulation. Say we have X1, .., Xn from some distribution f(x) with mean μ1 and Y1, .., Ym from the same distribution f(x) with mean μ2. We wish to test H0: μ1=μ2 vs. H1: μ1≠μ2.
Now under the null hypothesis the two samples come from the exact same distribution with the same mean. So their order is completely random, and if we find the test statistic
it should have a mean of 0. We could now proceed as above, generating data from f etc., but here is another idea: let's put the X's and the Y's in one vector of length n+m, let's find a random permutation of this vector, and then split it up into the first n and the last m numbers, calling them X' and Y'. Because under H0 any rearrangement of the data is just as likely as any other, this new data set is just as good as the original one. So we can now find T' from these observations. Because we have completely mixed the X's and the Y's we have ET'=0 for sure. Generating many T's we get an idea of "likely" values of T, and can compare the one in the real data set to them.
This is called a permutation test.
Example : Say we have X1, .., Xn iid N(μ1,σ) and Y1, .., Ym iid N(μ2,σ). Now the LRT test for this is based on

which has a t distribution with n+m-2 degrees of freedom, and we reject H0 if |T|>qt(1-α/2, n+m-2). Let's compare this test (which is known to be optimal) to a permution test.
The 2-sample t test is already implemented in t.test, and we can find its power either via simulation or analytically. The permutation test ist implemented in perm.test. Its power has to be found via simulation, but because each permutation already includes a loop over 1000 runs, and now a simulation of a single point on the power curve will need to run perm.test 1000 times as well, and we will want say 50 points on the power curve, this means we will run the inner loop of perm.test about 50 million times. On a desktop this takes about 2 hours. One example (for n=10 and m=15) was already run, and the result is shown below. As we can see, the permutation test is just about as good as the t test, at least for these parameter values.