
Say we have the following problem: we have a dataset, expdata, which we suspect comes from an exponential distribution. How can we check this? First we need to do a few checks:
In exp.fun(1) we draw the histogram and the empirical cdf, each with the corresponding exponential curve, using the reciprocal of the sample mean as the rate. The curves appear to be close, but it is difficult to tell just how close.
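A minimal sketch of such a graphical check, assuming base R graphics. The vector x is a simulated stand-in for expdata (which is not shown here), and rate.hat is a name chosen for illustration; this is not the code of exp.fun(1).

```r
# Hypothetical stand-in for expdata; replace with the real dataset.
set.seed(1)
x <- rexp(100, rate = 2)

# Estimated rate of the exponential: reciprocal of the sample mean
rate.hat <- 1 / mean(x)

par(mfrow = c(1, 2))
# Histogram on the density scale with the fitted exponential density
hist(x, freq = FALSE, main = "Histogram", xlab = "x")
curve(dexp(x, rate = rate.hat), add = TRUE, col = "blue")
# Empirical cdf with the fitted exponential cdf
plot(ecdf(x), main = "Empirical cdf", xlab = "x")
curve(pexp(x, rate = rate.hat), add = TRUE, col = "blue")
```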
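To get a more formal answer we can use a chi-square goodness-of-fit test, which requires binning the data. That raises two questions:
• what type of bins? Let's use adaptive binning (about equally many observations in each bin).
• how many bins? Let's pick 20, for no particularly good reason!
The calculations are done in exp.fun(2).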
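The chi-square calculation with adaptive bins might look roughly like the sketch below. The function name chisq.exp, the simulated x, and the choice of k - 2 degrees of freedom (one lost for the estimated rate, the usual approximation) are assumptions made for illustration, not necessarily what exp.fun(2) does.

```r
set.seed(1)
x <- rexp(100, rate = 2)   # hypothetical stand-in for expdata

chisq.exp <- function(x, k = 20) {
  n <- length(x)
  rate.hat <- 1 / mean(x)
  # Adaptive bins: inner cut points at the sample quantiles, so each bin
  # holds about n/k observations; the outer bins extend to 0 and Inf.
  inner <- quantile(x, probs = (1:(k - 1)) / k, names = FALSE)
  breaks <- c(0, inner, Inf)
  observed <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
  # Expected counts under the fitted exponential
  expected <- n * diff(pexp(breaks, rate = rate.hat))
  stat <- sum((observed - expected)^2 / expected)
  # k - 1 degrees of freedom, minus 1 for the estimated rate (assumption)
  df <- k - 2
  c(statistic = stat, df = df, p.value = 1 - pchisq(stat, df))
}

chisq.exp(x)
```

Because the bin edges come from the data, the observed counts are nearly equal by construction, and any lack of fit shows up in the expected counts instead.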
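An alternative is the Kolmogorov-Smirnov (KS) test. Because the rate is estimated from the data, the standard null distribution of the KS statistic does not apply, so we find the p-value by simulation:
• generate x' = rexp(100, λ̂), where λ̂ is the rate estimated from the data
• find the KS statistic for x', using 1/mean(x') as the rate
• repeat 1000 times
• the p-value is the percentage of simulated KS statistics greater than that of the data.
This is done in exp.fun(3).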
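The simulation steps above can be sketched as follows. The helper names ks.stat and ks.boot are hypothetical, x is again a simulated stand-in for expdata, and the sample size is taken from length(x) rather than fixed at 100; exp.fun(3) may differ in its details.

```r
set.seed(1)
x <- rexp(100, rate = 2)   # hypothetical stand-in for expdata

# KS distance between a sample and the exponential fitted to that sample
ks.stat <- function(x) {
  unname(ks.test(x, "pexp", rate = 1 / mean(x))$statistic)
}

ks.boot <- function(x, B = 1000) {
  n <- length(x)
  rate.hat <- 1 / mean(x)
  d.obs <- ks.stat(x)
  # Simulate from the fitted exponential and re-estimate the rate each time
  d.sim <- replicate(B, ks.stat(rexp(n, rate = rate.hat)))
  # p-value: proportion of simulated statistics larger than the observed one
  mean(d.sim > d.obs)
}

ks.boot(x)
```

Re-estimating the rate inside each simulated sample is the important step: it makes the simulated statistics comparable to the observed one, which was also computed with an estimated rate.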
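Now we have a problem: the chi-square test says the data may well be from an exponential rv (p-value=0.057) whereas the KS test says no (p-value=0.005). Which one do we believe? One way to decide is to compare the power of the two tests against a plausible alternative, here a gamma distribution: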
• generate x~gamma(α,β)
• find the p-values of the chi-square test and the KS test for x, just as above
• repeat many times and record the percentage of runs in which each test rejects the null hypothesis
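A scaled-down sketch of such a power study is shown below. Several things here are assumptions made for illustration: β is treated as a rate and fixed at 1, the sample size is 100, the number of runs is small, and all function names are hypothetical. To keep the run time short it also uses a shortcut that differs from the procedure above: since the KS statistic with estimated rate has the same null distribution whatever the true rate, a single simulated 5% critical value replaces the simulation for each sample.

```r
set.seed(1)
n <- 100

ks.stat <- function(x) {
  unname(ks.test(x, "pexp", rate = 1 / mean(x))$statistic)
}

chisq.pval <- function(x, k = 20) {
  rate.hat <- 1 / mean(x)
  breaks <- c(0, quantile(x, (1:(k - 1)) / k, names = FALSE), Inf)
  observed <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))
  expected <- length(x) * diff(pexp(breaks, rate = rate.hat))
  1 - pchisq(sum((observed - expected)^2 / expected), k - 2)
}

# Shortcut (not in the original): the null distribution of the KS statistic
# with estimated rate does not depend on the true rate, so one simulated
# 95% critical value serves every sample of size n.
ks.crit <- quantile(replicate(1000, ks.stat(rexp(n))), 0.95, names = FALSE)

# Estimated rejection rates of both tests for gamma(alpha, beta) data
power.at <- function(alpha, beta = 1, B = 200) {
  rej <- replicate(B, {
    x <- rgamma(n, shape = alpha, rate = beta)
    c(chisq = chisq.pval(x) < 0.05, ks = ks.stat(x) > ks.crit)
  })
  rowMeans(rej)
}

alphas <- c(0.6, 1, 1.6)
powers <- sapply(alphas, power.at)
colnames(powers) <- alphas
powers
```

A finer grid of α values and more runs would be needed to draw smooth power curves; the three values here only show the pattern.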
If α=1 the gamma distribution is an exponential, so H0 is true and the powers should be around 0.05. The farther α is from 1, the closer to 1 the powers should be. In the next graph we have the power curves:
The graph is drawn in exp.fun(5) with the data in exp.table. The table was calculated with exp.fun(4), but be careful: these calculations took about 10 hours on a dual-Pentium machine!
We see that the KS test has greater power against a gamma alternative, so if that is the alternative we are worried about, we should indeed reject the null hypothesis.