Example say we want to generate data from the distribution with density

(That is actually very easy once you realize that F=Beta(3/2,2))
In sb4(1) we generate 1000 observations from this F, draw the histogram (scaled so it integrate out to 1) and overlay it with the true density.
The histogram is a nice graph to check but it is not always clear how good a match we have, especially in the "tails" of the distribution. Another useful graph is a plot of the empirical cdf vs. the true cdf, as shown in sb4(2), Of course for this we need to know the cdf:
Example Does the R command sample actually work? In check.sample we generate data and compare the frequencies with the probabilities.
If the rv takes infinitly many values we might have to combine "low-probability" cases.
If the data is continuous we can still compute some useful numbers:
Example let's again use the example above. sb4(3) computes some examples.
Example : Say we have X1, .., Xn iid F, and we wish to test H0: F=N(0,1)
Consider the following result of a simulation study, based on 100 observations:
| True | Simulation | |
|---|---|---|
| 1 | 0.31 | 0.3 |
| 2 | 0.17 | 0.1 |
| 3 | 0.05 | 0.12 |
| 4 | 0.47 | 0.48 |
Now, is this a good fit? Good enough so we can say our simulation generates the correct data? Obviously if 0.31 is close to 0.3, and so on, we did ok. So what we need is a formula to combine all the info above into one number, a small one if the fit is good and a large one if it is not. There are many ways to do this, the most famous is this one:

here "O" stands for observed and "E" for expected. The formula uses the actual data, not the frequencies, so O = 31, 17, 5 and 47. The expected are calculated under the null hypothesis, that is assuming the true probabilities hold, so E = np = 30, 10,12, 48. Therefore the test statistic is
c2 = (31-30)2/30+(17-10)2/10+(5-12)2/12+(47-48)2/48 = 9.04
So is 9.04 "large" or "small"? Well a famous theorem says that under some conditions c2 has a chisquare distribution with k-1 degrees of freedom where k is the number of "categories", here 4. So the p value of the test would be
P(c2 >9.04)= 1-pchisq(9.04,3)=0.029
and if we use the usual 5% level we would reject the null hypothesis, these simulation does not generate the right data.
Example : A famous dataset in statistics is the number of deaths from horsekicks in the Prussian army from 1875-1894. It has been hypothesized that this data follows a Poisson distribution. Let's carry out a hypothesis test for this.
First off a Poisson distribution has a parameter, λ. Clearly even if the assumption of a Poisson distribution is correct it will be correct only for some values of λ. A good estimator of λ is the sample mean, here 9.8, and in some sense if any Poisson distribution fits the data, the Poisson with mean 9.8 should. So we will test specifically F = Pois(9.8)
The chisquare goodness-of-fit test is a large-sample test , it requires a certain minimal sample size. The usual condition is E≥5, although it is known that this is very conservative. For example we find E(0) = 20*P(X=0)= 20*dpois(0,9.8) = 0.0. We deal with this by combining some categories. With this we find the following table:
| Number of Horsekicks | 0-6 | 7-9 | 10-12 | 13 or more | Total |
| Observed | 6 | 4 | 5 | 5 | 20 |
| Expected | 2.8 | 6.8 | 6.5 | 3.9 | 20 |
In the binning we have used, some E are a bit small. We could of course bin even further, but then we also lose even more information. Instead we can use simulation to find the p value. The routine horsekicks.fun carries out the calculations.
Example : Say we have a data set and we want to test whether is comes from a normal distribution. In order to use the χ2 test we first need to bin the data. There are two basic strategies:
a) Use equal size bins (with the exception of the first and the last)
b) Use adaptive bins chosen so that each bin has roughly the same number of observations.
The routine normal.chisq carries out both versions.
Testing for normality is a very important problem, although because of simulation not quite as important today as it used to be. There are a number of tests available for this problem, most of them much better (that is with higher power) than the chisquare test. Look for example for the Shapiro-Wilks test and the Anderson-Darling test.
A very good way to assess the distribution of a sample (such as normality) is to draw a graph specifically designed for this purpose, the probability plot. It plots the sample quantiles vs. the quantiles of the hypothesized distribution. If the data follows that distribution the resulting plot should be linear. In R we have the routine qqplot and for the normal distribution especially we have qqnorm. qqline adds a line that passes the the first and third quartiles to help with reading the graph.
Say we have X1, .., Xn which are continuous and independent r.v. and we wish to test H0: Xi~F for all i. To check this we draw the graph of the empirical vs. the hypothesized cdf. But how close do these have to match each other to decide we have a good fit? One idea is to look for the largest difference between the two curves. This is exactly what the next, the Kolmogorov-Smirnov test, does. It uses the test statistic
D = max{|Fn(x)-F(x)|:x
}.
At first glance it appears that computing D is hard: it requires finding a maximum of a function which is not differentiable. But inspection of the graphs (and a little calculation) shows that the maximum has to occur at one of the jump points, which in turn happen at the observations. So all we need to do is find Fn(Xi)-F(Xi) for all i.
The method is implemented in R in the routine ks.test where x is the data set and y specifies the null hypothesis, For example y="pnorm" tests for the normal distribution. Parameters can be given as well. For example ks.test(x,"pnorm",5,2) tests whether X~N(5,2)
Note that this implementation does not allow us to estimate parameters from the data. Versions of this test which allow such estimation are known for some of the standard distributions, but are not part of R. We can of course use simulation to implement such tests.
It is generally recognized that the Kolmogorov-Smirnov test is much better than the Chisquare test.
Example Consider the following two scatterplots. One of them was done by randomly picking points, the other is not quite so random. Which is which?

Example A famous statistician used to do the following exercise with his classes: he had a bag with slips of paper. On each paper was either the word "Real" of "Fake". He let each student pick a paper without him seeing it. Then he told them that tonight at home they should read it. If their paper said "Real" they should get a coin, flip it 100 times and write down the sequence of heads and tails. If it said "Fake" they should just make up a sequence but try to make it as real as they could. The next day in class he went around, looked at each students sequence and guessed whether it was "Real" or "Fake". He got it right most of the time. How did he do that?
He looked at the longest "run" that is the longest sequence of either Heads or Tails. There should be at least one run of length 6 or longer, with probability about 80% Most students who fake it don't put runs anywhere near that long.
Example Say we randomly select two points in the unit square. Is their distance also uniformly distributed?
The answer is no, as shown in randomness(3)
As we said earlier, everything gets much more difficult in higher dimensional space. The main problem is that the marginal distributions do not determine the joint distribution, except in the case of independence. As a first test, we can apply the ideas discussed above to the marginals, but if these pass the test we still can't be sure that our simulation works.
As for formal tests, most of them do not generalize to higher dimensions, and those that seem to do run into the curse of dimensionality. First, recall the following:
points in n-dimensional Euclidean space can be described by their coordinates written as n-tuples (x1,..,xn).
A point is in the n-dimensional hypercube of side length s if |xi|≤s/2 for all i. If s=1 it is called the unit hypercube.
A point is in the n-dimensional hypersphere of radius r if ∑xi2≤r2. If r=1 it is called the unit hypersphere.
Example What is the volume of the the n-d hypercube? Clearly the answer is sd, but consider what that means: if s<1 sd→ 0 and if s >1 sd→ ∞ as d→ ∞ . But the side length is fixed, it is the dimension that goes up!
Example Let's consider the following question: What is the probability that a point generated randomly in the unit square is actually in the unit circle?
In two dimensions we can do this analytically:
but in higher dimensions we need simulation, see sb5
What we find is very strange: sb5(10)=0.02, which seems to say that in 10-dimensional space all randomly choosen points are in the edges!
We can also calculate the ratio of the volumes:
so the the ratio goes to 0 although the volume of the hypercube goes to infinity and the hypersphere touches the hypercube in 2d places!
Example Multivariate normal distribution.
Let's investigate the probability that a randomly chosen observation from a multivariate normal distribution is "out in the tails", that is some given distance from the mean. Let Xd~N(0,I) be the "standard" normal in Rd, then the density of Xd is given by
f(x)=(2π)-d/2exp(-x'x/2)
Let S0.01(x) be the set of all points where the value of the density is 1% of it's highest value, which is of course obtained at the mean. Now

and we find
| d | P |
|---|---|
| 1 | 0.002 |
| 5 | 0.101 |
| 10 | 0.512 |
| 15 | 0.866 |
| 20 | 0.980 |
so in higher dimensions almost all the observations are out in the tails, almost none are close to the mean!
sb6 draws the histogram of the distances from the origin of 10000 observations.
One consequence of the curse of dimensionality is that the chisquare goodness-of-fit test does not extend past 2 or 3 dimensions:
Example say we have observations X1, ..,Xnn in Rd and we want to test whether they come from a unifrom distribution, that f(x)=1 if 0≤xi≤1 for i=1,..,d. First we need to bin the data. Let's say we use the following binning: in each dimension we devide the interval into 10 equal sized bins, 0-1/10, 1/10-2/10,..,9/10-1. What is the probability that a randomly chosen observation from a uniform falls into one of these bins? Because of the uniform the probability is the same for all bins, and we have
P(0≤Xi≤1/10,i=1,..,d) =∏P(0≤Xi≤1/10)=1/10d
Recall the requirement for the chisquare is that E≥5, so we need at least n=5*10d observations. For d=3 that means n=5000, for d=5 it means n=500,000
The required samplesize grows exponentially with the dimension.