• random number tables
• numbers taken from things like the exact (computer) time
• quantum random number generators
• ...
Currently none of these are practically useful (and affordable)
Of course a computer can not do anything truly random, so all we can do is generate X1, X2, .. that appear to be iid U[0,1], so-called pseudo-random numbers.
Luckily, some people are really good at that!
Example A linear congruential generator works as follows: start with a seed X0, calculate
Xn+1=(aXn+c) mod m
where a, b, c and m are chosen such that
• c and m are relatively prime
• a-1 is divisible by all prome factors of m
• a-1 is a multiple of 4 if m is a multiple of 4
A well known PRNG in Numerical Recipes in C uses a=1664525, b=0, c=1013904223 and m=232
Some computer programs (like R) have them already built in, most general computer languages (like C) do not. There are many excellent ones availabe on the Internet.
Some issues to be aware of:
All pseudo random number generators are cyclic, that is there is an N such that X1=XN, X2=XN+1 etc. For any decent method we have N in the billions. For the one above N=m=232
All pseudo random number generators have a SEED, usually an integer. If you want to generate the same sequence you can do this by specifying this SEED.
in R if you want to generate the same sequence again then use the command set.seed(SEED) where SEED is an integer.
There are often subtle differences between compilers so don't expect the same program to generate the same sequence on different computers.
Why this works:
Here we have p1 < U < p1 + p2, so we set X=x2
The routine gendisc1 runs this algorithm
Example How this works: generate data from the following rv:
P(X=0)=0.1, P(X=1)=0.3, P(X=2)=0.5 and P(X=4)=0.1
Now we generate U~U[0,1] and we get:
Case 1: U=0.0512, then we have the following:

so U<p1, and we set X=x1=0
Case 2: U=0.3502, then p1<U<p1+p2, and we set X=x2=1
Case 3: U=0.9542, then U>p1+p2+..+pk-1, and we set X=x4=4
Notice that the values of X (x1,..) are almost irrelevant, in fact we can just generate data with values 1, 2, .., and in the end change the "labels": "1"→x1, "2"→x2 etc.
Theorem the above algorithm generates the required rv.
proof
Remember that U~U[0,1], so P(U<x)=x if 0<x<1
P(X=x1) = P(U<p1)=p1
say k>1, then
P(X=xk) = P(p1+..+pk-1U<p1+..+pk) =
(p1+..+pk)-(p1+..+pk-1) = pk
Example Generate 100000 observations from X~Bin(10,0.65)
Of course there is a routine in R built in to do this: rbinom(100000,10,0.65). You can check that the correct data was generated by comparing table(rbinom(100000,10,0.65))/10000 with dbinom(0:10,10,0.65)
Or you can use our routine above: gendisc1(100000,0:10,dbinom(0:10,10,0.65))
So this works but is a bit slow. Can we speed it up? Consider this: dbinom(0,10,0.65)=2.76·10-5 , so we almost never choose 0, but we check it in the computer program every single time. dbinom(6,10,0.65)-dbinom(5,10,0.65)=0.2522 is the biggest interval and about 25% of the time U is in there, but in order to get there our program first needs to check 0, 1, 2, ,3 , 4 and 5. This give us an idea for a slight improvement: order p and x by decreasing size of p. here is the table of x and p:
| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| p | 0 | 0.001 | 0.004 | 0.021 | 0.069 | 0.154 | 0.238 | 0.252 | 0.176 | 0.072 | 0.013 |
Here is the same table, ordered by the p's:
| x | 7 | 6 | 8 | 5 | 9 | 4 | 3 | 10 | 2 | 1 | 0 |
| p | 0.252 | 0.238 | 0.176 | 0.154 | 0.072 | 0.069 | 0.021 | 0.013 | 0.004 | 0.001 | 0 |
so now if U<0.252 we set x=7, if 0.252<U<0.252+0.238=0.49 set x=6 and so on
The routine gendisc2 does this. Check gendisc2(100000,0:10,dbinom(0:10,10,0.65)) and compare its speed to the others. rbinom is still much faster. Another way in R to speed things up is to "vectorize" the program, see gendisc2a
Example Generate observations from X~G(0.01)
Her we have the additional problem that the vector p is infinite. Computers cannot handle infinitly large objects, so we need to "truncate" p. Here is one way to do this:
1) Find x1 and x2 such that P(x1<X<x2)=0.999999
x1=qgeom(0.0000005,0.01)+1=1 and x2=qgeom(0.9999995,0.01)+1=1443
If U<0.0000005+P(X=1)=0.0100005, set x=1
If U>0.9999995, set x=1444
otherwise do as above.
c for all j such that pj > 0. Then
Example : Say we want to generate a r.v X with values x in {1,3,5,7} and probabilities p=c(0.1,0.5,0.1,0.3). First we need a r.v. Y which is easy to generate and takes 4 values (not necessarily the same as in X though!). We can use for this the r.v. that chooses a number from 1 to 4 at random, using the sample(1:4,1) command. This has pmf q=(1/4,1/4,1/4,1/4). , so p/q=(0.4,2,0.4,1.2) and if we set c=2 we have pj/qj
c for all j.
with this the accept-reject algorithm for this problem is implemented in gendisc3
Why this works: if our "candidate" r.v Y picks a value y which has a high probability in p, it will often be accepted and we get many of these. If on the other hand Y picks a value which has a low probability in p, it will rarely be accepted and we get only a few of those. The method is illustrated in accrej.ill
Theorem
The accept-reject algorithm generates a r.v. X such that P(X=xj) = pj.
In addition, the number of iterations of the algorithm needed until X is found is a geometric r.v. with mean c.
proof
Now each iteration is a Bernoulli trial with success probability 1/c, and successive trials are independent. Therefore the number of trials needed until the first success is a geometric r.v. with mean c. Also
Example : say we want to generate r.v.'s X such that P(X=k)=6/(π2k2), k=1,2, ...
Recall
We need a r.v Y on {1,2,..} which we can generate. There are two discrete rv's we know which have infinitely many values, the geometric and the Poisson. But there is a problem with those two:

We do know, though, a continuous r.v. that goes to 0 very slowly, namely the Cauchy. Actually, its density f(x)=1/(π(1+x2)) already has the right "size" for our problem. We can do this, then by "discretizing" the Cauchy: Let Z~Cauchy and define Y=i(I[-i,-i+1](Z)+I[i-1,i](Z)) for i=1,2,.. So
if |Z|<1 → Y=1
if 1<|Z|<2 → Y=2
and so on. Then
Now we need max{pj/qj}. Doing this via calculus would be quite difficult because we would end up with a nonlinear equation which we would need to solve numerically. A different approach is to just plot j vs. pj/qj and see what it looks like. This is done in gendisc4 with n=100 and findc=T. We see that pj/qj appears to approach a limit a little below 1 as j goes to infinity, but it has a value of 1.216 at j=1, so we can reasonably guess that c=1.216 should work for us. (Of course we can verify all that by taking derivatives and using de L'Hospital's rule)
With this we can generate these r.v.'s, again in gendisc4.
Alternatively we could of course have used the inverse transform method, but notice that in this r.v. occasionally we see very large values, and so the inverse transform method would be quite slow.
In the theorem above it says that the mean number of trials until the first success, which is the number of "tries" until we find an acceptable candidate, has a geometric rv with mean c. Obviously the smaller c is, the faster we find a Y that is accepted, the faster we generate our observations.
We have the same theorem as for the discrete case:
Theorem
The accept-reject algorithm generates a r.v. X with density f.
In addition, the number of iterations of the algorithm needed until X is found is a geometric r.v. with mean c.
proof same as above.
Note: we do have to be careful because of course g(x) can be 0. This is ok as long as f(x) is 0 as well but not if f(x)>0. Basically we need Y to live on the same set as X. (We say X and Y have the same support)
Example : generate a r.v. X with density f(x)=6x(1-x) 0<x<1.
Here we can use Y~U[0,1], with g(x) = 1. Let's find max{f(x)/g(x) | 0<x<1}. Taking derivatives we find d/dx {6x(1-x)} = 6-12x = 0, x=1/2. This is a maximum (?) so we have max{f(x)/g(x) | 0<x<1} = f(1/2) = 3/2 = c
Note: f(x)/(cg(x)) = 6x(1-x)/(3/2*1) = 4x(1-x)
The routine is implemented in gencont1
why the accept-reject algorithm works is illustrated on this Example in accrej1.ill
Example : generate a r.v. X ~ G(3/2, 1)
Now f(x)>0 for x>0, so we need a Y that "lives" on (0,
) and that we already know how to generate, for example an exponential r.v. We know how to do this: if U~U[0,1] and λ>0 then Y=-λlog(U) ~ E(1/λ)
What value of λ should we use? Note that EX = 3/2, EY = λ, so maybe λ=3/2 is a good idea.
with this we need to find c. Again we will try to find max{f(x)/g(x)}. We have
and so
The routine is implemented in gencont2
why the accept-reject algorithm works for this example is illustrated on this example in accrej2.ill
Above we picked λ=3/2 because this matched up the means of X and Y, a reasonable choice. But is it an optimal choice? Let's see. Optimal here means a choice of λ that minimizes c. Now if we repeat the above calculation with λ instead of 3/2 we find

This is the minimum (?) and so our choice was in fact optimal.
as one might expect, generating data from random vectors is generally harder than for one dimensional random variables. To begin with though, at least for the case of finite rv's there is nothing new: say we have X=(X1, .., Xn) where Xj takes values xj1, ..,xjnj. Then all we need to do is generate data from the random variable X' with values x11, ..,x1nj, x21, ..,xjn2, .., xn1, ..,xnnj and their respective probabilities.
Example generate data from the random vector (X,Y) with pmf
| Y | |||
|---|---|---|---|
| 0 | 1 | ||
| X | 0 | 0.1 | 0.5 |
| 0 | 0.3 | 0.2 |
Instead generate data from X' with values 1-4 and probabilities (0.1,0.5,0.2,0.3). Then
if X'=1 set X=0, Y=0
if X'=2 set X=0, Y=1
if X'=3 set X=1, Y=0
if X'=4 set X=1, Y=1
Example generate data from the random vector (X,Y,Z) with pmf f(x,y,z)=(x+y+z)/162, x,y,z
{1,2,3}
Note that there are 27 different combinations of values of (x,y,z), so we begin by generating data from a random variable X' with values 1-27 and probabilities as above.
done in gendisc5
For infinite discrete and for continous random vectors we still have the Accept-Reject algorithm:
Example say we want to generate data from a continuous rv (X,Y,Z) with f(x,y,z) = 4xy 0≤x,y,z≤1
Here we can generate data from (U1,U2,U3) = 1 on [0,1]3. Now
c=max{f(x,y,z)/g(x,y,z)}=max{4xy/1}=4.
One problem in the multivariate case is to make sure that our program generates the correct data. One idea is to check the histograms of the marginals, but of course this is not sufficient proof that there is no mistake. Here the marginals are given by

The algorithm is in gen_xyz(1).
In this case of course X,Y and Z are independent because f(x,y,z)=fX(x)·fY(y)·fZ(z), so we can just generate data from the marginals, see gen_xyz(2).
Another idea is to generate the data from conditional distributions:
Example Say (X,Y) is a discrete rv with joint pmf f(x,y)=(1-p)2px, x,y
{0,1,..}, y≤x, and 0<p<1. Generate data from (X,Y)
Note that

so we have the Y=G-1 and X|Y=y = G+y-1, where G~Geom(1-p). The method is implemented in gen_px
Example say we have a 10-dimensional rv with joint pdf f(x1,..,x10)=c∏xi, 0<x1<x2<..<x10<1. Generate data from this rv.
For the methods we know sofar we need c:

so this is ugly. Doable, but ugly. Of course, if we needed this for 100 instead of 10... It turns out, though, that we can actually generate such data even without knowing c, but this discussion has to wait a bit.