• random number tables
• numbers taken from things like the exact (computer) time
• quantum random number generators
• ...
Currently none of these are practically useful (and affordable)
Of course a computer can not do anything truly random, so all we can do is generate X1, X2, .. that appear to be iid U[0,1], so-called pseudo-random numbers.
Luckily, some people are really good at that!
Example A linear congruential generator works as follows: start with a seed X0, calculate
Xn+1=(aXn+c) mod m
where a, b, c and m are chosen such that
• c and m are relatively prime
• a-1 is divisible by all prome factors of m
• a-1 is a multiple of 4 if m is a multiple of 4
A well known PRNG in Numerical Recipes in C uses a=1664525, b=0, c=1013904223 and m=232
Some computer programs (like R) have them already built in, most general computer languages (like C) do not. There are many excellent ones availabe on the Internet.
Some issues to be aware of:
All pseudo random number generators are cyclic, that is there is an N such that X1=XN, X2=XN+1 etc. For any decent method we have N in the billions. For the one above N=m=232
All pseudo random number generators have a SEED, usually an integer. If you want to generate the same sequence you can do this by specifying this SEED.
in R if you want to generate the same sequence again then use the command set.seed(SEED) where SEED is an integer.
There are often subtle differences between compilers so don't expect the same program to generate the same sequence on different computers.
Why this works:
Here we have p1 < U < p1 + p2, so we set X=x2
The routine gendisc1 runs this algorithm
Example How this works: generate data from the following rv:
P(X=0)=0.1, P(X=1)=0.3, P(X=2)=0.5 and P(X=4)=0.1
Now we generate U~U[0,1] and we get:
Case 1: U=0.0512, then we have the following:

so U<p1, and we set X=x1=0
Case 2: U=0.3502, then p1<U<p1+p2, and we set X=x2=1
Case 3: U=0.9542, then U>p1+p2+..+pk-1, and we set X=x4=4
Notice that the values of X (x1,..) are almost irrelevant, in fact we can just generate data with values 1, 2, .., and in the end change the "labels": "1"→x1, "2"→x2 etc.
Theorem the above algorithm generates the required rv.
proof
Remember that U~U[0,1], so P(U<x)=x if 0<x<1
P(X=x1) = P(U<p1)=p1
say k>1, then
P(X=xk) = P(p1+..+pk-1U<p1+..+pk) =
(p1+..+pk)-(p1+..+pk-1) = pk
Example Generate 100000 observations from X~Bin(10,0.65)
Of course there is a routine in R built in to do this: rbinom(100000,10,0.65). You can check that the correct data was generated by comparing table(rbinom(100000,10,0.65))/10000 with dbinom(0:10,10,0.65)
Or you can use our routine above: gendisc1(100000,0:10,dbinom(0:10,10,0.65))
So this works but is a bit slow. Can we speed it up? Consider this: dbinom(0,10,0.65)=2.76·10-5 , so we almost never choose 0, but we check it in the computer program every single time. dbinom(6,10,0.65)-dbinom(5,10,0.65)=0.2522 is the biggest interval and about 25% of the time U is in there, but in order to get there our program first needs to check 0, 1, 2, ,3 , 4 and 5. This give us an idea for a slight improvement: order p and x by decreasing size of p. here is the table of x and p:
| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| p | 0 | 0.001 | 0.004 | 0.021 | 0.069 | 0.154 | 0.238 | 0.252 | 0.176 | 0.072 | 0.013 |
Here is the same table, ordered by the p's:
| x | 7 | 6 | 8 | 5 | 9 | 4 | 3 | 10 | 2 | 1 | 0 |
| p | 0.252 | 0.238 | 0.176 | 0.154 | 0.072 | 0.069 | 0.021 | 0.013 | 0.004 | 0.001 | 0 |
so now if U<0.252 we set x=7, if 0.252<U<0.252+0.238=0.49 set x=6 and so on
The routine gendisc2 does this. Check gendisc2(100000,0:10,dbinom(0:10,10,0.65)) and compare its speed to the others. rbinom is still much faster.
Example Generate observations from X~G(0.01)
Her we have the additional problem that the vector p is infinite. Computers cannot handle infinitly large objects, so we need to "truncate" p. Here is one way to do this:
1) Find x1 and x2 such that P(x1<X<x2)=0.999999
x1=qgeom(0.0000005,0.01)+1=1 and x2=qgeom(0.9999995,0.01)+1=1443
If U<0.0000005+P(X=1)=0.0100005, set x=1
If U>0.9999995, set x=1444
otherwise do as above.
c for all j such that pj > 0. Then
Example : Say we want to generate a r.v X with values x in {1,3,5,7} and probabilities p=c(0.1,0.5,0.1,0.3). First we need a r.v. Y which is easy to generate and takes 4 values (not necessarily the same as in X though!). We can use for this the r.v. that chooses a number from 1 to 4 at random, using the sample(1:4,1) command. This has pmf q=(1/4,1/4,1/4,1/4). , so p/q=(0.4,2,0.4,1.2) and if we set c=2 we have pj/qj
c for all j.
with this the accept-reject algorithm for this problem is implemented in gendisc3
Why this works: if our "candidate" r.v Y picks a value y which has a high probability in p, it will often be accepted and we get many of these. If on the other hand Y picks a value which has a low probability in p, it will rarely be accepted and we get only a few of those. The method is illustrated in accrej.ill
Theorem
The accept-reject algorithm generates a r.v. X such that P(X=xj) = pj.
In addition, the number of iterations of the algorithm needed until X is found is a geometric r.v. with mean c.
proof
Now each iteration is a Bernoulli trial with success probability 1/c, and successive trials are independent. Therefore the number of trials needed until the first success is a geometric r.v. with mean c. Also
Example : say we want to generate r.v.'s X~Bin(n,p). Then X
{0,1,..,n}, so we could again use Y~U{0,..,n}. What is c? qj=1/(n+1) so pj/qj=(n+1)pj. We need the maximum of the pj's. Here is one way to find it:
so pk<pk+1 if k<(n+1)p-1 and pk>pk+1 if k>(n+1)p-1 so pk is increasing until k~(n+1)p-1, and then it is decreasing, so it has a maximum at k~(n+1)p-1 If (n+1)p-1 is an integer and k=(n+1)p-1, then pk=pk+1, otherwise it has a unique maximum at k=floor((n+1)p) either way k=floor((n+1)p) works. So
for n=10, p=0.65 we find k=floor(11*0.65)=7 and p7/q7=2.775
Now say we already have a routine that can generate Y~Bin(n,1/2). Should we use this one instead?
Now both would work, but according to the theorem we accept a value of Y for X on average every c trials, and of course the more often we accept one the faster we get our X's. For the uniform we found c=(n+1)dbinom(floor((n+1)p),n,p). Y would be better if its c, say c', is smaller. Now

for n=10, p=0.65 we find c'=(2×0.65)10=13.8>2.775, so the uniform is better. The next graph draws c and c' for p from 0.4 to 0.6, which shows that Bin(n,1/2) would be better if p
(0.45,0.55)

of course we have ignored the question of how long it takes to generate an observation from U{0,..,n} or Bin(n,1/2), In real life this also would need to be considered.
Example : say we want to generate r.v.'s X such that P(X=k)=6/(π2k2), k=1,2, ...
Recall
We need a r.v Y on {1,2,..} which we can generate. There are two discrete rv's we know which have infinitely many values, the geometric and the Poisson. But there is a problem with those two:

In both cases c is infinite because the probabilities P(Y=j) go to 0 much faster than P(X=j), so that pj/qj=P(X=k)/P(Y=k)→∞.
We do know, though, a continuous r.v. that goes to 0 very slowly, namely the Cauchy. Actually, its density f(x)=1/(π(1+x2)) already has the right "size" for our problem. We can do this, then by "discretizing" the Cauchy: Let Z~Cauchy and define Y=i(I[-i,-i+1](Z)+I[i-1,i](Z)) for i=1,2,.. So
if |Z|<1 → Y=1
if 1<|Z|<2 → Y=2
and so on. Then
Now we need max{pj/qj}. Doing this via calculus would be quite difficult because we would end up with a nonlinear equation which we would need to solve numerically. A different approach is to just plot j vs. pj/qj and see what it looks like. This is done in gendisc4 with n=100 and findc=T. We see that pj/qj appears to approach a limit a little below 1 as j goes to infinity, but it has a value of 1.216 at j=1, so we can reasonably guess that c=1.216 should work for us. (Of course we can verify all that by taking derivatives and using de L'Hospital's rule)
With this we can generate these r.v.'s, again in gendisc4.
Alternatively we could of course have used the inverse transform method, but notice that in this r.v. occasionally we see very large values, and so the inverse transform method would be quite slow..
We have the same theorem as for the discrete case:
Theorem
The accept-reject algorithm generates a r.v. X with density f.
In addition, the number of iterations of the algorithm needed until X is found is a geometric r.v. with mean c.
proof same as above.
Note: we should always use Y such that g(x)>0 iff f(x)>0 (we say X and Y have the same support). Otherwise
if x is such that g(x)=0 and f(x)>0 we would never generate a Y=x, but we should
if x is such that g(x)>0 and f(x)=0 we occasionally generate a Y=x, but we never accept it because U>f(Y)/(cg(Y))=0 always, so we just waste an iteration.
Example : generate a r.v. X with density f(x)=6x(1-x) 0<x<1.
Here we can use Y~U[0,1], with g(x) = 1. Let's find max{f(x)/g(x) | 0<x<1}. Taking derivatives we find d/dx {6x(1-x)} = 6-12x = 0, x=1/2. This is a maximum (?) so we have max{f(x)/g(x) | 0<x<1} = f(1/2) = 3/2 = c
Note: f(x)/(cg(x)) = 6x(1-x)/(3/2*1) = 4x(1-x)
The routine is implemented in gencont1
why the accept-reject algorithm works is illustrated on this example in accrej1.ill
Can we speed up the generation of the data, even if all we know is how to generate uniforms? Of course if Y~U[0,1] then aY+b~U[a,b], so how about this: divide [0,1] into 3 equal-length intervals (0,1/3), (1/3,2/3), (2/3,1) and let Y be as follows: first it randomly picks one of these intervals, then it picks a point in the interval at random. Let Z be such that P(Z=i)=pi, then Yi=Y|Z=i~U((i-1)/3,i/3), i=1,2,3

Note
draw.dens() draws the density and show why the maxima are as above.
How do we find the best p1, p2 and p3? Let's simplify the problem for a moment and consider min(max{a/p,b/(1-p}), which is drawn by minmax(). Clearly the optimal choice is when we

In gencont3() we generate the data using this method. Note that p=c(1,1,1) just uses U[0,1] as above.
How about using k intervals? let's assume k is odd, then

implemented in gencont4
Example let's say we want to generate observations from f(x)=(r+1)xr, 0<x<1, r>0. We could use the same idea as above:

see gencont6()
In the above we alwasy use intervals of equal length 1/k. How about choosing the intervals in an optimal way? let's try just k=1, so we have intervals (0,t) and (t,1) for some t. Now

see gencont7()
The case of k intervals is doable, but leads eventually to a system of k non-linear equations which need to be solved numerically.
Example : generate a r.v. X ~ Γ(3/2, 1)
Now f(x)>0 for x>0, so we need a Y that "lives" on (0,
) and that we already know how to generate, for example an exponential r.v. We know how to do this: if U~U[0,1] and λ>0 then Y=-λlog(U) ~ E(1/λ)
What value of λ should we use? Note that EX = 3/2, EY = λ, so maybe λ=3/2 is a good idea.
with this we need to find c. Again we will try to find max{f(x)/g(x)}. We have
and so
The routine is implemented in gencont2
Above we picked λ=3/2 because this matched up the means of X and Y, a reasonable choice. But is it an optimal choice? Let's see. Optimal here means a choice of λ that minimizes c. Now if we repeat the above calculation with λ instead of 3/2 we find

This is the minimum (?) and so our choice was in fact optimal.
as one might expect, generating data from random vectors is generally harder than for one dimensional random variables. To begin with though, if the variables are discete all we need to do is realize that a discrete random vector is is just a "big" discrete random variable:
Example generate data for the rv with joint pmf

to do this generate variates from a random variable Z with

and then just recode: Z=1→X=0,Y=0, Z=2→X=0,Y=1,..,Z=6→X=2,Y=1
In general the Accept-Reject algorithm works just as before:
Example say we want to generate data from a continuous rv (X,Y,Z) with f(x,y,z) = 4xy 0≤x,y,z≤1
Here we can generate data from (U1,U2,U3) = 1 on [0,1]3. Now
c=max{f(x,y,z)/g(x,y,z)}=max{4xy/1}=4.
One problem in the multivariate case is to make sure that our program generates the correct data. One idea is to check the histograms of the marginals, but of course this is not sufficient proof that there is no mistake. Here the marginals are given by

The algorithm is in gen_xyz(1).
In this case of course X,Y and Z are independent because f(x,y,z)=fX(x)·fY(y)·fZ(z), so we can just generate data from the marginals, see gen_xyz(2).
Example say we want to generate data from a continuous rv (X,Y) with f(x,y)=8xy, 0≤x<y≤1.
We can generate data (U1,U2) uniform on 0≤x<y≤1 by generating (U1,U2) on [0,1]2 but only using it of U1<U2 . Then of course g(x,y)=2 if 0≤x<y≤1 and

implemented in gencont5(1)
Another idea is to generate the data from conditional distributions: we already know fY(y)=4y3, 0<y<1 and fX|Y=y(x|y)=2x/y2, 0<x<y. So we can first generate Y and then X|Y=y as follows

implemented in gencont5(2)
Example Say (X,Y) is a discrete rv with joint pmf f(x,y)=(1-p)2px, x,y
{0,1,..}, y≤x, and 0<p<1. Generate data from (X,Y)
Note that

so we have the Y=G-1 and X|Y=y = G+y-1, where G~Geom(1-p). The method is implemented in gen_px
Example (umbrella problem) we have previously shown that for the Markov process that describes the umbrella problem the stationary distribution is given by the stationary distribution is π0=q/(q+r), πi=1/(q+r) i=1,..,r. Let's write a routine that generates data from this process and finds an estimate of the stationary distribution. We have the transition matrix
See umbrella()
Example say we have a 10-dimensional rv with joint pdf f(x1,..,x10)=c∏xi, 0<x1<x2<..<x10<1. Generate data from this rv.
For the methods we know sofar we need c:

so this is ugly. Doable, but ugly. Of course, if we needed this for 100 instead of 10... It turns out, though, that we can actually generate such data even without knowing c, but this discussion has to wait a bit.