

Now this markov chain will be time-reversible and have stationary measure p if
piPij=pjPji for all j
i.
This is equivalent to
piqijaij=pjqjiaji
and is easy to check that this will be satisfied if we set

One of the reasons this algorithm is so useful is the following: say we only know the values in p up to a constant, that is we have a sequence {bj,j=1,..m}, bj≥0 and ∑bj=B. We want to generate observations from p with pj=bj/B. Then the above algorithm works without the need to find B because

This is great because For example if we want to generate data from a posterior distribution B is the integral over the marginal distribution, which might be very difficult to find.
With this we get the Hastings-Metropolis Algorithm:
1) Choose an irreducible Markov chain with transition probabilities Q and choose some integer k between 1 and m
2) Let n=0 and X0=k
3) generate a r.v. X such that P(X=j)=qx0j and generate U~U[0,1]
4) If U <> bXqX,Xn/bXnqXn,X then NS=X, else NS=Xn
5) n=n+1, Xn=NS
6) Go to 3
Notice the similarities between this algorithm and the accept-reject method. The main differences, and the reason this algorithm is so useful, are that here we don't need to find c, which usually requires a maximization and we don't need B, as discussed above.
Example : Let's begin with a very simple Example , X~N(m,s2). First we need the "proposal" distribution Q. We are actually quite free to make almost any choice here. Let's try the following two: if we are at the point x we choose the next point from a
a) U[x-e,x+e] for some e>0.
b) N(x,e2) for some e>0.
For a) we have qxy=1 if x-e<y<x+e, 0 otherwise, and for b) we get qxy=dnorm(y,x,e). So the algorithm uses:
1a) X=runif(1,Xn-e,Xn+e])
1b) X=rnorm(1,Xn,e)
2a) bXqX,Xn/bXnqXn,X = dnorm(X,m,s)/dnorm(Xn,m,s)
2b) bXqX,Xn/bXnqXn,X = dnorm(X,m,s)dnorm(Xn,X,e)/(dnorm(Xn,m,s)dnorm(X,Xn,e))
This is implemented in mcmc1.
Try the following two cases:
1) m=0,s=1 and e=0.1
2) m=20,s=3 and e=1
Compare this algorithm, and its implementation, with the accept-reject algorithm. Here we needed practically no calculations at all.
The are two main difficulties with the MCMC method in practice:
1) It can take a lot of computational effort, For example if we want to generate just 1 variate at a time we still might have to generate 10000 others before the stationary distribution is reached.
2) It can be very difficult in practice to know when the stationary distribution is reached, that is when the "burn-in" period is over. There are Example s where the chain seems to have settled down for very long periods but is not actually at the stationary distribution yet.
Example : Let's consider again the normal mixture model and carry out a Bayesian analysis. To keep things simple we assume that m1, s1, m2 and s2 are known and the only parameter is a. An obvious prior on a is a beta distribution, and again to keep things simple we will use a~beta(t,t), that is we use a prior centered at 0.5. Then we have

Finding the exact posterior distribution means finding the marginal distribution which in this case is hopeless analytically. Fortunately we don't need it for the Hastings-Metropolis algorithm.
The routine is implemented in mcmc2. Compare the cases a=0.9, N=100, t=1, a=0.9, N=100, t=10 and a=0.9, N=1000, t=10,

j). Here is the algorithm:
j) Example : Say we have X~Bin(n,p) and p~Beta(a,b) and we want a sample from the posterior distribution p|X. Then the joint distribution of X and p is the beta-binomial distribution given by

To use the Gibbs sampler we need the conditional distributions of X|p and p|X:
1) X|p ~ Bin(n,p)
2) p|x ~ Beta(x+a,n-x+b)
so the Gibbs sampler is as follows:
1) Choose an initial value for p, say p(0)=0.5 and set k=1
2) X(k) ~ Bin(n,p(k))
2) p(k+1) ~ Beta(x(k)+a,n-x(k)+b)
This is implemented in mcmc3. Here we use the following strategy to get "independent" samples: run the sampler for a short period (B=10) and then start from scratch. For each run pick the last observation.
It can be shown that the Gibbs sampler is actually a special case of the Hastings-Metropolis algorithm: Let X = (X1, .., Xn) be a r.v. with probability mass function p(x) that need only be specified up to a multiplicative constant, and suppose that we want to generate a r.v. whose distribution is that of X. That is we want to generate a r.v. with mass function (or density) p(x) = Cg(x) where g(x) is known but C is not. Using the Gibbs sampler assumes that for any i and values xj, j
i, we can generate a r.v. X with pmf
P(X=x) = P(Xi=x|Xj=xj,j
i)
It operates the Hastings-metropolis algorithm on a Markov chain with states x = (x1, .., xn) and with transition probabilities defined as follows: Whenever the present state is x, a coordinate that is equally likely to be any of the 1,..,n is chosen. If coordinate i is chosen, then a r.v. X whose probability mass function is as above is generated and if X=x the state y=(x1, .., xn-1, x, xn+1,..,xn) is considered as the next candidate state. In other words, with x and y given the Gibbs sampler uses the Hastings -Metropolis algorithm with

Because we want the limiting distribution to be p the vector y is then accepted as the new state with probability

So in the Gibbs sampler the candidate state is always accepted as the next state.