The sequence of r.v. X1, X2, .. is said to be a Markov chain if for any event A we have P(Xn
A | X1 = x1, .., Xn-1=xn-1) = P(Xn
A | Xn-1 = xn-1), that is Xn depends only on Xn-1 but not on any of the r.v. before it.
Example : (Random Walk) Say we flip a coin repeatedly. Let the random variable Yi be 1 if the ith flip is heads, -1 otherwise. Now let Xn = ∑ni=1 Yi
Clearly we have

If we think of the index n as a time variable, then all that matters for the state of the system at time n is where it was at time n-1, but not on how it got to that state.
The random walk above is a discrete-time stochastic process because the time variable n takes discrete values (1,2, etc.) There are also continuous time stochastic processes {X(t),t≥0} . Our random walk example is also a discrete state space stochastic process because the r.v. Xn can only take countably many values (0, ±1, ±2 etc.). Other stochastic processes might have a continuous state space.
For a Markov chain (that is a stochastic process with a discrete state space) all the relevant (probability) information is contained in the probability to get from state i to state j in k steps. For k=1 this is contained in the transition matrix P = (pij), and in fact as we shall see P is all we need.
Example : (Random Walk, cont)
Here we have pij = 1/2 if |i-j|=1, 0 otherwise
Example : (Asymmetric Random Walk) As above the state space are the integers but now we go from i to i+1 with probability p, to i-1 with probability q and stay at i with probability 1-p-q.
Example : (Ehrenfest chain)
Say we have two boxes, box 1 with k balls and box 2 with r-k balls. We pick one of the balls at random and move it to the other box. Let Xnbe the number of balls in box 1 at time n. First we have Xn
{0,1,..,r}. Now
say Xn=k, so there are k balls in urn 1, therefore r-k balls in urn 2. In the next step we either move a ball from urn 1 to urn 2, or vice versa, so Xn+1=k+1 or Xn+1=k-1. Now
pk,k+1 = P(Xn+1=k+1|Xn=k) = P(move ball from urn 2 to urn 1 | k balls in urn 1) = P(pick one of the r-k balls in urn 2) = (r-k)/r
also
pk,k-1 = 1-pk,k+1 = k/r and pi,j = 0 otherwise.
Ehrenfest used this model to study the exchange of air molecules in two chambers connected by a small hole.
Example (Umbrella) Say you own r umbrellas, which are either at home or in your office. In the morning if it rains you take an umbrella, if there is one at home, equally in the evening in the office. Say it rains in the morning or in the evening independently with probability p. Analyze this as a Markov chain and find the transition matrix.
Solution 1: Say Xn is the number of umbrellas at home in the morning of the nth day, then Xn
{0,1,..,r}. Now
P(Xn=i|Xn-1=i) = P(it is raining in the morning and evening or it is not raining in the morning and evening) = p2+q2, 1≤i≤r
P(Xn=i-1|Xn-1=i) = P(it is raining in the morning but not in the evening) = pq, 1≤i≤r
P(Xn=i+1|Xn-1=i) = P(it is not raining in the morning but it is raining in the evening) = qp, 1≤i≤r-1
P(Xn=0|Xn-1=0) = P(it is not raining in the evening) = q
P(Xn=1|Xn-1=0) = P(it is raining in the evening) = p
P(Xn=r|Xn-1=r) = P(it is not raining in the morning or it is raining both times) = q+p2
so
Solution 2: Say Xn is the number of umbrellas at her present location, then Xn
{0,1,..,r}. Now
P(Xn=r|Xn-1=0) = P(no umbrellas where she was) = 1
P(Xn=r-i|Xn-1=i) = P(it is not raining) = q, 1≤i≤r
P(Xn=r-i+1|Xn-1=i) = P(it is raining) = p, 1≤i≤r

both of these describe the "experiment"
Say we have a Markov chain Xn, n=1,2,.. with transition matrix P. Define the n-step transition matrix P(n) = (p(n)ij) by p(n)ij = P(Xn=j|X1=i). Of course P(1) = P. Now
Example : (Ehrenfest chain)
Let's find the 2-step transition matrix for the Ehrenfest chain with r=3. The transition matrix is given by

and so the 2-step transition matrix is

For example , P(X3=2|X1=2) = 7/9 because if X1 = 2 the chain could have gone to 1 (with probability 2/3) and then back to 2 (with probability 2/3) or it could have gone to 3 (with probability 1/3) and then back to 2 (with probability 1), so we have P(X3=2|X1=2) = 2/3*2/3 + 1/3*1 = 7/9.
Example (Umbrellas)
Finding P2 for solution 1 (in this generality) is difficult, but for solution 2 we have

In order to find P(n) we could just find PPP..P n-times. With a little linear algebra this becomes easier: For many matrices P there exists a matrix U and a diagonal matrix D such that P=UDU-1. Here is how to find U and D:
First we need to find the eigenvalues of the matrix P, that is we need to find the solutions of the equations Px=λx. This is equivalent to (P-λI)x=0 or to det(P-λI)=0. So we have:

The D above now is the matrix with the eigenvalues on the diagonal. The columns of the matrix U are the corresponding eigenvectors (with Euclidean length 1), so for example

Of course we have det(P-λI)=0, so this system is does not have a unique solution. Setting x1=1 we can then easily find a solution x=(1,-1,1,-1).
This vector has Euclidean length √(12+(-1)2+12+(-1)2) = 2, so the normalized eigenvector is x=(1/2,-1/2,1/2,-1/2)
Similarly we can find eigenvectors for the other eigenvalues. Alternatively (and a lot easier!) we can use the R function eigen to do the calculations for us.
Why does this help in computing P(n)? It turns out that we have P(2) = PP = UDU-1UDU = UDDU-1 = UD2U-1, and

and in general we have P(n) = UDnU-1
The routine ehrenfest with which=1 computes P(n) for the Ehrenfest chain
Note 
so λ=1 is always an eigenvalue of a transition matrix P.
Example Umbrella, solution 2 and r=2, then
For the eigenvectors we have

and in this generality that's about it
An important consequence of the Markov property is the fact that given the present the past and the future are independent. This is formalized in the Chapman-Kolmogorov equation:

There are a number of properties of Markov chains. Here are some:
1) A Markov chain is said to be irreducible if for each pair of states i and j there is a positive probability that starting in state i the chain will eventually move to state j.
Both the two random walks and the Ehrenfest chain are irreducible.
Example : (Casino) Consider the following chain: you go to the Casino with $10. You play Blackjack, always betting $5. Let Xn be your "wealth" after the nth game. Then Xn
{0,5,10,15,..} and P(Xn+k = j|Xk=0) = 0 for all n > 1. ("0" is called a coffin or absorbing state). So this chain is not irreducible.
2) A Markov chain is said to be aperiodic if for some n ≥0 and some state j we have
P(Xn = j|X1=j) > 0 and
P(Xn+1 = j|X1=j) > 0
In other words there should be a chance to return to state j in either n steps or in n+1 steps.
Random walk I and the Ehrenfest chain are not aperiodic because it is only possible to return to the same state in an even number of steps, but not an odd number. Random Walk II is aperiodic.
3) A state x of a Markov chain is said to be recurrent if P(the chain returns to x infinitely often) = 1. A Markov chain is called recurrent if all its states are recurrent. A state that is not recurrent is called transient.
A recurrent state i is said to be positive recurrent if starting at i the expected time until the return to i is finite, otherwise it is called null recurrent.
It can be shown that in a finite-state chain all recurrent states are positive recurrent.
Example : The Ehrenfest chain is clearly recurrent. In the Casino example "0" is a recurrent state, the others are not. Are the random walks recurrent? Good question! It seems clear that the asymmetric r.v. is not (if p
0.5), because eventually one expects the walk to run off to infinity (or - infinity). How about Random Walk I? Actually let's consider a more general problem
Example : (Random Walk III) let the state space be the lattice of integers on Rd, that is Xn =(i1, .., id) for ij any integer. Then the chain goes from one point on the lattice to any of the 2d points that differ by one in one coordinate with probability 1/2d.
One of the great results in probability states:
The simple random walk is recurrent if d≤2, transient otherwise, or as Kakutani once said "A drunk man will find his way home but a drunk bird may get lost".
4) Positive recurrent aperiodic states are called ergodic.
Let S be the state space of a Markov Chain X with transition matrix P. Let π be a "measure" on S. Then π is called a stationary measure of X if πTP=πT.
We won't discuss exactly what it means for π to be a "measure". You can think of it in the same way as a probability distribution, only that we don't have ∑πi=1.
Note: πTP=πT iff (PTπ)T=πT iff PTπ=π iff (PT-I)π=0, so again the system of equations is singular.
Example : (Ehrenfest chain) To find a (?) stationary measure we have to solve the system of equations
πj = ∑i πiPij i=0,1..,r
often we can get unique solution by requiring that π be a proper probability distribution, that is that ∑πi = 1
Here this means the system
| π0 = 1/3π1 |
| π1 = π0+2/3π2 |
| π2 = 2/3π1 + π3 |
| π3 = 1/3π2 |
| π0 + π1 + π2 + π3 = 1 |
The interpretation is the following: Say we choose the initial state X0 according to π, that is P(X0=i) = πi. Then πi is the long-run proportion of time the chain spends at state i, that is πi = lim ∑Nk=1 I[Xn=i]/N.
Example : For the Ehrenfest chain this is illustrated in ehrenfest with which=2.
Example : (Umbrellas) For solution 1 the system of equations is

and so on shows x=c(q,1,..,1) solves the system. Now q+1+..+1=q+r, so the stationary distribution is π0=q/(q+r), πi=1/(q+r) i=1,..,r
For solution 2 we have
and we see that we get the same stationary distribution as in solution 1.
So, what percentage of times do you get wet? Clearly this is P(no umbrella and it rains) = qπ0=q2/(q+r)
Example : (Random Walk) Let S be the integers and define a Markov chain by pi,i+1 = p and pi,i-1 = q = 1-p. A stationary measure is given by πi=1 for all i because (πP)i = 1p+1q = 1.
Now assume p≠q and define πi =(p/q)i. Then

Note that this shows that stationary measure are not unique.
One use of the stationary distribution is an extension of the WLLN to Markov chains. That is, say h is a function on the state space, then

where Z is a r.v. with pmf π.
This is illustrated in ehrenfest with which=3 for h(x)=x, h(x)=x2 and h(x)=log(x+1)
One of the main results for Markov chains is the following:
If the Markov chain {Xn} is irreducible and ergodic, thenn
Example : Of course this result does not apply to the Ehrenfest chain, which is not aperiodic, but the result holds anyway as we have seen.
Here is another property of Markov chains: A Markov chain is said to be time-reversible if πiPij=πjPji for all i
j. It can be shown that for a time reversible Markov chain if the chain is started from π and run backwards in time it again has transition matrix P.
Example : The Ehrenfest chain is time-reversible. We will show this for the case i=k, j=k+1:
If we let Xn denote your "fortune" after n rounds {Xn} is a Markov chain on {0,1,..,N} with transition probabilities
p00=pNN=1
pi,i+1=p and pi,i-1=q for i in {1,..,N-1}
Also we X0=i
Let Pi denote the probability that, starting at i the fortune will eventually reach N. We have:

Note that PN=1 and that the formula above also holds for i=N, so we have

The main "trick" in this calculation was to condition on the "right" event (here X1). This is often the case when doing math with Markov chains.
Say in our example playing roullette you start with $100. In gambler with which=1 we plot the probability of reaching N before going broke.
Is it the same probability to start with $100 and reach $110 or to start with $200 and reach $220? The answer is no, use gambler with which=2 to verify.
Instead of playing roullette you play Blackjack. You are really good at it, so now p=0.5 (almost). How do the answers to the questions above change?
When I go to the Casino I don't expect to win money. I go for the entertainment of gambling. For me a successful evening is when I don't loose my money to fast, that is I want to maximize the time I spend in the Casino. So how much time can I expect to spend in the Casino if I start with $100 and I will leave if I either win $100 or if I go broke? In gambler with which==3 and p=0.5 we run a simulation to estimate the mean time until I have to leave. Note that a game of blackjack costs $5 each, so we use i=20 and N=40.
Example The two-state process
Here Xn takes only two possible states, coded as 0 and 1. Therefore the transition matrix is given by

Now

For the stationary distribution we find

Finally