p(q|x) = f(x|q)p(q)/m(x)
where m(x) is the marginal distribution of X, that is
$$m(x) = \int f(x \mid q)\, p(q)\, dq$$
(with the integral replaced by a sum if q is discrete).
Example: You want to see whether it is really true that coins come up heads and tails with probability 1/2. You take a coin from your pocket and flip it 10 times. It comes up heads 3 times. A frequentist would now use the sample mean as an estimate of the true probability of heads p, and find $\hat{p} = 3/10 = 0.3$. But would we really believe that?
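A Bayesian analysis would proceed as follows: let X1, .., Xn be iid Ber(p). Then Y = X1+..+Xn is Bin(n,p). Now we need a prior on p. Of course p is a probability, so it takes values in [0,1]. One distribution on [0,1] we know is the Beta, so we will use a Beta(a,b) as our prior. Remember, this is a purely subjective choice, and anybody can use their own. The joint distribution of Y and p is given by
$$f(y,p) = \binom{n}{y} p^{y}(1-p)^{n-y} \cdot \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, p^{a-1}(1-p)^{b-1} = \binom{n}{y}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, p^{y+a-1}(1-p)^{n-y+b-1}$$
for y = 0, .., n and 0 < p < 1. Integrating out p gives the marginal distribution of Y,
$$m(y) = \binom{n}{y}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(y+a)\Gamma(n-y+b)}{\Gamma(n+a+b)}$$
which is known as the beta-binomial distribution.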
Note that (Y,p) is a random vector where one component is continuous (p) and the other is discrete (Y), so here we are combining a pdf with a pmf. It turns out that this is ok.
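The posterior distribution of p given y is then
$$f(p \mid y) = \frac{\Gamma(n+a+b)}{\Gamma(y+a)\Gamma(n-y+b)}\, p^{y+a-1}(1-p)^{n-y+b-1}, \quad 0 < p < 1$$
so p given Y=y has a Beta(y+a, n-y+b) distribution.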
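As a sanity check on the posterior, the following Python sketch normalizes likelihood times prior numerically on a grid and compares the result with a Beta(y+a, n-y+b) density. The helper names and the prior values a=b=2 are illustrative assumptions, not part of the notes.

```python
import math

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def posterior_on_grid(y, n, a, b, m=2000):
    """Numerically normalize likelihood * prior on a midpoint grid of (0, 1)."""
    grid = [(i + 0.5) / m for i in range(m)]
    unnorm = [p**y * (1 - p)**(n - y) * beta_pdf(p, a, b) for p in grid]
    # Midpoint-rule integral; the binomial coefficient cancels when normalizing.
    total = sum(unnorm) / m
    return grid, [u / total for u in unnorm]

# Assumed illustration: 3 heads in 10 flips, Beta(2, 2) prior.
grid, numeric = posterior_on_grid(y=3, n=10, a=2.0, b=2.0)
closed_form = [beta_pdf(p, 3 + 2.0, 10 - 3 + 2.0) for p in grid]
max_gap = max(abs(u - v) for u, v in zip(numeric, closed_form))
print("largest pointwise difference:", max_gap)
```

The difference should be at the level of the numerical integration error, which is what we expect if the Beta prior is conjugate to the binomial likelihood.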
Of course we still need to "extract" some information about the parameter p from the posterior distribution. If we want to estimate p a natural estimator is the mean of the posterior distribution, given here by
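$$\hat{p}_B = \frac{y+a}{a+b+n}$$
This can be written as
$$\hat{p}_B = \frac{n}{a+b+n}\cdot\frac{y}{n} + \frac{a+b}{a+b+n}\cdot\frac{a}{a+b}$$
where we see that the posterior mean is a weighted average of the sample mean y/n and the prior mean a/(a+b), with the weight on the sample mean growing with n.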
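A small Python sketch, with hypothetical function names, checking that the posterior mean (y+a)/(a+b+n) agrees with a weighted average of the sample mean y/n and the prior mean a/(a+b):

```python
def posterior_mean(y, n, a, b):
    """Mean of the Beta(y+a, n-y+b) posterior."""
    return (y + a) / (a + b + n)

def weighted_average_form(y, n, a, b):
    """Same quantity written as weight * sample mean + (1 - weight) * prior mean."""
    weight = n / (a + b + n)          # weight given to the data
    sample_mean = y / n
    prior_mean = a / (a + b)
    return weight * sample_mean + (1 - weight) * prior_mean

# Illustrative values only (assumed, not from the notes).
for y, n, a, b in [(3, 10, 2, 5), (7, 20, 1, 1), (0, 5, 100, 100)]:
    print(y, n, a, b, posterior_mean(y, n, a, b), weighted_average_form(y, n, a, b))
```

The weight n/(a+b+n) tends to 1 as n grows, so with enough data the estimate is dominated by the sample mean regardless of the prior.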
How about our problem with the 3 heads in the 10 flips? Well, we have to completely specify the prior distribution, that is, we have to choose a and b. The choice depends again on our belief. For example, if we feel strongly that this coin is just like any other coin and therefore really should be a fair coin, we should choose them so that the prior puts almost all its weight around 1/2. For example with a=b=100 we get Ep = a/(a+b) = 0.5 and Vp = ab/((a+b)^2(a+b+1)) = 0.0012. Then
$\hat{p}_B$ = (3+100)/(100+100+10) = 0.4905 is our estimate for the probability of heads. Clearly for such a strong prior the actual sample almost does not matter. For example for y=0 we would have found
$\hat{p}_B$ = 0.476 and for y=10 it would be
$\hat{p}_B$ = 0.524.
Maybe we have never even heard the word "coin" and have no idea what one looks like, let alone what the probability of "heads" might be. Then we could choose a=b=1, that is the uniform distribution, as our prior. This would indicate our complete lack of knowledge regarding p. (This is called an uninformative prior.) Now we find
$\hat{p}_B$ = (3+1)/(1+1+10) = 0.333, which is close to the sample mean 0.3 but still pulled slightly toward the prior mean 1/2. (The mode of this posterior, y/n = 0.3, is exactly the sample mean.)
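The estimates for the two priors discussed above can be recomputed with a few lines of Python. This is only a sketch: the function names are hypothetical, and the formulas used are the standard mean and variance of a Beta distribution.

```python
def posterior_mean(y, n, a, b):
    """Posterior mean of p under a Beta(a, b) prior after y heads in n flips."""
    return (y + a) / (a + b + n)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

n = 10
priors = [("strong prior, a=b=100", 100, 100), ("uniform prior, a=b=1", 1, 1)]
for label, a, b in priors:
    estimates = {y: round(posterior_mean(y, n, a, b), 4) for y in (0, 3, 10)}
    print(label, "| prior variance:", round(beta_variance(a, b), 4), "| estimates:", estimates)
```

Under the strong prior the three estimates stay close to 1/2 whatever the data, while under the uniform prior they follow the observed number of heads much more closely.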
In bayescoin with which==1 we study the effect of the sample size on the estimate of p.
In bayescoin with which==2 we study the effect of a=b on the estimate of p. A larger common value a=b means a prior more concentrated around 1/2.
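The bayescoin routine itself is not reproduced here. As a rough, hypothetical stand-in (not the routine's actual code), the following Python sketch shows the two effects using the posterior mean (y+a)/(a+b+n): first the sample size grows while the observed proportion of heads stays at 30%, then the data stay fixed while the common prior parameter a=b grows. All numerical settings are assumptions chosen for illustration.

```python
def posterior_mean(y, n, a, b):
    """Posterior mean of p under a Beta(a, b) prior after y heads in n flips."""
    return (y + a) / (a + b + n)

# Effect of the sample size: 30% heads throughout, prior fixed at a=b=100.
by_sample_size = [(n, posterior_mean(3 * n // 10, n, 100, 100))
                  for n in (10, 100, 1000, 10000)]

# Effect of the prior: data fixed at y=3, n=10, common value a=b varies.
by_prior_strength = [(ab, posterior_mean(3, 10, ab, ab))
                     for ab in (1, 10, 100, 1000)]

for n, est in by_sample_size:
    print(f"n={n:6d}  estimate={est:.4f}")
for ab, est in by_prior_strength:
    print(f"a=b={ab:5d}  estimate={est:.4f}")
```

With more data the estimate moves from the prior mean toward the observed proportion, and with a larger a=b it moves back toward 1/2.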
Here is a very different, but probably more realistic, prior: Before we flip the coin we reason as follows: either the coin is fair, and we think that is most likely the case, or if it is not, we don't have any idea what it might be. We can "encode" this belief in the following prior:
Let $\delta_{1/2}$ be the point mass at 1/2, that is a random variable which always takes the value 1/2, or $P(\delta_{1/2} = 1/2) = 1$. Let U~U[0,1] and let Z~Ber(a), with Z and U independent. Here a is a probability in [0,1] (not the Beta parameter used earlier). Now $p = Z\delta_{1/2} + (1-Z)U$.
So with probability a, p is just 1/2 (and the coin is fair), and with probability 1-a, p is uniform on [0,1] (and we don't have any idea what's going on).
This prior is a mixture of a discrete and a continuous r.v. so we will need to be a bit careful with the calculations.
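The cdf and the prior "density" are as follows:
$$F(x) = P(p \le x) = \begin{cases} (1-a)x & 0 \le x < 1/2 \\ a + (1-a)x & 1/2 \le x \le 1 \end{cases}$$
$$f(p) = \begin{cases} a & p = 1/2 \\ 1-a & p \in [0,1],\ p \ne 1/2 \end{cases}$$
where the value at p = 1/2 is a point mass (a probability) and the value elsewhere is a density.
The joint density of p and y is:
$$f(p,y) = \begin{cases} a\binom{n}{y}\frac{1}{2^n} & p = 1/2 \\ (1-a)\binom{n}{y}p^{y}(1-p)^{n-y} & p \ne 1/2 \end{cases}$$
and the marginal distribution of y:
$$m(y) = a\binom{n}{y}\frac{1}{2^n} + (1-a)\int_0^1 \binom{n}{y}p^{y}(1-p)^{n-y}\,dp = a\binom{n}{y}\frac{1}{2^n} + \frac{1-a}{n+1}$$
because the uniform is also a Beta(1,1) and we can use the result above (with both Beta parameters equal to 1 the beta-binomial probabilities reduce to 1/(n+1)).
Now we find the posterior distribution of p given y:
$$f(p \mid y) = \begin{cases} a\binom{n}{y}\frac{1}{2^n} \Big/ m(y) & p = 1/2 \\ (1-a)\binom{n}{y}p^{y}(1-p)^{n-y} \Big/ m(y) & p \ne 1/2 \end{cases}$$
Finally as above we can find the mean of the posterior distribution to get an estimate of p for a given y:
$$E[p \mid y] = \frac{a\binom{n}{y}\frac{1}{2^{n+1}} + (1-a)\frac{y+1}{(n+1)(n+2)}}{a\binom{n}{y}\frac{1}{2^n} + \frac{1-a}{n+1}}$$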
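The posterior mean under the mixture prior can be computed directly. The Python sketch below uses a hypothetical function name and assumes the mixing weight is called a, as in the text. It relies on two standard facts: under a uniform prior the marginal probability of any y is 1/(n+1), and the corresponding posterior is Beta(y+1, n-y+1) with mean (y+1)/(n+2).

```python
import math

def mixture_posterior(y, n, a):
    """Return (P(p = 1/2 | y), E[p | y]) for the prior: p = 1/2 w.p. a, else U[0,1]."""
    fair = a * (math.comb(n, y) / 2**n)   # joint weight of {p = 1/2} and Y = y
    unif = (1 - a) / (n + 1)              # joint weight of {p uniform} and Y = y
    marginal = fair + unif                # P(Y = y)
    prob_fair = fair / marginal           # posterior probability the coin is fair
    # Given "not fair", the posterior is Beta(y+1, n-y+1) with mean (y+1)/(n+2).
    mean = prob_fair * 0.5 + (1 - prob_fair) * (y + 1) / (n + 2)
    return prob_fair, mean

# Assumed illustration: 3 heads in 10 flips, prior weight a = 0.9 on a fair coin.
prob_fair, estimate = mixture_posterior(3, 10, 0.9)
print("P(fair | y):", round(prob_fair, 4), " estimate of p:", round(estimate, 4))
```

Setting a=0 recovers the uniform-prior estimate (y+1)/(n+2), and a=1 forces the estimate to 1/2 regardless of the data.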
In bayesex we run a simulation to check that indeed all these calculations are correct.
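The bayesex routine is not shown in these notes. A minimal Python sketch of such a check, under the assumption that it works by simulating from the prior and the binomial model and then conditioning on the observed y, might look as follows. The function names, the seed, the number of runs and the value a=0.5 are all illustrative choices.

```python
import math
import random

def exact_posterior_mean(y, n, a):
    """Closed-form E[p | y] for the prior: p = 1/2 w.p. a, else U[0,1]."""
    fair = a * (math.comb(n, y) / 2**n)
    unif = (1 - a) / (n + 1)
    prob_fair = fair / (fair + unif)
    return prob_fair * 0.5 + (1 - prob_fair) * (y + 1) / (n + 2)

def simulated_posterior_mean(y, n, a, runs=200_000, seed=1):
    """Draw (p, Y) pairs from the model and average p over the runs with Y == y."""
    rng = random.Random(seed)
    kept = []
    for _ in range(runs):
        p = 0.5 if rng.random() < a else rng.random()    # draw p from the mixture prior
        heads = sum(rng.random() < p for _ in range(n))  # Y ~ Bin(n, p)
        if heads == y:
            kept.append(p)
    return sum(kept) / len(kept)

exact = exact_posterior_mean(3, 10, 0.5)
simulated = simulated_posterior_mean(3, 10, 0.5)
print("exact:", round(exact, 4), " simulated:", round(simulated, 4))
```

The two numbers should agree up to simulation error, which shrinks as the number of runs grows.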
In bayescoin with which==3 we study the effect of the sample size on the estimate of p.
In bayescoin with which==4 we study the effect of a on the estimate of p. A larger a means a prior that puts more weight on the coin being fair.
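Again as a hypothetical stand-in for bayescoin (whose code is not given here), this Python sketch tabulates the mixture-prior estimate, first for growing n with the proportion of heads held at 30%, then for fixed data and growing a. The function name and all settings are assumptions for illustration.

```python
import math

def mixture_estimate(y, n, a):
    """E[p | y] for the prior: p = 1/2 with probability a, else U[0,1]."""
    fair = a * (math.comb(n, y) / 2**n)
    unif = (1 - a) / (n + 1)
    prob_fair = fair / (fair + unif)
    return prob_fair * 0.5 + (1 - prob_fair) * (y + 1) / (n + 2)

# Effect of the sample size: 30% heads throughout, a fixed at 0.5.
by_sample_size = [(n, mixture_estimate(3 * n // 10, n, 0.5)) for n in (10, 50, 100, 500)]

# Effect of a: data fixed at y=3, n=10.
by_weight = [(a, mixture_estimate(3, 10, a)) for a in (0.0, 0.25, 0.5, 0.75, 0.95)]

for n, est in by_sample_size:
    print(f"n={n:4d}  estimate={est:.4f}")
for a, est in by_weight:
    print(f"a={a:.2f}  estimate={est:.4f}")
```

Unlike the Beta(100,100) prior, this prior lets the data overturn the belief in a fair coin fairly quickly: once the evidence against p = 1/2 is strong, the posterior weight on the point mass collapses and the estimate follows the sample proportion.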