Example : A census of all the students at the Colegio 10 years ago showed a mean GPA of 2.75. In our survey of 150 students we find today a mean GPA of 2.53. How much (if at all) has the GPA changed?
The problem of course is that the sample mean GPA depends on the sample, if we repeated our survey tomorrow with a different sample of 150 students, their mean GPA will not again be 2.53. But how far away from 2.53 might it be? Could it actually be higher than 2.75?
One way to answer such questions is to find an interval estimate rather than a point estimate.
Θ. Then (L(X), U(X)) is a 100(1-α) confidence interval for θ iff
P(L(X)≤θ≤U(X))=1-α
θ
Θ
Example say X1, .., Xn~N(μ,σ), then a 100(1-α)% confidence interval for the population mean is given by

First notice that the interval is given in the form point estimate ± error, which is quite often true in Statistics, although not always.
Example confidence interval for the mean (cont): Now
Example (GPA cont.)

and so our 90% confidence interval is (2.53-0.088, 2.53+0.088) = (2.442, 2.618)
What does that mean: a 90% confidence interval for the mean is (2.442, 2.618)? The interpretation is this: suppose that over the next year statisticians (and other people using statistics) all over the world compute 100,000 90% confidence intervals, many for the mean, others maybe for medians or standard deviations or ..., than about 90% or about 90,000 of those intervals will actually contain the parameter that is supposed to be estimated, the other 10,000 or so will not.
It is tempting to interpret the confidence inteval as follows: having found our 90% confidence interval of (2.442, 2.618), we are now 90% sure that the true mean GPA (the one for all the students at the Colegio) is somewhere between 2.442 and 2.618. Strictly speaking this interpretation is not correct because once we have computed the interval (2.442, 2.618) the true mean GPA is either in it or not. Nevertheless at least intuitively this interpretation is also useful.
The main property of confidence intervals is their coverage, that is just the equation above.
Example say X1, .., Xn~Ber(p), then by the CLT

so we have a candidate for a 100(1-α)% CI. But this is based on the CLT, so there is a question how large n needs to be for this to work. In cibernoulli we generate B=1000 datasets with n observations from a Ber(p), calculate the CI and check how many contain p, with p=0.01:0.01:0.99. As we can see, it works reasonably well if n is not to small and p is not to close to either 0 or 1.
Notice the ragged appearance of the coverage graph. This is typical for discrete rv's like the Bernoulli.
Θ. Then (L(x), U(x)) is a 100(1-α)% credible interval for θ iff
P(L(x)≤θ≤U(x)|X1=x1, .., Xn=xn)=1-α
Notice now the data appears in the conditional part, so this is a probability based on the posterior distribution of θ|X1=x1, .., Xn=xn
Example say X1, .., Xn~N(μ,σ)
T keep things simple we will assume that σ is known, so we just need a prior on μ. Let's say μ~N(μ0,τ) Then

so the posterior distribution of μ|X=x is again a normal.
How can we get a credible interval from this? The definition above does not determine a unique interval, essentially we have one equation for two unknowns, so we need an additional condition. Here are some standard solutions:
a) equal tail probability interval: choose L and U such that
P(L(x)<θ|X1=x1, .., Xn=xn)=α/2 and P(θ>U(x)|X1=x1, .., Xn=xn)=1-α/2
Example (GPA cont.) First we need to choose σ, μ0 and τ Let's use σ=0.65, μ0=3.0 and τ=1.0, then

and we find L(x)=qnorm(0.025,2.53132,0.052)=2.429 and U(x)=qnorm(0.975,2.53132,0.052)=2.633
b) highest posterior density interval. In addition to the first equation we also have
fθ(L(x)|X1=x1, .., Xn=xn)=fθ(U(x)|X1=x1, .., Xn=xn)
In our case this yields the same interval as in a) because the nromal density is symmetric around the mean.
c) quantiles from simulated data. If we can sample from the posterior distribution we can use these. Say Y1,..,Y1000
are 1000 simulated observations from μ|X=x, then (Y[25], Y[975]) is a 90% credible interval.
The main property of credible intervals is just the equation that defines them.
Example say X1, .., Xn~Ber(p), and let's use as a prior on p the U[0,1]. Then

Now if we use the equal tail probabilities method we find L(x)=qbeta(0.025,1+sum(x), n+1-sum(x)) and U(x)=qbeta(0.975,sum(x),n+1-sum(x))
In cibernoulli(2) we generate B=1000 p~U[0,1], then generate sum.x=∑x~Ber(p) and finally find the percentage of intervals (L(sum.x),U(sum.x)) that contain their respective p.
Example say X1, .., Xn~N(μ,σ) and we are interested in estimating μ and σ simultaneously. So we want to find a region A(x)
2such that
P((μ,σ)
A(X))=1-α
We already know that
and s are good point estimators of μ and σ. Morover it can be shown that 
s, so

confreg.norm shows that this indeed has the right level α. What does this region look like? confreg.norm(2) draws the region.