1) Analytically via calculus
2) Numerically
3) Stochastically via simulation
The first is of course preferred but often not possible, the second is usually the best option for low dimensional problems and the third often works best for high-dimensional ones.
One of the most important "objects" in Statistics is the likelihood function" defined as follows: Let X=(X1, .., Xn) be a random vector with joint pdf f(x|θ). then the likelihood function L is defined as
L(θ|x)=f(x|θ)
This must be one of the most deceivingly simple equations in math, actually it seems to be just a change in notation: L instead of f. What really matters and makes a huge difference is that in pdf we consider the x's as variables and the θ as fixed, in the likelihood function we consider the θ as the variable and the x's as fixed. Essentially we consider what happens after an experiment is done, that is what the data can tell us about the parameter.
Things simplify a bit if X1, .., Xn is an iid sample. Then the joint density is given by f(x|θ)=∏f(xi|θ).
The likelihood function is fundamental in both frequentist and Bayesian statistics. For example, we have the "likelihood principle" which states that if for any two experiments we have the same likelihood function we should end up with the same answers. Bayesian statistics adheres to this principle, frequentist methods sometimes violate it.
Example X1, .., Xn~Ber(p):
Example X1, .., Xn~N(μ,σ):
Example X1, .., Xn~Γ(α,β):
Example Y1~N(μ1,σ1), Y2~N(μ2,σ2), Z~Ber(p) and X=(1-Z)Y1+ZY2
An important problem in Statistics is parameter estimation, that is given data X1, .., Xn and a model (pdf or pmf), what is a good guess for the parameters? A very popular method is to find the parameter values that maximize the likelihood function:
Example
X1, .., Xn~Ber(p). First note that if a positive function f has a maximum at x=x0 then function log(f) has a maximum at x=x0 as well, so

(worry about Xi=0 for all i or Xi=1 for all i yourself)
The log-likelihood function is drawn in opt1(1,p=0.5)
Example X1, .., Xn~N(μ,σ):
The log-likelihood function is drawn in opt1(2,mu=0,sig=1)
Example X1, .., Xn~Γ(α,β):
Now it is no longer possible to find the mle's analytically, but at least we can solve for β in terms of α, so we have just a one-dimensional problem.
The log-likelihood function is drawn in opt1(3,alpha=1,beta=1)
Example Y1~N(μ1,σ1), Y2~N(μ2,σ2), Z~Ber(p) and X=(1-Z)Y1+ZY2
This system is of course completely unsolvable analytically.
The log-likelihood function is drawn in opt1(4,p=0.5,mu=0,sig=1,mu2=2,sig2=1)
So, what numerical methods are available to find the mle's? There is a long list of methods known, unfortunately there is not a single best method. The most widely used one is probably the Newton-Raphson method: say we want to solve the system of equations hi(x)=0, i=1,..,n. Pick a starting point (x1,..,xn) and find new points:

note that if this converges we have xnew=x, so x=x-H-1(x)hT(x)=0 so h(x)=0
Of course for finding maxima or mimima we have to solve the equation f'(x)=0
Example say f(x)=sin(πx), so h(x)=f'(x)=πcos(πx) and H(x)=f''(x)=-π2sin(πx)
see opt2(1,0.2)
Example f(x,y)=exp(-x2-3y2)
This function is visualized in 3D in opt2(3) You can rotate the graph by changing the theta argument.
see opt3(2,c(0.1,0.2))
One problem with the Newton-Raphson algorithm is that it is very sensitive to the starting point, so for example see opt3(2,c(0.1,0.3)) already fails to find the maximum. There are numerous extensions, for example xnew=x-λH-1(x)hT(x)
for some λ, see opt3(2,c(0.1,0.3),lambda=0.1)
Example X1, .., Xn~Γ(α,β)

and this is getting pretty ugly. Thankfully R has several useful functions already built in:
and we can use the fact that at the maximum β=
/α
see opt3()
Example consider the function f(x)=[cos(50x)+sin(20x)]2 on [0,1], drawn by opt4(). Clearly the Newton-Raphson algorithm would need an almost perfect starting point, otherwise it will almost always get stuck in a local maxima or minima. In one dimension we can usually do some exhaustive search, but that is already very slow. In this case one would need to evaluate this function several 1000 times.
Example How about a genuinely hard problem? Consider the function f(x,y)=(xsin(20y)+ysin(20x))2cosh(sin(10x)x)+(xcos(10y)-ysin(10x)2cosh(cos(20y)y) on [-1,1]2, see opt5(1) and opt5(2)