Stochastic Optimization and Simulated Annealing

Direct Simulation

As a first approach, if we can relate a function to a density and then simulate from this density we can find maxima/minima via simulation.

Example consider the function f(x)=[cos(50x)+sin(20x)]2 on [0,1]. Here is a simple and fast way to get an idea of the maximum: generate U1,..,Un iid U[0,1], calculate Xi=f(Ui) and plot X vs U, see opt4(2)

Example consider again the function f(x)=[cos(50x)+sin(20x)]2 on [0,1]. Because it is a continuous function on a finite interval there exists a constant c such that c·f is a density. (of course c=(∫f(x)dx)-1). Moreover using the Hastings-Metropolis algorithm we don't even need to know c. One problem is to extract the maximum from the generated data. In opt4(3) i use a histogram to estimate the density and pick the maximum. In general a non-parametric density estimator would be better and would need far fewer points.

Example Consider the function f(x,y)=(xsin(20y)+ysin(20x))2cosh(sin(10x)x)+(xcos(10y)-ysin(10x)2cosh(cos(20y)y) using the simple simulation approach in opt5(3). The solution via the histogram/non-parametric density estimate again is doable but needs a bit of works.

Simulated Annealing

this algorithm was actually also introduced by Metropolis, in the same 1953 paper where he first showed the "Metropolis" version of the Hastings-Metropolis algorithm . The fundamental idea is that a change of scale, called temperature, allows for faster moves on the surface of the function f to maximize, whose negative is called the energy. Therefore rescaling partially avoids the trapping of the algorithm in local minima/maxima. Given a temperature parameter T>0 a sample of θ1(T), θ2(T),.. is generated from the distribution π(θ)=c·exp(f(θ)/T) . Notice that

π'(θ)=c·exp(f(θ)/T)f'(θ)/T=0 iff f'(θ)=0,

so π(θ) has a maximum iff f(θ) has a maximum. Moreover even if ∫f(x)dx=∞, ∫exp(f(x))dx is often finite and so there exists a constant c which makes π a density.

Here is one popular version of the simulated annealing algorithm:

1) simulate Y from a distribution with the same support as h, say with density g(|y-θi|)

2) accept θi+1=Y with probability p=min{exp[(f(Y)-f(θi))/Ti),1}, take θi+1i otherwise

3) update Ti to Ti+1

Notice the similarities between this algorithm and the Hastings-Metropolis one: in each case we draw observations from a "proposal distribution" which depends on the current state x, and accept it as a new observation for X with a certain probability.

Example f(x)=[cos(50x)+sin(20x)]2 on [0,1]. For this one implementation of the algorithm is as follows:

at iteration i the algorithm is at (xi,f(xi))
1) simulate U~U[ai,bi] where ai=max(xi-0.5,0) and bi=min(xi+0.05,1)

2) accept xi+1=U with probability
pi=min{exp[(f(U)-f(xi))/Ti],1)
otherwise set xi+1=xi

3) set Ti+1=1/log(i+1)

This is implemented in opt4(4)

Example For f(x,y)=(xsin(20y)+ysin(20x))2cosh(sin(10x)x)+(xcos(10y)-ysin(10x)2cosh(cos(20y)y) opt5(4,eps=1). implements the simulated anealing algorithm. Different values of eps change the temperature T.