The MAP estimate of the object variables
is found by maximising the posterior density. Iterative numerical
methods are required, which can be grouped in two general categories:
Deterministic methods (such as simplex [14]) provide an estimate of a local probability maximum by stepping through the function space in directions intended to locate a maximum. They can provide an estimate of the uncertainty in the neighbourhood of that maximum, obtained by quadratic approximation of the function in that region. Such methods may not find the global maximum if the posterior density is multimodal. The estimate of uncertainty may be inaccurate if a quadratic approximation is inappropriate.
Stochastic methods (such as Markov chain Monte Carlo [2]) may be better able to find a global maximum and can more thoroughly estimate the posterior distribution. These methods are better suited to problems where the posterior density is complicated. However, they are more computationally expensive than deterministic methods.
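As a hedged illustration of the deterministic route, the sketch below uses the Nelder-Mead simplex (via SciPy's `minimize`) to climb to a nearby maximum of the posterior by minimising its negative logarithm. The function `neg_log_posterior` is a hypothetical two-dimensional surrogate, not the model used in this work.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(x):
    """Hypothetical stand-in for the negative log posterior over the object variables."""
    return 0.5 * np.sum((x - np.array([3.0, -1.0])) ** 2)

# Deterministic local search: the simplex steps through the variable space
# towards a local maximum of the posterior (minimum of the negative log).
x0 = np.zeros(2)                                   # arbitrary starting point
result = minimize(neg_log_posterior, x0, method="Nelder-Mead")
print("local MAP estimate:", result.x)
```

Such a search only reports the optimum it converges to; it gives no picture of the posterior away from that point.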
The technique of Markov chain Monte Carlo overcomes many of the problems
faced by deterministic algorithms. Beginning at an arbitrary point, this
algorithm proceeds in a series of stochastic jumps through the variable
space. The jumps are defined by a proposal distribution for moving to a
new point
with a density
which can depend on the current point
.
Random steps are proposed and the posterior density at the proposed
location is compared with that at the current location. The proposed
value is accepted with probability $\min(1, p)$, where $p$ depends on the ratio of the current and proposed posterior densities:

$$p = \frac{\pi(\mathbf{x}')\,q(\mathbf{x} \mid \mathbf{x}')}{\pi(\mathbf{x})\,q(\mathbf{x}' \mid \mathbf{x})} \qquad (6)$$
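A minimal sketch of this acceptance rule is given below, written for a generic model: `log_post`, `propose` and `log_q` are assumed interfaces standing in for the posterior and proposal densities of Eq. 6, and are not part of the original analysis. Working with log densities avoids numerical underflow.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, log_post, propose, log_q):
    """One Metropolis-Hastings update of the variable vector x.

    log_post(x) : log posterior density at x
    propose(x)  : draw a candidate x' from q(x' | x)
    log_q(a, b) : log q(a | b); cancels from the ratio when q is symmetric
    """
    x_new = propose(x)
    # Logarithm of the ratio p in Eq. 6: posterior ratio times proposal ratio.
    log_p = (log_post(x_new) - log_post(x)) + (log_q(x, x_new) - log_q(x_new, x))
    # Accept with probability min(1, p).
    if np.log(rng.uniform()) < min(0.0, log_p):
        return x_new
    return x

# Example with a symmetric normal proposal, for which log_q cancels exactly.
sigma = 1.0
propose = lambda x: x + sigma * rng.standard_normal(x.shape)
log_q = lambda a, b: 0.0
log_post = lambda x: -0.5 * np.sum(x ** 2)          # stand-in posterior
x = mh_step(np.zeros(3), log_post, propose, log_q)
```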
For maximum efficiency, the proposal distribution should be commensurate
with the posterior distribution. In many applications, symmetry ensures
that the density $q$ disappears from Eq. 6. A common proposal distribution for a scalar variable is a normal
density centred on the current variable value. It is often more
convenient to update one component of $\mathbf{x}$ at a time rather than all components simultaneously, although certain
subsets of variables may naturally form groups which it seems sensible
to update jointly. In our analysis, we update the two location variables
jointly and all others individually.
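One possible form of such a component-wise scheme is sketched below: the two location variables, assumed here to occupy the first two entries of the variable vector, are proposed as a joint block, and every other variable is updated on its own with a symmetric normal proposal. This is an illustration of the general idea rather than the exact code used in the analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def componentwise_sweep(x, log_post, step_sizes, loc_idx=(0, 1)):
    """One sweep of Metropolis updates: a joint block for the location
    variables (indices loc_idx), then each remaining component singly."""
    x = np.asarray(x, dtype=float).copy()
    step_sizes = np.asarray(step_sizes, dtype=float)
    blocks = [list(loc_idx)] + [[i] for i in range(len(x)) if i not in loc_idx]
    for block in blocks:
        x_new = x.copy()
        x_new[block] = x[block] + step_sizes[block] * rng.standard_normal(len(block))
        # Symmetric proposal: acceptance depends only on the posterior ratio.
        if np.log(rng.uniform()) < log_post(x_new) - log_post(x):
            x = x_new
    return x

log_post = lambda v: -0.5 * np.sum(v ** 2)           # stand-in posterior
v = componentwise_sweep(np.ones(6), log_post, np.full(6, 0.5))
```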
Because of its stochastic nature, the algorithm will often climb uphill, but will sometimes climb downhill. Local maxima do not, therefore, present an impassable barrier provided the proposal distribution has a width comparable to or greater than the typical width of such maxima. Also, a standard MCMC algorithm is not intended to terminate at a local or global maximum but will continue to traverse the variable space.
The result of an MCMC analysis is a sequence of values representing an estimate of the posterior distribution. Since the chain is started from an arbitrary value, which may be in an outlying region of the posterior distribution, it is common practice to permit a burn-in period where the sequence is allowed to move from the starting value to a more central position within the posterior. To accelerate this stage, a large proposal variance may be used so that the chain will traverse the variable space quickly and avoid becoming trapped in a local probability maximum. The section of chain during this burn-in period is discarded when drawing any inference regarding the posterior distribution. We adopt the point with highest posterior probability found during the burn-in as a starting point for the main sequence of MCMC simulations; the highest point found in the main sequence provides a natural estimate of the MAP value.
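The sketch below mirrors that scheme under simplified assumptions (a random-walk Metropolis chain on a hypothetical stand-in posterior): a burn-in run with inflated proposal scales is discarded except for its highest-posterior point, which seeds the main run, and the best point visited during the main run is taken as the MAP estimate.

```python
import numpy as np

def run_chain(x0, log_post, step_sizes, n_iter, rng):
    """Random-walk Metropolis chain that also tracks the best point visited."""
    x = np.asarray(x0, dtype=float).copy()
    best_x, best_lp = x.copy(), log_post(x)
    samples = []
    for _ in range(n_iter):
        x_new = x + step_sizes * rng.standard_normal(x.shape)
        if np.log(rng.uniform()) < log_post(x_new) - log_post(x):
            x = x_new
        lp = log_post(x)
        if lp > best_lp:
            best_x, best_lp = x.copy(), lp
        samples.append(x.copy())
    return np.array(samples), best_x

rng = np.random.default_rng(2)
log_post = lambda x: -0.5 * np.sum(x ** 2)           # stand-in posterior
x0 = np.full(4, 10.0)                                # arbitrary, possibly outlying start
step_sizes = np.full(4, 0.5)

# Burn-in: larger steps, samples discarded, best point kept as the new start.
_, start = run_chain(x0, log_post, 5.0 * step_sizes, 2000, rng)
# Main sequence: its best point provides the MAP estimate.
main_samples, map_estimate = run_chain(start, log_post, step_sizes, 10000, rng)
```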
The proposal distribution $q$ used in this analysis had a normal density centred on the current variable value, with standard deviations in the main sequence of 5 pixels (for location and scale) and 3 degrees (for orientation). During burn-in, these standard deviations were a factor of 5 higher. The two binary values of reflection were proposed with equal probability, and the proposals for the shape variables were normally distributed with the same variability as observed in the training data.
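A hedged sketch of a proposal function matching these settings is given below. The ordering of the variable vector (x, y, scale, orientation in degrees, a binary reflection flag, then the shape parameters) and the numerical shape standard deviations are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_proposal(shape_sd, burn_in=False):
    """Proposal for a vector assumed to be laid out as
    [x, y, scale, orientation, reflection, shape_1, ..., shape_k]."""
    factor = 5.0 if burn_in else 1.0                 # burn-in scales are 5x larger
    shape_sd = np.asarray(shape_sd, dtype=float)

    def propose(v):
        v_new = np.asarray(v, dtype=float).copy()
        v_new[0:2] += factor * 5.0 * rng.standard_normal(2)   # location: sd 5 pixels
        v_new[2]   += factor * 5.0 * rng.standard_normal()    # scale: sd 5 pixels
        v_new[3]   += factor * 3.0 * rng.standard_normal()    # orientation: sd 3 degrees
        v_new[4]    = rng.integers(0, 2)                      # reflection: equal probability
        v_new[5:]  += shape_sd * rng.standard_normal(len(shape_sd))  # training-data SDs
        return v_new

    return propose

shape_sd = np.array([2.0, 1.5, 1.0])                 # hypothetical training SDs
propose_main = make_proposal(shape_sd)
propose_burn = make_proposal(shape_sd, burn_in=True)
v = np.zeros(8)
print(propose_main(v))
```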
There are a number of difficulties with MCMC analysis, in particular the problems of convergence and independence. The chain may not reach the posterior distribution within the allocated time. Also, the values in the chain are by construction highly autocorrelated, and an estimated posterior distribution obtained from a finite-length sequence may not be fully representative of the true posterior. Various diagnostic tools have been proposed to address these issues [4].
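As one simple check on the independence problem, the sketch below estimates the sample autocorrelation of a scalar chain; a slowly decaying autocorrelation indicates that a finite-length sequence explores the posterior inefficiently. The chain used here is synthetic, and the diagnostic literature cited above [4] offers more rigorous convergence assessments.

```python
import numpy as np

def autocorrelation(chain, max_lag=50):
    """Sample autocorrelation of a scalar MCMC chain at lags 1..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(4)
demo = np.cumsum(rng.standard_normal(5000)) * 0.01 + rng.standard_normal(5000)
print(autocorrelation(demo, max_lag=5))
```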