
5 Optimization

  The MAP estimate of the object variables is found by maximising the posterior density. Iterative numerical methods are required; they can be grouped into two general categories:

Deterministic methods (such as simplex [14]) provide an estimate of a local probability maximum by stepping through the function space in directions intended to locate a maximum. They can also provide an estimate of the uncertainty in the neighbourhood of that maximum, obtained by a quadratic approximation of the function in that region. Such methods may not find the global maximum if the posterior density is multimodal, and the estimate of uncertainty may be inaccurate if a quadratic approximation is inappropriate.

Stochastic methods (such as Markov chain Monte Carlo [2]) may be better able to find a global maximum and can more thoroughly estimate the posterior distribution. These methods are better suited to problems where the posterior density is complicated. However, they are more computationally expensive than deterministic methods.

The technique of Markov chain Monte Carlo overcomes many of the problems faced by deterministic algorithms. Beginning at an arbitrary point, the algorithm proceeds in a series of stochastic jumps through the variable space. The jumps are defined by a proposal distribution for moving to a new point $\theta'$, with a density $q(\theta' \mid \theta)$ which can depend on the current point $\theta$.

Random steps are proposed and the posterior density at the proposed location is compared with that at the current location. The proposed value $\theta'$ is accepted with probability $\min(1, p)$, where $p$ depends on the ratio of the current and proposed posterior densities:

$$ p \;=\; \frac{\pi(\theta')\, q(\theta \mid \theta')}{\pi(\theta)\, q(\theta' \mid \theta)} \eqno(6) $$
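This acceptance rule can be sketched with a minimal Metropolis sampler. The 2-D normal `log_posterior`, the step size, and the chain length below are illustrative stand-ins, not the model of this paper; with a symmetric proposal, $p$ reduces to the posterior-density ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # Toy stand-in for the posterior: a standard 2-D normal density.
    return -0.5 * np.sum(theta ** 2)

def metropolis_step(theta, step_sd):
    # Symmetric normal proposal, so q(theta'|theta) = q(theta|theta')
    # and the ratio p in Eq. 6 reduces to the posterior-density ratio.
    proposal = theta + rng.normal(scale=step_sd, size=theta.shape)
    log_p = log_posterior(proposal) - log_posterior(theta)
    # Accept with probability min(1, p), computed in log space.
    if np.log(rng.uniform()) < min(0.0, log_p):
        return proposal
    return theta

theta = np.array([5.0, -5.0])          # arbitrary starting point
chain = [theta]
for _ in range(2000):
    theta = metropolis_step(theta, step_sd=1.0)
    chain.append(theta)
chain = np.array(chain)
```

Working with log densities avoids numerical underflow when the posterior values are very small.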

For maximum efficiency, the proposal distribution should be commensurate with the posterior distribution. In many applications the proposal is symmetric, so that $q(\theta' \mid \theta) = q(\theta \mid \theta')$ and the proposal density cancels from Eq. 6. A common proposal distribution for a scalar variable is a normal density centred on the current variable value. It is often more convenient to update one component of $\theta$ at a time rather than all components simultaneously, although certain subsets of variables may naturally form groups which it is sensible to update jointly. In our analysis, we update the two location variables jointly and all others individually.
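The component-wise updating scheme can be sketched as follows. The four-component variable vector and the toy posterior are hypothetical, chosen only to mirror a grouping like location, scale and orientation:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_posterior(theta):
    # Toy stand-in posterior: independent standard-normal components.
    return -0.5 * np.sum(theta ** 2)

def update_block(theta, idx, step_sd):
    # Propose new values for the components listed in `idx` only,
    # holding the rest fixed, and accept by the usual Metropolis rule.
    proposal = theta.copy()
    proposal[idx] += rng.normal(scale=step_sd, size=len(idx))
    log_p = log_posterior(proposal) - log_posterior(theta)
    if np.log(rng.uniform()) < min(0.0, log_p):
        return proposal
    return theta

# One full sweep over a hypothetical vector [x, y, scale, orientation]:
# the two location variables are updated jointly, the rest one at a time.
theta = np.array([4.0, 4.0, 2.0, -3.0])
theta = update_block(theta, [0, 1], step_sd=1.0)   # joint location update
for i in range(2, len(theta)):
    theta = update_block(theta, [i], step_sd=1.0)  # individual updates
```

Updating a block leaves the other components untouched, so each acceptance test compares densities that differ only in the proposed components.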

Because of its stochastic nature, the algorithm always accepts uphill moves but will sometimes accept downhill moves as well. Local maxima do not, therefore, present an impassable barrier, provided the proposal distribution has a width comparable to or greater than the typical width of such maxima. Note also that a standard MCMC algorithm is not intended to terminate at a local or global maximum but will continue to traverse the variable space.

The result of an MCMC analysis is a sequence of values representing an estimate of the posterior distribution. Since the chain is started from an arbitrary value, which may lie in an outlying region of the posterior distribution, it is common practice to allow a burn-in period during which the sequence moves from the starting value to a more central position within the posterior. To accelerate this stage, a large proposal variance may be used so that the chain traverses the variable space quickly and avoids becoming trapped in a local probability maximum. The section of chain from this burn-in period is discarded when drawing any inference regarding the posterior distribution. We adopt the point with highest posterior probability found during the burn-in as the starting point for the main sequence of MCMC simulations; the highest point found in the main sequence provides a natural estimate of the MAP value.
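This burn-in strategy can be sketched as follows, again with a toy posterior; the chain lengths are illustrative, while the factor-5 widening of the proposal during burn-in follows the settings described below:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_posterior(theta):
    # Toy stand-in posterior: a standard 2-D normal density.
    return -0.5 * np.sum(theta ** 2)

def run_chain(theta, n_steps, step_sd):
    # Run a Metropolis chain, returning the samples and the
    # highest-posterior point visited along the way.
    best, best_lp = theta, log_posterior(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.normal(scale=step_sd, size=theta.shape)
        log_p = log_posterior(proposal) - log_posterior(theta)
        if np.log(rng.uniform()) < min(0.0, log_p):
            theta = proposal
        samples.append(theta)
        lp = log_posterior(theta)
        if lp > best_lp:
            best, best_lp = theta, lp
    return np.array(samples), best

# Burn-in: a proposal 5x wider, started from an arbitrary point.
# Its samples are discarded; its best point seeds the main run.
_, start = run_chain(np.array([20.0, 20.0]), n_steps=500, step_sd=5.0)
samples, map_estimate = run_chain(start, n_steps=2000, step_sd=1.0)
```

Only `samples` from the main run would be used for posterior inference; `map_estimate` is the running best point, serving as the MAP estimate.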

The proposal distribution used in this analysis was a normal density centred on the current variable value, with standard deviations in the main sequence of 5 pixels (for location and scale) and 3 degrees (for orientation). During burn-in, these standard deviations were a factor of 5 higher. The two binary values of the reflection variable were proposed with equal probability, and proposals for the shape variables were normally distributed with the same variability as observed in the training data.

There are a number of difficulties with MCMC analysis; in particular, the problems of convergence and independence. The chain may not reach the posterior distribution within the allocated time. Also, the values in the chain are by construction highly autocorrelated, so an estimated posterior distribution obtained from a finite-length sequence may not be fully representative of the true posterior. Various diagnostic tools have been proposed to address these issues [4].
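One simple diagnostic for the independence problem is the sample autocorrelation function of the chain. A sketch follows; the AR(1) series here is only a stand-in for real MCMC output, since successive Metropolis samples are correlated in a similar way:

```python
import numpy as np

def autocorrelation(x, max_lag):
    # Sample autocorrelation of a scalar chain at lags 0..max_lag.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

# An AR(1) series mimics the autocorrelation of MCMC output.
rng = np.random.default_rng(3)
x = np.empty(5000)
x[0] = 0.0
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()

acf = autocorrelation(x, max_lag=10)
```

A slowly decaying autocorrelation indicates that many successive samples carry little new information, so a longer run (or thinning) is needed for a representative posterior estimate.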







K De Souza
Fri Jul 11 14:17:05 BST 1997