The objective of the optimised robust correlation is to find the global
extremum in a multi-dimensional search space that corresponds to the
best match between a pair of images. This search space is defined by the
set of all valid geometric and photometric transformations. In our
implementation of the proposed method the geometric transformations are
translation, scaling and rotation
. Given a point in the multi-dimensional search space, a combined score
function is evaluated. This function and some of its properties are
described in Section
2.1
. To find the global optimum of the score function a search technique
based on random exponential perturbations is employed. This optimisation
method, which was shown to be particularly suitable for the given search
problem (see Section
3
), is discussed in Section
2.2
. Finally in Section
2.3
, a quasi-random sampling technique used for estimating the value of the
score function is outlined.
Given a transformation
t
, a match score
s
is computed as a weighted sum of two scores, an area score
and a grey level score
:
where
denotes a constant in the interval [0,1]
. The area score is included to encourage large overlaps and is defined
as
where
and
denote the sampling sets corresponding to the reference and test images,
and
, respectively. The derivation of this function is based on
probabilistic arguments [
8
]. The grey level score, which measures the similarity between the
intensity distributions, is defined as
where
denotes the robust kernel,
the intensity transformation,
the projection function and
the maximum response of the robust kernel. The kernel function
used in the experiments reported in Section
3
is a simple quadratic function and is defined as
where
and
denote the compared grey levels and
the cut-off distance. The latter defines the width of the kernel and
grey level differences greater than this constant will not contribute to
the final score. The intensity transformation
implements a linear mapping between grey levels in the reference and
test images. It is defined as
where
g
denotes a grey level. Pixels are projected from the reference image to
the test image using an affine projection function
where
p
denotes the pixel to be projected and
the result of the projection. The horizontal and vertical coordinates of
the projection are defined as
where
denotes the centre of gravity computed from the sampling set.
The search technique we employ is based on random exponential perturbations (see Algorithm 1 ). In each iteration, the transformation between reference and test image is perturbed by adding a random vector drawn from an exponential distribution. The new transformation is accepted only if the score was increased.
The approach described above is similar to simulated annealing [ 10 ] at zero temperature. Successful applications of simulated annealing within the areas of object detection and recognition have been reported in [ 9 ] and [ 1 ].
If the object under consideration can be adequately represented using only a fraction of the pixels then this is clearly advantageous since the execution time of the method is directly dependent on the number of samples used. In our application this is certainly the case since an image of a face contains a high degree of redundant information, and large regions can often be represented with only a few points.
A sampling technique commonly used in Monte Carlo integration is based
on Sobol sequences [
15
]. A Sobol sequence is a quasi-random sequence of numbers maximally
spread out over a given hyper-cube. The sequence is generated
number-theoretically, rather than randomly, and successive points at any
stage fill in the gaps in the previously generated distribution. The use
of Sobol sequences leads to faster convergence compared to uniformly
distributed random numbers since the fractional error of the
approximation decreases as
instead of
[
15
], where
N
is the number of samples and
d
the dimensionality of the approximated function. In Figure
1
, three Sobol sequences are shown.
Figure 1:
Two-dimensional Sobol sequences at three different sampling rates.
Combining the optimised robust correlation method with random sampling yields several benefits. The quasi-random nature of the process implies that the sampling points will not interfere with (and possibly cancel out) a specific frequency. In contrast, if sampling points are positioned on a grid, aliasing is much more likely. Furthermore, it is possible to continuously increase the sampling density -- until some convergence criterion is met -- and at the same time maintain approximately uniform density throughout the image. This is because the sampling points are avoiding the chance clustering that occurs with random points drawn from a uniform distribution. A direct consequence of this property is that Sobol sequences can be used for continuous multi-resolution matching. By varying the sampling density the image can be represented in different resolutions.
Kenneth Jonsson