In many face recognition applications the task is to locate faces in images, and identify them in a way which is robust with respect to changes in pose, expression, and lighting conditions. In this section we outline briefly an existing model-based approach to location and recognition, on which the current work is based.
Statistical modelling of facial appearance has proved a successful approach to coding and interpreting face images, also providing a useful basis for locating faces in images. Kirby and Sirovich [7] describe a compact representation of facial appearance, where face images are decomposed into weighted sums of basis images using a Karhunen-Loève expansion. The patch containing the face is coded using 50 expansion coefficients, from which an approximation to the original can be reconstructed. Turk and Pentland [9] describe face identification using this 'Eigenface' representation. Lanitis et al. [8] describe the representation of both face shape and grey-level appearance; they use a Point Distribution Model (PDM) [3] to describe shape and an approach similar to that of Kirby and Sirovich [7] to represent shape-normalised grey-level appearance. More recently, Edwards et al. [4] have described the combination of shape and grey-level variation within a single statistical appearance model, which they call a Combined Appearance Model.
In each of the approaches mentioned above, a feature vector, $\mathbf{x}$, which describes the facial appearance, either in terms of shape, intensity or both, is represented by a combination of a small number of parameters, $\mathbf{b}$, which are assumed linearly independent. For example, in the Point Distribution Model (PDM) [3] a face shape is coded using

$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}\mathbf{b} \quad (1)$$

where $\mathbf{x}$ is an example of a shape, $\bar{\mathbf{x}}$ is the mean shape over the training set and $\mathbf{P}$ is a matrix of the first $t$ eigenvectors of the covariance matrix of the training set. If the training set contains examples of different individuals obtained under varying lighting conditions and showing a range of poses and expressions, it is possible to approximate any plausible face shape by choosing values of $\mathbf{b}$ within limits derived from the training set. Since the eigenvectors which form $\mathbf{P}$ are orthonormal, it is possible to rearrange Equation 1 to extract the shape parameters, $\mathbf{b}$, for an example $\mathbf{x}$, according to

$$\mathbf{b} = \mathbf{P}^T(\mathbf{x} - \bar{\mathbf{x}}) \quad (2)$$
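As an illustration, the linear model of Equations 1 and 2 can be written in a few lines of numpy. This is a minimal sketch rather than the authors' implementation; it assumes shapes is an N x 2n array of training shapes, each a flattened vector of n (x, y) landmark coordinates, and all names are illustrative.

import numpy as np

def build_pdm(shapes, t):
    # Mean shape over the training set (x-bar in Equation 1).
    x_bar = shapes.mean(axis=0)
    # Eigen-decomposition of the training-set covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(shapes, rowvar=False))
    order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
    P = eigvecs[:, order[:t]]              # matrix of the first t eigenvectors
    return x_bar, P

def to_params(x, x_bar, P):
    # Equation 2: b = P^T (x - x_bar)
    return P.T @ (x - x_bar)

def to_shape(b, x_bar, P):
    # Equation 1: x = x_bar + P b
    return x_bar + P @ b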
A face-shape PDM can be used to locate faces in new images by using Active Shape Model (ASM) search. The mean shape is projected into the image and iteratively modified to better fit the image evidence, subject to the shape constraints represented by the model. At each step, the region around each model point is searched for the best match to a local grey-level model learnt during training. This gives a new proposed shape. The model constraints are applied by using Equation 2, then Equation 1, to find the closest approximation to the proposed shape consistent with the model.
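The search loop can be sketched as follows, again only as an illustration under stated assumptions: propose_shape is a hypothetical helper that moves each model point to the best local grey-level match, limits is a vector of parameter bounds derived from the training set, and the similarity (pose) alignment used in practice is omitted for brevity.

import numpy as np

def asm_search(image, x_bar, P, limits, propose_shape, n_iters=20):
    x = x_bar.copy()                          # start from the mean shape
    for _ in range(n_iters):
        x_proposed = propose_shape(image, x)  # best local grey-level matches
        b = P.T @ (x_proposed - x_bar)        # Equation 2
        b = np.clip(b, -limits, limits)       # apply the model constraints
        x = x_bar + P @ b                     # Equation 1: closest plausible shape
    return x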
A Combined Appearance Model [4] can be generated from a set of examples as follows. First the shape parameters for each example are calculated using Equation 2. Next, a warping algorithm [2] is applied to each face patch to deform it to the mean shape; this allows a model of the shape-free grey-level appearance to be built in the form of Equation 1. Finally, the extracted shape and grey-level model parameters for each example are combined and a model is built, again in the form of Equation 1, in which the parameters describe both shape and grey-level variation. The final linear model accounts for correlations between shape and grey-level variation and is more compact than a model which treats the two separately. We have built such a model from a training set containing a wide variety of individuals, covering a range of poses, expressions and lighting conditions. Figure 1 shows the effect of varying the first few parameters of the Combined Appearance Model.
Figure 1:
Effect of the first few parameters of the combined appearance model (±3 standard deviations from the mean).
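The model-building steps above can be sketched as follows, reusing the illustrative build_pdm helper from the earlier sketch. Here b_shape is an N x t_s array of shape parameters from Equation 2, g is an N x m array of shape-normalised grey-level patches, and W_s is a weighting that makes shape and grey-level units commensurate (a scalar here for simplicity); none of this is the authors' code.

import numpy as np

def build_combined_model(b_shape, g, t_grey, t_comb, W_s=1.0):
    # Shape-free grey-level model in the form of Equation 1.
    g_bar, P_g = build_pdm(g, t_grey)
    b_grey = (g - g_bar) @ P_g                 # grey-level parameters per example
    # Concatenate weighted shape and grey-level parameters for each example...
    combined = np.hstack([W_s * b_shape, b_grey])
    # ...and build one further linear model, again in the form of Equation 1.
    c_bar, P_c = build_pdm(combined, t_comb)
    return c_bar, P_c

Reconstructions such as those in Figure 2 then amount to applying Equation 1 twice: the combined parameters are expanded into shape and grey-level parameters, which in turn generate a shape and a shape-normalised patch.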
Given an example face we can extract the shape and shape-free grey-level parameters, and approximate the combined model parameters using Equation 2. The reconstruction results obtained using the combined model parameters are shown in Figure 2.
Figure 2:
Reconstructing faces using combined appearance parameters. For each face
the original is shown on the left, the reconstruction on the right.
Lanitis et al. [8] describe face recognition using shape and grey-level parameters. In their approach the face is located in an image using Active Shape Model search, and the shape parameters extracted. The face patch is then deformed to the average shape, and the grey-level parameters extracted. The shape and grey-level parameters are used together for classification. As described above, we combine the shape and grey-level parameters and derive Combined Appearance Model parameters, which can be used in a similar classifier whilst providing a more compact model than considering shape and grey-level separately.
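The extraction pipeline just described can be sketched by composing the earlier illustrative helpers; warp_to_mean_shape stands in for the warping algorithm of [2] and, like propose_shape, is hypothetical.

import numpy as np

def extract_combined_params(image, shape_model, grey_model, comb_model,
                            propose_shape, warp_to_mean_shape):
    x_bar, P_s, limits = shape_model
    g_bar, P_g = grey_model
    c_bar, P_c, W_s = comb_model
    x = asm_search(image, x_bar, P_s, limits, propose_shape)  # locate the face
    b_s = P_s.T @ (x - x_bar)                # shape parameters (Equation 2)
    g = warp_to_mean_shape(image, x, x_bar)  # shape-free patch, as in [2]
    b_g = P_g.T @ (g - g_bar)                # grey-level parameters
    c = P_c.T @ (np.concatenate([W_s * b_s, b_g]) - c_bar)
    return c                                 # combined appearance parameters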
Given a new example of a face, and the extracted model parameters, the aim is to identify the individual in a way which is invariant to confounding factors such as lighting, pose and expression. If there exists a representative training set of face images, it is possible to do this using the Mahalanobis distance measure [6], which enhances the effect of inter-class variation (identity), whilst suppressing the effect of within-class variation (pose, lighting, expression). This gives a scaled measure of the distance of an example from a particular class. The Mahalanobis distance, $D_i$, of the example from class $i$ is given by

$$D_i^2 = (\mathbf{c} - \bar{\mathbf{c}}_i)^T \mathbf{S}^{-1} (\mathbf{c} - \bar{\mathbf{c}}_i)$$

where $\mathbf{c}$ is the vector of extracted appearance parameters, $\bar{\mathbf{c}}_i$ is the centroid of the multivariate distribution for class $i$, and $\mathbf{S}$ is the common within-class covariance matrix for all the training examples. Given sufficient training examples for each individual, the individual within-class covariance matrices, $\mathbf{S}_i$, could be used; it is, however, restrictive to assume that such comprehensive training data can be obtained.
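A minimal sketch of such a classifier, assuming params is an N x k array of combined appearance parameters for the training set and labels is an array of integer identity labels (names are illustrative):

import numpy as np

def fit_mahalanobis(params, labels):
    classes = np.unique(labels)
    # Centroid of the parameter distribution for each individual.
    centroids = {i: params[labels == i].mean(axis=0) for i in classes}
    # Pooled within-class covariance: residuals about each class centroid.
    residuals = np.vstack([params[labels == i] - centroids[i] for i in classes])
    S_inv = np.linalg.inv(np.cov(residuals, rowvar=False))
    return centroids, S_inv

def classify(c, centroids, S_inv):
    # Nearest class by squared distance D_i^2 = (c - c_i)^T S^{-1} (c - c_i).
    def dist(i):
        d = c - centroids[i]
        return d @ S_inv @ d
    return min(centroids, key=dist)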
The classifier described above assumes that the within-class variation is very similar for each individual, and that the pooled covariance matrix provides a good overall estimate of this variation. Edwards et al. [4] used this assumption to linearly separate the inter-class variability from the intra-class variability using Linear Discriminant Analysis (LDA). The approach seeks a linear transformation of the appearance parameters which maximises inter-class variation, based on the pooled within-class and between-class covariance matrices. The identity of a face is given by a vector of Discriminant Parameters, $\mathbf{d}$, which ideally code only information important for identity. The transformation between appearance parameters, $\mathbf{c}$, and discriminant parameters, $\mathbf{d}$, is given by

$$\mathbf{d} = \mathbf{D}^T \mathbf{c}$$

where $\mathbf{D}$ is a matrix of orthogonal vectors describing the principal types of inter-class variation. Having calculated these inter-class modes of variation, Edwards et al. [4] showed that a subspace orthogonal to $\mathbf{D}$ could be constructed which modelled only intra-class variations due to change in pose, expression and lighting. The effect of this decomposition is to create a combined model which is still in the form of Equation 1, but where the parameters, $\mathbf{c}$, are partitioned into those that affect identity and those that describe within-class variation. Figure 3 shows the effect of varying the most significant identity parameter for such a model; also shown is the effect of applying the first mode of the residual (identity-removed) model to an example face. It can be seen that the linear separation is reasonably successful and that the identity remains unchanged.
Figure 3:
Varying the most significant identity parameter (top), and manipulating residual variation without affecting identity (bottom).
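The decomposition can be sketched via the standard generalised-eigenvalue formulation of LDA. This is an illustration under the same assumptions as the classifier sketch above, not the authors' exact procedure, and it assumes the within-class scatter matrix is non-singular.

import numpy as np

def lda_directions(params, labels, n_dirs):
    mean_all = params.mean(axis=0)
    k = params.shape[1]
    S_w = np.zeros((k, k))                   # within-class scatter
    S_b = np.zeros((k, k))                   # between-class scatter
    for i in np.unique(labels):
        X_i = params[labels == i]
        mu_i = X_i.mean(axis=0)
        S_w += (X_i - mu_i).T @ (X_i - mu_i)
        diff = (mu_i - mean_all)[:, None]
        S_b += len(X_i) * (diff @ diff.T)
    # Leading eigenvectors of S_w^{-1} S_b span the inter-class subspace D.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_dirs]].real

The discriminant parameters for a new example are then d = D.T @ c, and the residual, identity-removed variation lies in the subspace orthogonal to the columns of D.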