In many face recognition applications the task is to locate faces in images, and identify them in a way which is robust with respect to changes in pose, expression, and lighting conditions. In this section we outline briefly an existing model-based approach to location and recognition, on which the current work is based.
Statistical modelling of facial appearance has proved a successful approach to coding and interpreting face images, also providing a useful basis for locating faces in images. Kirby and Sirovich [7] describe a compact representation of facial appearance, where face images are decomposed into weighted sums of basis images using a Karhunen-Loève expansion. The patch containing the face is coded using 50 expansion coefficients, from which an approximation to the original can be reconstructed. Turk and Pentland [9] describe face identification using this 'Eigenface' representation. Lanitis et al. [8] describe the representation of both face shape and grey-level appearance; they use a Point Distribution Model (PDM) [3] to describe shape and an approach similar to that of Kirby and Sirovich [7] to represent shape-normalised grey-level appearance. More recently, Edwards et al. [4] have described the combination of shape and grey-level variation within a single statistical appearance model, which they call a Combined Appearance Model.
In each of the approaches mentioned above, a feature vector, $\mathbf{x}$, which describes the facial appearance, either in terms of shape, intensity or both, is represented by a combination of a small number of parameters, $\mathbf{b}$, which are assumed linearly independent. For example, in the Point Distribution Model (PDM) [3] a face shape is coded using

$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}\mathbf{b} \quad (1)$$

where $\mathbf{x}$ is an example of a shape, $\bar{\mathbf{x}}$ is the mean shape over the training set and $\mathbf{P}$ is a matrix of the first $t$ eigenvectors of the covariance matrix of the training set. If the training set contains examples of different individuals obtained under varying lighting conditions and showing a range of poses and expressions, it is possible to approximate any plausible face shape by choosing values of $\mathbf{b}$ within limits derived from the training set. Since the eigenvectors which form $\mathbf{P}$ are orthonormal, it is possible to rearrange Equation 1 to extract the shape parameters, $\mathbf{b}$, for an example $\mathbf{x}$, according to

$$\mathbf{b} = \mathbf{P}^T(\mathbf{x} - \bar{\mathbf{x}}) \quad (2)$$
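As an illustration, the linear model of Equations 1 and 2 can be written in a few lines of numpy. This is a minimal sketch rather than the authors' implementation; it assumes shapes is an N x 2n array of training shapes, each a flattened vector of n (x, y) landmark coordinates, and all names are illustrative.

import numpy as np

def build_pdm(shapes, t):
    # Mean shape over the training set (x-bar in Equation 1).
    x_bar = shapes.mean(axis=0)
    # Eigen-decomposition of the training-set covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(shapes, rowvar=False))
    order = np.argsort(eigvals)[::-1]      # largest eigenvalues first
    P = eigvecs[:, order[:t]]              # matrix of the first t eigenvectors
    return x_bar, P

def to_params(x, x_bar, P):
    # Equation 2: b = P^T (x - x_bar)
    return P.T @ (x - x_bar)

def to_shape(b, x_bar, P):
    # Equation 1: x = x_bar + P b
    return x_bar + P @ b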
A face-shape PDM can be used to locate faces in new images by using Active Shape Model (ASM) search. The mean shape is projected into the image and iteratively modified to better fit the image evidence, subject to the shape constraints represented by the model. At each step, the region around each model point is searched for the best match to a local grey-level model learnt during training. This gives a new proposed shape. The model constraints are applied by using Equation 2, then Equation 1, to find the closest approximation to the proposed shape consistent with the model.
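The search loop can be sketched as follows, again only as an illustration under stated assumptions: propose_shape is a hypothetical helper that moves each model point to the best local grey-level match, limits is a vector of parameter bounds derived from the training set, and the similarity (pose) alignment used in practice is omitted for brevity.

import numpy as np

def asm_search(image, x_bar, P, limits, propose_shape, n_iters=20):
    x = x_bar.copy()                          # start from the mean shape
    for _ in range(n_iters):
        x_proposed = propose_shape(image, x)  # best local grey-level matches
        b = P.T @ (x_proposed - x_bar)        # Equation 2
        b = np.clip(b, -limits, limits)       # apply the model constraints
        x = x_bar + P @ b                     # Equation 1: closest plausible shape
    return x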
A Combined Appearance Model [4] can be generated from a set of examples as follows. First the shape parameters for each example are calculated using Equation 2. Next, a warping algorithm [2] is applied to each face patch to deform it to the mean shape; this allows a model of the shape-free grey-level appearance to be built in the form of Equation 1. Finally, the extracted shape and grey-level model parameters for each example are combined and a model is built, again in the form of Equation 1, in which the parameters describe both shape and grey-level variation. The final linear model accounts for correlations between shape and grey-level variation and is more compact than a model which treats the two separately. We have built such a model from a training set containing a wide variety of individuals, covering a range of poses, expressions and lighting conditions. Figure 1 shows the effect of varying the first few parameters of the Combined Appearance Model.
Figure 1:
Effect of the first few parameters of the combined appearance model (±3 standard deviations from the mean).
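The model-building steps above can be sketched as follows, reusing the illustrative build_pdm helper from the earlier sketch. Here b_shape is an N x t_s array of shape parameters from Equation 2, g is an N x m array of shape-normalised grey-level patches, and W_s is a weighting that makes shape and grey-level units commensurate (a scalar here for simplicity); none of this is the authors' code.

import numpy as np

def build_combined_model(b_shape, g, t_grey, t_comb, W_s=1.0):
    # Shape-free grey-level model in the form of Equation 1.
    g_bar, P_g = build_pdm(g, t_grey)
    b_grey = (g - g_bar) @ P_g                 # grey-level parameters per example
    # Concatenate weighted shape and grey-level parameters for each example...
    combined = np.hstack([W_s * b_shape, b_grey])
    # ...and build one further linear model, again in the form of Equation 1.
    c_bar, P_c = build_pdm(combined, t_comb)
    return c_bar, P_c

Reconstructions such as those in Figure 2 then amount to applying Equation 1 twice: the combined parameters are expanded into shape and grey-level parameters, which in turn generate a shape and a shape-normalised patch.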
Given an example face we can extract the shape and shape-free grey-level parameters, and approximate the combined model parameters using Equation 2. The reconstruction results obtained using the combined model parameters are shown in Figure 2.
Figure 2:
Reconstructing faces using combined appearance parameters. For each face
the original is shown on the left, the reconstruction on the right.
Lanitis et al. [8] describe face recognition using shape and grey-level parameters. In their approach the face is located in an image using Active Shape Model search, and the shape parameters extracted. The face patch is then deformed to the average shape, and the grey-level parameters extracted. The shape and grey-level parameters are used together for classification. As described above, we combine the shape and grey-level parameters and derive Combined Appearance Model parameters, which can be used in a similar classifier whilst providing a more compact model than considering shape and grey-level separately.
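The extraction pipeline just described can be sketched by composing the earlier illustrative helpers; warp_to_mean_shape stands in for the warping algorithm of [2] and, like propose_shape, is hypothetical.

import numpy as np

def extract_combined_params(image, shape_model, grey_model, comb_model,
                            propose_shape, warp_to_mean_shape):
    x_bar, P_s, limits = shape_model
    g_bar, P_g = grey_model
    c_bar, P_c, W_s = comb_model
    x = asm_search(image, x_bar, P_s, limits, propose_shape)  # locate the face
    b_s = P_s.T @ (x - x_bar)                # shape parameters (Equation 2)
    g = warp_to_mean_shape(image, x, x_bar)  # shape-free patch, as in [2]
    b_g = P_g.T @ (g - g_bar)                # grey-level parameters
    c = P_c.T @ (np.concatenate([W_s * b_s, b_g]) - c_bar)
    return c                                 # combined appearance parameters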
Given a new example of a face, and the extracted model parameters, the aim is to identify the individual in a way which is invariant to confounding factors such as lighting, pose and expression. If there exists a representative training set of face images, it is possible to do this using the Mahalanobis distance measure [6], which enhances the effect of inter-class variation (identity), whilst suppressing the effect of within-class variation (pose, lighting, expression). This gives a scaled measure of the distance of an example from a particular class. The Mahalanobis distance, $D_i$, of the example from class $i$ is given by

$$D_i^2 = (\mathbf{c} - \bar{\mathbf{c}}_i)^T \mathbf{S}^{-1} (\mathbf{c} - \bar{\mathbf{c}}_i)$$

where $\mathbf{c}$ is the vector of extracted appearance parameters, $\bar{\mathbf{c}}_i$ is the centroid of the multivariate distribution for class $i$, and $\mathbf{S}$ is the common within-class covariance matrix for all the training examples. Given sufficient training examples for each individual, the individual within-class covariance matrices, $\mathbf{S}_i$, could be used; it is, however, restrictive to assume that such comprehensive training data can be obtained.
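A minimal sketch of such a classifier, assuming params is an N x k array of combined appearance parameters for the training set and labels is an array of integer identity labels (names are illustrative):

import numpy as np

def fit_mahalanobis(params, labels):
    classes = np.unique(labels)
    # Centroid of the parameter distribution for each individual.
    centroids = {i: params[labels == i].mean(axis=0) for i in classes}
    # Pooled within-class covariance: residuals about each class centroid.
    residuals = np.vstack([params[labels == i] - centroids[i] for i in classes])
    S_inv = np.linalg.inv(np.cov(residuals, rowvar=False))
    return centroids, S_inv

def classify(c, centroids, S_inv):
    # Nearest class by squared distance D_i^2 = (c - c_i)^T S^{-1} (c - c_i).
    def dist(i):
        d = c - centroids[i]
        return d @ S_inv @ d
    return min(centroids, key=dist)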
The classifier described above assumes that the within-class variation is very similar for each individual, and that the pooled covariance matrix provides a good overall estimate of this variation. Edwards et al. [4] used this assumption to linearly separate the inter-class variability from the intra-class variability using Linear Discriminant Analysis (LDA). The approach seeks a linear transformation of the appearance parameters which maximises inter-class variation, based on the pooled within-class and between-class covariance matrices. The identity of a face is given by a vector of Discriminant Parameters, $\mathbf{d}$, which ideally code only information important for identity. The transformation between appearance parameters, $\mathbf{c}$, and discriminant parameters, $\mathbf{d}$, is given by

$$\mathbf{d} = \mathbf{D}^T \mathbf{c}$$

where $\mathbf{D}$ is a matrix of orthogonal vectors describing the principal types of inter-class variation. Having calculated these inter-class modes of variation, Edwards et al. [4] showed that a subspace orthogonal to $\mathbf{D}$ could be constructed which modelled only intra-class variations due to change in pose, expression and lighting. The effect of this decomposition is to create a combined model which is still in the form of Equation 1, but where the parameters, $\mathbf{c}$, are partitioned into those that affect identity and those that describe within-class variation. Figure 3 shows the effect of varying the most significant identity parameter for such a model; also shown is the effect of applying the first mode of the residual (identity-removed) model to an example face. It can be seen that the linear separation is reasonably successful and that the identity remains unchanged.
Figure 3:
Varying the most significant identity parameter (top), and manipulating residual variation without affecting identity (bottom).
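The decomposition can be sketched via the standard generalised-eigenvalue formulation of LDA. This is an illustration under the same assumptions as the classifier sketch above, not the authors' exact procedure, and it assumes the within-class scatter matrix is non-singular.

import numpy as np

def lda_directions(params, labels, n_dirs):
    mean_all = params.mean(axis=0)
    k = params.shape[1]
    S_w = np.zeros((k, k))                   # within-class scatter
    S_b = np.zeros((k, k))                   # between-class scatter
    for i in np.unique(labels):
        X_i = params[labels == i]
        mu_i = X_i.mean(axis=0)
        S_w += (X_i - mu_i).T @ (X_i - mu_i)
        diff = (mu_i - mean_all)[:, None]
        S_b += len(X_i) * (diff @ diff.T)
    # Leading eigenvectors of S_w^{-1} S_b span the inter-class subspace D.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_dirs]].real

The discriminant parameters for a new example are then d = D.T @ c, and the residual, identity-removed variation lies in the subspace orthogonal to the columns of D.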