Separating a combined appearance model into a part that deals with ID
and a part that deals with residual variation allows classification of
ID independently of confounding factors. It also has potential for
applications in model-based tracking of faces. Intuitively, we can
imagine a different dynamic model for each separate source of
variability. In particular, given a sequence of images of the same
person, we expect the identity to remain constant, whilst lighting, pose
and expression each vary with their own dynamics.
In practice, the separation between the different types of variation
which can be achieved using LDA is not perfect. The method provides a
good first-order approximation but, in reality, the within-class spread
takes a different shape for each individual. When viewed one individual
at a time, there is typically correlation between the identity
parameters and the residual parameters, even though for the data as a
whole the correlation is minimised.
For example, we can reason that the correlation between pose and
identity must be class-specific because of the 3D structure of the head;
the way in which the appearance of the nose changes with pose depends
partly on its length, a person-specific quantity not derivable from a
frontal view. Ezzat and Poggio [5] describe class-specific normalisation
of pose using multiple views of the same person, demonstrating the
feasibility of a linear approach.
They assume that different views of each individual are available in
advance; here, we make no such assumption. We show that the estimation
of class-specific variation can be integrated with tracking to make
optimal use of both prior and new information in estimating identity and
achieving robust tracking.
We describe a class-specific linear correction to the result of the
global LDA, given new examples of a face. To illustrate the problem, we
consider a simplified synthetic situation in which appearance is
described in some 2-dimensional space, as shown in Figure 4. We imagine
a large number of representative training examples for two individuals,
person X and person Y, projected into this space. The optimum direction
of group separation and the direction of residual variation are shown.
Figure 4:
Limitation of Linear Discriminant Analysis: the best identification
possible for a single example, Z, is the orthogonal projection, A. But
if Z is an individual who behaves like X or Y, the optimum projection
would be C or B respectively.
A perfect discriminant analysis of identity would allow two faces of
different pose, lighting and expression to be normalised to a reference
view, and thus the identity compared. It is clear from the diagram that
an orthogonal projection onto the identity subspace is not ideal for
either person X or person Y. Given a fully representative set of
training images for X and Y, we could work out in advance the ideal
projection. We do not, however, wish (or need) to restrict ourselves to
acquiring training data in advance. If we wish to identify an example of
person Z, for whom we have only one example image, the best estimate
possible is the orthogonal projection, A, since we cannot know from a
single example whether Z behaves like X (in which case C would be the
correct identity) or like Y (when B would be correct) or, indeed,
neither. The discriminant analysis produces only a first-order
approximation of class-specific variation.
In our approach we seek to calculate class-specific corrections from
image sequences. The framework used is the Combined Appearance Model, in
which faces are represented by a parameter vector, $\mathbf{c}$, as in
Equation 1.
LDA is applied to obtain a first-order global approximation of the
linear variation describing identity, given by an identity vector,
$\mathbf{c}_{id}$, and the residual linear variation, given by a vector
$\mathbf{c}_{res}$. A vector of appearance parameters, $\mathbf{c}$, can
thus be described by

$$\mathbf{c} = \mathbf{P}_{id}\,\mathbf{c}_{id} + \mathbf{P}_{res}\,\mathbf{c}_{res}$$

where $\mathbf{P}_{id}$ and $\mathbf{P}_{res}$ are matrices of
orthogonal eigenvectors describing identity and residual variation
respectively. $\mathbf{P}_{id}$ and $\mathbf{P}_{res}$ are orthogonal
with respect to each other, and the dimensions of $\mathbf{c}_{id}$ and
$\mathbf{c}_{res}$ sum to the dimension of $\mathbf{c}$. The projection
from a vector, $\mathbf{c}$, onto the identity and residual subspaces is
given by

$$\mathbf{c}_{id} = \mathbf{P}_{id}^T\,\mathbf{c} \qquad (6)$$

and

$$\mathbf{c}_{res} = \mathbf{P}_{res}^T\,\mathbf{c} \qquad (7)$$

Equation 6 gives the orthogonal projection onto the identity subspace,
$\mathbf{c}_{id}$, the best classification available given a single
example. We assume that this projection is not ideal, since it is not
class-specific. Given further examples, in particular from a sequence,
we seek to apply a class-specific correction to this projection. It is
assumed that the correction of identity required has a linear
relationship with the residual parameters, but that this relationship is
different for each individual.
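In code, the two projections are plain matrix products. The following is a minimal sketch, not the authors' implementation: all variable names are hypothetical, and the mutually orthogonal eigenvector matrices are stood in for by columns of a random orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: an appearance space of dimension n split into a
# p-dimensional identity subspace and a q-dimensional residual subspace.
n, p, q = 10, 4, 6

# Build mutually orthogonal basis matrices from a random orthonormal basis.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
P_id, P_res = Q[:, :p], Q[:, p:p + q]

c = rng.standard_normal(n)   # a vector of appearance parameters

c_id = P_id.T @ c            # Equation 6: orthogonal projection onto identity
c_res = P_res.T @ c          # Equation 7: projection onto residual variation

# Because the two subspaces are orthogonal and together span the space,
# the original parameter vector is recovered exactly.
c_rebuilt = P_id @ c_id + P_res @ c_res
assert np.allclose(c_rebuilt, c)
```

The reconstruction check in the last lines is exactly the decomposition stated above: the identity and residual dimensions sum to the dimension of the appearance vector.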
Formally, if $\mathbf{c}^{*}_{id}$ is the true projection onto the
identity subspace, $\mathbf{c}_{id}$ is the orthogonal projection,
$\mathbf{c}_{res}$ is the projection onto the residual subspace, and
$\bar{\mathbf{c}}_{res}$ is the mean of the residual subspace (average
lighting, pose and expression), then

$$\mathbf{c}^{*}_{id} = \mathbf{c}_{id} + \mathbf{A}\,(\mathbf{c}_{res} - \bar{\mathbf{c}}_{res}) \qquad (8)$$

where $\mathbf{A}$ is a matrix giving the correction of the identity,
given the residual parameters. If $\mathbf{c}_{id}$ is a $p \times 1$
column vector and $\mathbf{c}_{res}$ a $q \times 1$ column vector, then
the matrix $\mathbf{A}$ is $p \times q$.
During a sequence, many examples of the same face are seen. We can use
these examples to solve Equation 8 in a least-squares sense for the
matrix $\mathbf{A}$, thus giving the class-specific correction required
for the particular individual. The vector $\mathbf{c}^{*}_{id}$ is
unknown, but if we assume that the residual correction is linear, then
$\mathbf{A}$ can be found by normalising $\mathbf{c}_{id}$ and
$\mathbf{c}_{res}$ about the local means of the sequence,
$\bar{\mathbf{c}}_{id}$ and $\bar{\mathbf{c}}_{res}$, writing

$$\boldsymbol{\delta}_{id} = \mathbf{c}_{id} - \bar{\mathbf{c}}_{id}$$

and

$$\boldsymbol{\delta}_{res} = \mathbf{c}_{res} - \bar{\mathbf{c}}_{res}$$

Let $a_{ij}$ represent the elements of $\mathbf{A}$. The elements of
$\boldsymbol{\delta}_{id}$ and $\boldsymbol{\delta}_{res}$ are
independent, and the value of the $i$th element of
$\boldsymbol{\delta}_{id}$ is given by

$$\delta_{id,i} = \sum_{j=1}^{q} a_{ij}\,\delta_{res,j}$$
Thus, each row of $\mathbf{A}$ relates the residual variation,
$\boldsymbol{\delta}_{res}$, to one of the identity parameters,
$\delta_{id,i}$. If we have $N > q$ examples of the individual face, we
can solve for each row, $i$, of the correction matrix separately. Let
$\mathbf{d}_i$ be a vector of the $N$ examples of $\delta_{id,i}$ seen
and $\mathbf{R}$ an $N \times q$ matrix whose rows are the examples of
$\boldsymbol{\delta}_{res}$ seen. Let $\mathbf{a}_i$ be row $i$ of the
correction matrix; then we can write

$$\mathbf{d}_i = \mathbf{R}\,\mathbf{a}_i^T \qquad (13)$$

This is simply an overdetermined system of linear equations and can be
solved for the elements of $\mathbf{a}_i$ by standard methods. Having
found $\mathbf{A}$, we can, given a new example with measured identity,
$\mathbf{c}_{id}$, and residual variation, $\mathbf{c}_{res}$, solve
Equation 8 to find $\mathbf{c}^{*}_{id}$, the corrected identity.
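Estimating the correction matrix is ordinary least squares. The sketch below uses synthetic data (all names hypothetical; in practice the normalised examples would come from tracked frames of one individual):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, N = 4, 6, 50                  # identity dim, residual dim, N > q examples

# Ground-truth correction matrix, used only to synthesise consistent data.
A_true = rng.standard_normal((p, q))

# N examples of the residual parameters, normalised about the sequence mean.
delta_res = rng.standard_normal((N, q))
delta_res -= delta_res.mean(axis=0)

# Corresponding identity deviations, plus a little measurement noise.
delta_id = delta_res @ A_true.T + 0.01 * rng.standard_normal((N, p))

# Solve Equation 13 row by row: each d_i = R a_i^T is an overdetermined
# linear system in the q unknowns of row i.
R = delta_res
A_est = np.vstack([np.linalg.lstsq(R, delta_id[:, i], rcond=None)[0]
                   for i in range(p)])

# Apply Equation 8 to correct a new measurement (residual mean taken as 0
# here because the deltas are already normalised about it).
c_id = rng.standard_normal(p)       # orthogonal projection of a new example
c_res = rng.standard_normal(q)
c_id_star = c_id + A_est @ c_res    # corrected identity estimate
```

Equivalently, `np.linalg.lstsq(R, delta_id, rcond=None)[0].T` solves all p rows in a single call; the row-by-row form above simply mirrors the derivation.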
Each column of $\mathbf{A}$ describes the effect of the corresponding
residual parameter on the correction of identity. The magnitude of the
column is a measure of how much new information has been learnt about
that residual parameter. For example, if there is very little lighting
change in the sequence, those residual parameters corresponding to
lighting will have little effect on the correction, and the estimate
will revert to the orthogonal projection in that direction.
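This diagnostic can be read directly off the columns of the estimated correction matrix. A small sketch, with a made-up 3 × 2 matrix in which the second residual direction was barely exercised in the sequence:

```python
import numpy as np

# Hypothetical 3 x 2 correction matrix: column 0 corresponds to a residual
# mode well covered by the sequence, column 1 (e.g. an unexercised lighting
# mode) to one that varied very little.
A = np.array([[0.8, 0.00],
              [0.3, 0.01],
              [0.5, 0.02]])

# Column norms: how strongly each residual parameter corrects identity.
learnt = np.linalg.norm(A, axis=0)

# Along the weak column the correction is near zero, so the identity
# estimate stays close to the plain orthogonal projection there.
```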
In each frame of an image sequence, an Active Shape Model can be used to
locate the face. The iterative search procedure returns a set of shape
parameters describing the best match found of the model to the data. We
can also extract the shape-free grey-level parameters from the extracted
shape, and thence calculate the combined appearance model parameters.
Baumberg [1] has described a Kalman filter framework used as an optimal
recursive estimator of shape from sequences using an Active Shape Model.
In order to improve tracking robustness, we propose a similar scheme,
based on the decoupling of identity variation from residual variation.
The combined model parameters are projected into the identity and
residual subspaces by Equations 6 and 7. At each frame, $t$, the
identity vector, $\mathbf{c}_{id}(t)$, and residual vector,
$\mathbf{c}_{res}(t)$, are recorded. Until enough frames have been
recorded to allow Equation 13 to be solved, the correction matrix,
$\mathbf{A}$, is set to contain all zeros, so that the corrected
estimate of identity, $\mathbf{c}^{*}_{id}$, is the same as the
orthogonally projected estimate, $\mathbf{c}_{id}$. Once Equation 13 can
be solved, the identity estimate starts to be corrected.
Two sets of Kalman filters are used: one for the corrected identity
parameters, in which the underlying model of motion is treated as
zeroth-order, or constant position, and another for the residual
parameters, where the motion model is assumed to be first-order, or
constant velocity. This models the sequence realistically during
tracking, since the system model treats identity as fixed, something
which is certainly true within a sequence, and thus the tracking is
robust to any noise in the search that appears as a change of identity.
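The contrast between the two motion models can be sketched with a generic linear Kalman filter. This is an illustrative toy, not the authors' filter: noise covariances and measurement values are made up, and each filter tracks a single scalar parameter.

```python
import numpy as np

def kalman_step(x, C, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    x = F @ x                          # predict state
    C = F @ C @ F.T + Q                # predict covariance
    S = H @ C @ H.T + R                # innovation covariance
    K = C @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (z - H @ x)            # update with measurement
    C = (np.eye(len(x)) - K @ H) @ C
    return x, C

# Zeroth-order (constant position) model for one identity parameter.
F_id = H_id = np.eye(1)
# First-order (constant velocity) model for one residual parameter:
# state is [position, velocity], and only position is observed.
F_res = np.array([[1.0, 1.0], [0.0, 1.0]])
H_res = np.array([[1.0, 0.0]])

Q_id, Q_res, R_meas = 1e-4 * np.eye(1), 1e-4 * np.eye(2), 0.1 * np.eye(1)
x_id, C_id = np.zeros(1), np.eye(1)
x_res, C_res = np.zeros(2), np.eye(2)

# Noisy measurements of an identity parameter that is in fact constant:
# the constant-position model smooths them towards a fixed value.
for z in [0.9, 1.1, 1.0, 0.95, 1.05]:
    x_id, C_id = kalman_step(x_id, C_id, np.array([z]), F_id, H_id, Q_id, R_meas)

# A residual parameter drifting at a steady rate (e.g. a slow pose change):
# the constant-velocity model follows the drift and estimates its rate.
for z in [0.0, 0.1, 0.2, 0.3, 0.4]:
    x_res, C_res = kalman_step(x_res, C_res, np.array([z]), F_res, H_res, Q_res, R_meas)
```

The constant-position filter pulls the identity estimate towards a fixed value, rejecting apparent identity change as noise, while the constant-velocity filter lets the residual parameters follow smooth motion.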
We present an example of this system applied to a face sequence. Figure
5 shows frames selected from a sequence, together with the result of the
Kalman filter-based Active Shape Model search overlaid on the image. The
filter tracks identity as a zeroth-order process and residual variation
as a first-order process. The subject talks and moves while varying
expression. The amount of movement increases towards the end of the
sequence.
Figure 5:
Tracking and identifying a face.
Figure 6 shows the values of the first 3 elements of the corrected
identity vector, $\mathbf{c}^{*}_{id}$. Also shown are similar results
without the class-specific correction applied.
It can be seen that the corrected, filtered identity parameters are much more stable than the raw parameters.
Figure 6:
First 3 parameters of corrected and uncorrected identity vectors.
Parameters are scaled by their respective variance over the training
set.
Gareth J Edwards