
2 Locating and tracking faces using colour

Figure 2: Three MPEG movies showing real-time tracking of objects (including a face) in dynamic scenes using colour. The system copes with large pose variations and partial occlusion.

A system for detecting and tracking faces was previously described [8]. It combined motion detection by spatio-temporal filtering with an appearance-based face model in the form of a neural net. Multiple person tracking was performed using time-symmetric matching and Kalman filtering. In this section, the use of colour as a cue for detection and tracking is described. Colour provides a computationally efficient yet effective method which is robust under rotations in depth and partial occlusions. It can be combined with motion and appearance-based face detection.

Figure 3: The tight clustering of skin colour for three different races is illustrated here. The top row shows the face regions used to build the mixture models. The bottom row shows the colour distributions plotted in HS space with 2-component Gaussian mixtures overlaid.

Human skin forms a relatively tight cluster in colour space even when different races are considered [5]. Figure 3 shows the colour distributions of three faces in hue-saturation (H-S) space. Face colour distributions were modelled as Gaussian mixtures of the form:

\[ p(\boldsymbol{\xi}) = \sum_{j=1}^{M} p(\boldsymbol{\xi} \mid j) \, P(j) \]

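The mixing parameter $P(j)$ corresponds to the prior probability that the data, $\boldsymbol{\xi}$, was generated by component $j$. Each mixture component, $p(\boldsymbol{\xi} \mid j)$, is a Gaussian with mean $\boldsymbol{\mu}_j$ and covariance matrix $\boldsymbol{\Sigma}_j$. Given $n$ face pixels $\boldsymbol{\xi}_i$, $i = 1, \dots, n$, Expectation-Maximisation (EM) provides an effective maximum-likelihood algorithm for learning a Gaussian mixture model [9]. An expectation (E) step consists of evaluating the posterior probabilities $P(j \mid \boldsymbol{\xi}_i)$ for each mixture component. Let the sum of these probabilities over the $n$ pixels be $S_j = \sum_{i=1}^{n} P(j \mid \boldsymbol{\xi}_i)$. A maximisation (M) step then updates the mixture components as follows:

\[ P^{\mathrm{new}}(j) = \frac{S_j}{n} \]

\[ \boldsymbol{\mu}_j^{\mathrm{new}} = \frac{1}{S_j} \sum_{i=1}^{n} P(j \mid \boldsymbol{\xi}_i) \, \boldsymbol{\xi}_i \]

\[ \boldsymbol{\Sigma}_j^{\mathrm{new}} = \frac{1}{S_j} \sum_{i=1}^{n} P(j \mid \boldsymbol{\xi}_i) \, (\boldsymbol{\xi}_i - \boldsymbol{\mu}_j^{\mathrm{new}}) (\boldsymbol{\xi}_i - \boldsymbol{\mu}_j^{\mathrm{new}})^{T} \]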

The E and M steps are iterated until convergence. If M = 1, the parameters of the single Gaussian are estimated directly, without iteration.
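The EM procedure described above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used for the reported system: the function names (`gaussian_pdf`, `mixture_pdf`, `fit_mixture`), the initialisation from randomly chosen pixels, the small regularisation term `reg`, and the log-likelihood stopping test are all assumptions made for the example. Hue is treated as an ordinary linear coordinate here, which is also an assumption.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Gaussian density evaluated at each row of x (shape (n, d))."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum("ni,ij,nj->n", diff, inv, diff)  # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def mixture_pdf(x, priors, means, covs):
    """Mixture density: sum over j of p(x | j) P(j)."""
    return sum(P * gaussian_pdf(x, m, c) for P, m, c in zip(priors, means, covs))

def fit_mixture(x, M=2, n_iter=100, tol=1e-6, seed=0, reg=1e-6):
    """Fit an M-component Gaussian mixture to colour points x (shape (n, d)) by EM."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumed initialisation: random pixels as means, global covariance, equal priors.
    means = x[rng.choice(n, size=M, replace=False)].astype(float)
    covs = np.array([np.cov(x.T) + reg * np.eye(d) for _ in range(M)])
    priors = np.full(M, 1.0 / M)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: posterior probability P(j | x_i) of each component for each pixel.
        joint = np.stack(
            [priors[j] * gaussian_pdf(x, means[j], covs[j]) for j in range(M)], axis=1
        )
        total = np.maximum(joint.sum(axis=1, keepdims=True), 1e-300)
        post = joint / total
        # M step: re-estimate priors, means and covariances from the posteriors.
        S = post.sum(axis=0)  # sum of posteriors per component
        priors = S / n
        means = (post.T @ x) / S[:, None]
        for j in range(M):
            diff = x - means[j]
            covs[j] = (post[:, j, None] * diff).T @ diff / S[j] + reg * np.eye(d)
        # Stop when the log-likelihood no longer changes appreciably.
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return priors, means, covs
```

With `M=1` the posteriors are all one, so the first M step already returns the sample mean and covariance, matching the direct estimate mentioned above. Once fitted, `mixture_pdf` can be applied to the H-S values of every pixel in a frame to obtain the per-pixel probabilities used for detection and tracking.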

In practice, an H-S model built from a single person functions well with other races. The mixture model is used to assign a probability to each pixel in an image, and faces are detected by grouping suitably sized areas of high probability. A face is tracked by estimating its position as the mean, and its spatial extent as the vertical and horizontal standard deviations, of the local colour probability distribution in the image plane. For a given frame $t$, the box position $\mathbf{m}_t$ is estimated as an offset from the previous position $\mathbf{m}_{t-1}$:

\[ \mathbf{m}_t = \mathbf{m}_{t-1} + \frac{\sum_{\mathbf{x}} p(\boldsymbol{\xi}_{\mathbf{x}}) \, (\mathbf{x} - \mathbf{m}_{t-1})}{\sum_{\mathbf{x}} p(\boldsymbol{\xi}_{\mathbf{x}})} \]

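where $\mathbf{x}$ ranges over all image coordinates in the region of interest and $\boldsymbol{\xi}_{\mathbf{x}}$ is the colour point at image position $\mathbf{x}$. To improve accuracy, probabilities are thresholded. Values lower than the threshold are taken to be background and are consequently set to zero in order to nullify their influence on the estimation of $\mathbf{m}_t$ and $\boldsymbol{\sigma}_t$. The size of the bounding box is estimated by computing the standard deviation $\boldsymbol{\sigma}_t$ weighted by the pixel probabilities:

\[ \boldsymbol{\sigma}_t = \sqrt{ \frac{\sum_{\mathbf{x}} p(\boldsymbol{\xi}_{\mathbf{x}}) \, (\mathbf{x} - \mathbf{m}_t)^2}{\sum_{\mathbf{x}} p(\boldsymbol{\xi}_{\mathbf{x}})} } \]

with the square and square root taken separately for the horizontal and vertical coordinates.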

Figure 2 shows a sequence of a face being tracked with a moving camera against a cluttered background. The sequence demonstrates the tracker's ability to deal with changes in scale, large rotations in depth and partial occlusion.
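The position and size estimates described in this section can be sketched as a single per-frame update. The sketch below assumes a precomputed 2-D array `prob` of per-pixel colour probabilities and a rectangular region of interest centred on the previous mean; the function name `track_step`, the `roi_half` parameter, and the behaviour when no pixel passes the threshold are illustrative choices, not details taken from the system described here.

```python
import numpy as np

def track_step(prob, prev_mean, roi_half, threshold):
    """One tracking update from a per-pixel probability map.

    prob      : 2-D array (rows = y, columns = x) of colour probabilities
    prev_mean : (x, y) box position from the previous frame
    roi_half  : (half_width, half_height) of the region of interest (assumed rectangular)
    threshold : probabilities below this value are treated as background
    Returns the new mean (x, y) and the standard deviations (sigma_x, sigma_y).
    """
    h, w = prob.shape
    mx, my = prev_mean
    rx, ry = roi_half
    # Clip the region of interest to the image bounds.
    x0, x1 = max(int(mx - rx), 0), min(int(mx + rx) + 1, w)
    y0, y1 = max(int(my - ry), 0), min(int(my + ry) + 1, h)
    roi = prob[y0:y1, x0:x1].astype(float)  # astype returns a copy; prob is not modified
    # Zero out background so it does not influence the mean or the spread.
    roi[roi < threshold] = 0.0
    total = roi.sum()
    if total == 0.0:
        # Assumed fallback: no skin-coloured pixels found, keep the previous position.
        return prev_mean, None
    ys, xs = np.mgrid[y0:y1, x0:x1]
    # Position: previous mean plus the probability-weighted mean offset.
    new_mx = mx + (roi * (xs - mx)).sum() / total
    new_my = my + (roi * (ys - my)).sum() / total
    # Size: probability-weighted standard deviation about the new mean, per axis.
    sx = np.sqrt((roi * (xs - new_mx) ** 2).sum() / total)
    sy = np.sqrt((roi * (ys - new_my) ** 2).sum() / total)
    return (new_mx, new_my), (sx, sy)
```

How the standard deviations are scaled into a bounding box is left open here, since the text does not state a scale factor.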

The colour-based tracking system has been implemented on a 200 MHz Pentium PC equipped with a Matrox Meteor frame grabber and a Sony EVI-D31 active camera. The camera can be driven by maintaining the mean position, m, at the centre of the image. Tracking is performed at approximately 15 frames per second. Large changes in the spectral composition of scene illumination inevitably cause problems. It has been found necessary to use at least two colour models, one for interior lighting and one for exterior natural daylight.


Shaogang Gong
Fri Jul 11 10:14:24 BST 1997