Next: 2 Representation of shape Up: Using Hidden Markov Models Previous: Using Hidden Markov Models

1 Introduction

An important but very challenging problem in computer vision is automatic gesture recognition. Hand gesture recognition, in particular, is of paramount importance in a number of applications, from sign language interpretation to man/machine interfaces. A number of researchers have tackled this problem with different technical approaches, some with promising results; see, for example, [8], [10], [5], [2].

In this paper, we present a new technique based on dynamic shape representation. In order to avoid difficulties related to the interpretation of the "meaning" of a gesture, we define a gesture as a set of trajectories in the sequence of images. In other words, the only reality we acknowledge consists of the signals provided by the sensors, referred to in the literature as visual behavior [10]. No reference is made to human interpretations of these events. In this setup, the problem consists of recognizing and classifying prototypes of gestures in the observation sequence. Clearly, sufficiently "rich" observations must be provided, where rich means that two different gestures must not generate the same observation sequence.

One of the goals of the work presented in this paper was to understand what can be done with a single black-and-white camera in a fixed position. To simplify the problem further, we eliminated the background so that only the hand appears in the images. The occluding contour of the hand in each image is measured and represented with size functions, which are topological descriptions of shape. Size functions have been studied by Frosini [3, 4] and applied by Verri [8, 9] to automatic sign language interpretation. They allow for a complete description of shape and show a remarkable tolerance to noise and to small variations of shape.

Conceptually, we model a gesture as a sequence of hand postures which project into sets of size functions. The hypothesis at the basis of this work is that the sets of trajectories corresponding to different gestures are discriminated by their dynamics. The kinematics of the posture of a human hand can be described by a mechanical system with 27 degrees of freedom [7, 6]. Measuring the configuration of such a complex system from monocular images is a very difficult problem, mainly because of self-occlusions, and size functions may not suffice to estimate the full configuration even if integrated in time. Common experience suggests, however, that a gesture may be characterized by a sequence of only a finite number of hand postures and that, therefore, it is not necessary to describe hand posture as a continuum. This led us to describe the dynamic behavior of gestures by means of probabilistic finite-state models, using the well-known formalism of hidden Markov models.
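As an illustration of the idea only, the following minimal sketch shows how a gesture could be classified by scoring an observation sequence against one hidden Markov model per gesture and choosing the most likely model. It is not the system described in this paper: the two gesture names, the two-posture left-to-right models, all probabilities, and the quantization of shape measurements into symbols 0 and 1 are hypothetical choices made for the example.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs is a non-empty list of symbol indices; start[i], trans[i][j] and
    emit[i][k] are the initial, transition and emission probabilities.
    """
    n = len(start)
    # Forward variable for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid numerical underflow on long sequences.
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical models: two hidden postures, two observation symbols.
# Each entry is (start, trans, emit); all numbers are illustrative assumptions.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "open_then_close": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "close_then_open": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 0, 1, 1], MODELS))
    print(classify([1, 1, 0, 0, 0, 0], MODELS))
```

Because the self-transition probabilities absorb repeated observations, a slower execution of the same gesture (more repeated symbols) is still assigned to the same model in this sketch.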

This approach has several advantages: robustness to noise and to small perturbations of the data caused by system calibration errors; robustness with respect to changes of the observed subject (hands of different people); flexibility, since it allows one, for example, to neglect insignificant changes in the duration of the execution of gestures; and statistically measurable performance.


Adrian F Clark
Mon Jul 28 12:54:58 BST 1997