A fundamental hypothesis of the proposed technique for automatic hand gesture recognition is that gestures may be modeled as sequences of a finite number of "canonical" postures of the hand. Each posture is associated with a state of a probabilistic finite state machine, in particular of a Hidden Markov Model (HMM). Each gesture is identified with an HMM with an appropriate number of states and transition probabilities. The recognition problem becomes that of estimating the number of states and identifying the parameters of the model from the observation sequence. The time trajectory of the state estimates describes the estimated gesture.
The equations of an HMM with discrete states and continuous-range observations are
\[
E[X_{k+1} \mid X_k] = A X_k, \qquad y_{k+1} = C X_k + \operatorname{diag}(\Sigma X_k)\, w_{k+1},
\]
in which $X_k$ is a finite-state Markov chain and $y_k$ are vector real-valued observations. The components of $y_k$ are the means of the size functions. $w_k$ is a sequence of vectors whose components are independent and identically distributed $N(0,1)$ random variables. The model is specified by the transition probability matrix $A$ and by the matrices $C$ and $\Sigma$ of appropriate dimensions. Without loss of generality the $N$ states can be identified with the set of unit vectors $\{e_1, \dots, e_N\}$ in $\mathbb{R}^N$.
The observation equation is the sum of a component directly determined by the state and a Gaussian noise term with variance determined by the elements of $\Sigma$. When $X_k = e_j$ the first term of this sum becomes exactly the $j$-th column of $C$; hence this column represents the corresponding state in the observation space. In other words, this column is the mathematical description of what we called the canonical posture associated with the considered state. It is a posture model.
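To make the role of the posture models concrete, the following minimal sketch samples a state trajectory and the corresponding observations. It is an illustration under stated assumptions, not the implementation used in this work: the function name `simulate_hmm`, the numerical values, the column-stochastic convention for the transition matrix, and the reading of the noise matrix entries as per-component standard deviations are all hypothetical choices made for the example.

```python
import random

def simulate_hmm(A, C, S, x0, steps, seed=0):
    """Sample (states, observations) from a discrete-state, Gaussian-output HMM.

    A[i][j]: probability of moving to state i from state j (assumed
             convention: each column sums to 1).
    C[m][j]: mean of observation component m in state j; column j is the
             posture model of state j.
    S[m][j]: noise standard deviation of component m in state j (assumed
             reading of the noise matrix).
    """
    rng = random.Random(seed)
    n_states, n_obs = len(A), len(C)
    x = x0
    states, observations = [], []
    for _ in range(steps):
        # Observation: posture model of the current state plus Gaussian noise.
        y = [C[m][x] + S[m][x] * rng.gauss(0.0, 1.0) for m in range(n_obs)]
        states.append(x)
        observations.append(y)
        # Next state: drawn from the column of A selected by the current state.
        column = [A[i][x] for i in range(n_states)]
        x = rng.choices(range(n_states), weights=column)[0]
    return states, observations


# Hypothetical two-posture example with two observation components.
A = [[0.9, 0.2],
     [0.1, 0.8]]
C = [[1.0, 5.0],
     [2.0, 6.0]]
S = [[0.3, 0.3],
     [0.3, 0.3]]
states, observations = simulate_hmm(A, C, S, x0=0, steps=10)
```

With the noise set to zero, every observation coincides with the column of C selected by the state, which is exactly the sense in which that column acts as a posture model.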
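Well-known algorithms [1] generate the sequence of state estimates by measuring at each time step the probabilistic distance between the current observation and each posture model, according to the following expression
\[
q_{k+1} = \sum_{i=1}^{N} \langle q_k, e_i \rangle \, \Gamma^i(y_{k+1}) \, a_i ,
\]
where $q_k$ is the unnormalized estimate of the state at time $k$, $e_i$ is the $i$-th unit vector, $\Gamma^i(y_{k+1})$ is, except for a scale factor, the Gaussian density of the current observation given the $i$-th posture model, and $a_i$ is the $i$-th column of $A$.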
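A state-estimate recursion of this kind can be sketched as follows. This is a minimal illustration, not the algorithm of [1] as implemented in this work: the names `gaussian_weight` and `filter_states` are hypothetical, and the sketch assumes independent observation components, strictly positive standard deviations, a column-stochastic transition matrix, and renormalization of the estimate at every step.

```python
import math

def gaussian_weight(y, mean, std):
    """Gaussian density of y given one posture model, up to a constant factor.

    Components are assumed independent; std entries must be positive.
    """
    w = 1.0
    for y_m, c_m, s_m in zip(y, mean, std):
        w *= math.exp(-0.5 * ((y_m - c_m) / s_m) ** 2) / s_m
    return w

def filter_states(A, C, S, p0, observations):
    """Return one normalized state estimate per observation.

    A[r][i]: probability of moving to state r from state i (columns sum to 1).
    C[m][i], S[m][i]: mean and standard deviation of component m in state i.
    p0: initial state probabilities.
    """
    n = len(A)
    p = list(p0)
    estimates = []
    for y in observations:
        # Weight each state by how well its posture model explains y.
        weighted = [
            p[i] * gaussian_weight(y, [row[i] for row in C], [row[i] for row in S])
            for i in range(n)
        ]
        # Propagate through the transition matrix: q = sum_i weighted[i] * a_i.
        q = [sum(A[r][i] * weighted[i] for i in range(n)) for r in range(n)]
        total = sum(q)  # zero only if every weight underflows
        p = [v / total for v in q]
        estimates.append(p)
    return estimates


# Hypothetical example: two postures, scalar observations near 0 or near 10.
A = [[0.9, 0.1],
     [0.1, 0.9]]
C = [[0.0, 10.0]]
S = [[1.0, 1.0]]
estimates = filter_states(A, C, S, [0.5, 0.5], [[10.1], [9.8], [0.2]])
most_likely = [max(range(2), key=lambda i: p[i]) for p in estimates]
```

For long sequences or high-dimensional observations the products of densities can underflow, so a practical version would typically work with logarithms of the weights.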
According to our hypothesis, we expect the observations to cluster around the posture models. This is confirmed in real image sequences, as can be seen in Figure 5.