BMVA 
The British Machine Vision Association and Society for Pattern Recognition 

BibTeX entry

@PHDTHESIS{200612Teofilo_de_Campos,
  AUTHOR={Teofilo de Campos},
  TITLE={3D Visual Tracking of Articulated Objects and Hands},
  SCHOOL={Oxford University},
  MONTH=Dec,
  YEAR=2006,
  URL={http://www.bmva.org/theses/2006/2006-decampos.pdf},
}

Abstract

The ability to track multiple and articulated objects is important, not least in the areas of autonomous and teleoperated robotics, visual surveillance and human motion analysis. This thesis is concerned with marker-free real-time detection and tracking of articulated objects, targeting human hands, with the aim of studying methods that can be applied to enhance the interaction between humans and 3D (real or virtual) objects. A survey summarises methods used to approach this and related problems in the literature. It indicates that, despite the large body of research in this field over twenty or so years, the area still proves challenging. Two main approaches have been identified. The first, known as generative tracking, uses an explicit kinematic representation of linkages or constraints between object parts and tracks by minimising the error of projected control points. The second, known as the discriminative approach, specifies little beforehand and instead uses training data to create a map between image observations and 3D poses. This thesis describes novel work in both areas. In the generative area, a method for tracking articulated objects is described. It is a new extension of a method for tracking rigid objects, in which the motion constraints between parts of the object are imposed up-front within the tracking process. The inter-frame pose update is derived as the solution of a linear system. This method has been applied to track articulated objects, including hands and multiple objects with motion constraints. An alternative method estimates the motion of each subpart independently, thereby introducing redundant degrees of freedom, and imposes the constraints later in a lower-dimensional subspace. This method is reviewed and compared with the aforementioned method in terms of accuracy, efficiency and robustness.
In the discriminative area, an inference-based approach is adopted in which a non-parametric relation between global image measurements and 3D poses is learnt using a multivariate regressor based on the Relevance Vector Machine. This relation is a continuous map that allows fast and efficient pose estimation from static images. Because the method can detect hands and estimate their 3D pose from static images, it can be applied to (re-)initialise the generative tracker. In this thesis, multiple views are used to reduce the ambiguities for both generative and discriminative methods. Experiments with single and multiple views are described, and a novel extension of the discriminative method for multiple views is proposed and evaluated. Supporting videos are available from: http://www.robots.ox.ac.uk/~teo/thesis/