BMVA 
The British Machine Vision Association and Society for Pattern Recognition 

BibTeX entry

@PHDTHESIS{200610N._D._H._Dowson,
  AUTHOR={N. D. H. Dowson},
  TITLE={Simultaneously Modelling and Tracking using Mutual Information},
  SCHOOL={University of Surrey},
  MONTH=Oct,
  YEAR=2006,
  URL={http://www.bmva.org/theses/2006/2006-dowson.pdf},
}

Abstract

The aim of this work is to track non-rigid objects using appearance in real-time without a pre-learned model. The tracker should be robust to noise, occlusions, changes in lighting conditions, and pose variations. Recovery from tracking failure should be possible. The approach taken is to localise small image patches within each frame of a video sequence using function optimisation. Mutual Information (MI) is chosen as the similarity metric, due to its robustness to outlier pixel values and lack of assumptions about linear intensity relationships. The choice of similarity metric and descriptor strongly influences performance, whatever the application, hence the contributions made here have wide applicability. Despite the wide use of MI for registration, many open problems still exist. MI suffers from artefacts in its function surface, which can prevent convergence. Numerous MI methods have been proposed to overcome artefacts, but few methods have published analytic derivatives, precluding their use in fast Newton-type optimisation. This thesis shows that MI methods differ only in how the joint-histogram is generated from image intensity samples, and places them into a single mathematical framework consisting of four families. Analytic derivatives are derived in every case. The availability of analytic derivatives allows MI to be placed into a Lucas-Kanade tracking framework, termed MILK. An inverse-compositional formulation for optimisation is presented. This yields faster convergence, because the Hessian is pre-computed. In addition to slowing convergence, artefacts shift the maximum away from its “true” position, i.e. they induce bias. Multiple proposals to reduce the effect of artefacts are tested in this light. Of these, only super-sampling reduces artefacts in a controllable way. This concurs with existing theory that histogram (and hence MI) accuracy is limited by the number of samples and the histogram resolution.
Super-sampling is taken to its logical conclusion using an extension of the Non-Parametric (NP) windowing method. NP-windowing is equivalent to taking samples at infinite resolution, by utilising the implicit structure of image data to integrate interpolation functions over entire pixels. Novel simplifications to existing theory using Green’s theorem are also introduced. NP-windowing yields biases half the size (or less) of those of other MI methods. It also improves convergence, and is independent of template size. MI is applied to tracking, where the problems of drift and mismatch must be traded off. The primary cause of drift and mismatch is simplistic appearance models. This is overcome using a proposed method to cluster appearance exemplars on-the-fly using a Bayesian approach to discriminate between cluster inliers and outliers. The method is termed Simultaneous Modelling and Tracking (SMAT). Greedy selection of particular clusters for tracking allows real-time performance. SMAT is extended to model object shape, and recovery from tracking failures is made possible by selective correction of the feature position. The zig-zag-zen family of semi-automated tests is proposed, making empirical tests using multiple (11) sequences of many frames (568 on average) practical. In testing, the advantages of MI over SSD, of SMAT over other template-based methods, and of shape modelling are clearly demonstrated.
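
As an illustration of the statement that MI methods differ only in how the joint histogram is generated, the following is a minimal sketch of MI computed from a plain sampled joint histogram. It is not the thesis's implementation and does not cover NP-windowing or the analytic derivatives; the function name `mutual_information` and the `bins` parameter are assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Return MI (in nats) between two equally sized intensity arrays.

    Hypothetical helper for illustration: uses a standard sampled
    joint histogram, the simplest way to estimate the joint distribution.
    """
    # Joint histogram of corresponding pixel intensities.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()              # joint probability estimate
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a (column vector)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b (row vector)
    nz = p_ab > 0                           # skip empty bins to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
other = rng.integers(0, 256, size=(64, 64)).astype(float)

mi_self = mutual_information(img, img)        # identical patches
mi_inv = mutual_information(img, 255 - img)   # inverted intensities
mi_ind = mutual_information(img, other)       # unrelated patches
```

In this sketch the inverted patch still scores far higher than the unrelated one, which reflects the abstract's point that MI does not assume a linear intensity relationship. The estimate depends on the number of samples and on `bins`, matching the stated limit on histogram accuracy.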