BibTeX entry
@PHDTHESIS{201104Zdenek_Kalal,
AUTHOR={Zdenek Kalal},
TITLE={Tracking-Learning-Detection},
SCHOOL={University of Surrey},
MONTH=Apr,
YEAR=2011,
URL={http://www.bmva.org/theses/2011/2011-kalal.pdf},
}
Abstract
Visual tracking is the process of locating an object in a video sequence. This thesis investigates visual tracking of an unknown object that changes its appearance significantly and moves in and out of the camera view. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or to indicate that the object is not present. We propose a novel tracking paradigm (TLD) that decomposes the visual tracking task into three sub-tasks: Tracking, Learning and Detection. The tracker follows the object from frame to frame. The detector localizes appearances that have been observed during tracking and corrects the tracker if necessary. Exploiting the spatio-temporal structure in the video sequence, the learning component estimates the errors made by the detector and updates it to avoid these errors in the future. The components are analyzed in detail. In tracking, we develop a method for detecting tracking failures that we call Forward-Backward (FB) error. The FB error allows us to measure the reliability of point trajectories in video. Next, we design a novel object tracker, which represents the object of interest by a grid of points whose reliability is measured using the FB error. The performance of the tracker is compared with state-of-the-art approaches. In detection, we focus on supervised learning of object detectors from large data sets. We develop a learning algorithm that optimally combines two popular learning approaches: boosting and bootstrapping. Improvements in classifier speed and accuracy are achieved. In learning, we focus on incremental, real-time learning of object detectors from a video stream. We develop a novel learning theory, P-N learning, which drives the learning process by a pair of "experts" on estimation of detector errors: (i) the P-expert estimates missed detections; (ii) the N-expert estimates false alarms.
Convergence properties of the learning method are analyzed, and conditions that guarantee improvement of the detector are found. The theory is validated on both synthetic and real data, and specific examples of the experts are given. Finally, a real-time implementation of TLD is described and comparatively evaluated on benchmark sequences. A significant improvement over state-of-the-art methods is achieved.
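The Forward-Backward error mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the idea: a point is tracked forward in time, the result is tracked backward to the starting frame, and the error is the Euclidean distance between the original point and the back-tracked point. The names `forward_backward_error`, `reliable_mask`, `toy_forward` and `toy_backward` are hypothetical, the two toy trackers stand in for a real point tracker, and the median threshold is an illustrative choice rather than a statement of the thesis's exact procedure.

```python
import numpy as np


def forward_backward_error(points, track_fwd, track_bwd):
    """Per-point FB error: distance between each start point and its
    position after tracking forward and then backward again."""
    points = np.asarray(points, dtype=float)
    forward = track_fwd(points)      # positions in the later frame
    back = track_bwd(forward)        # positions mapped back to the first frame
    return np.linalg.norm(back - points, axis=1)


def reliable_mask(fb_err):
    """Keep points whose FB error is at or below the median (assumed rule)."""
    return fb_err <= np.median(fb_err)


# Hypothetical toy trackers: the scene shifts by (2, 1) pixels.
SHIFT = np.array([2.0, 1.0])


def toy_forward(pts):
    return pts + SHIFT


def toy_backward(pts):
    back = pts - SHIFT
    # Simulate a tracking failure on the last point: it drifts by (3, 4).
    back[3] += np.array([3.0, 4.0])
    return back


grid = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
fb = forward_backward_error(grid, toy_forward, toy_backward)
mask = reliable_mask(fb)
print(fb, mask)
```

In this toy setup the three consistently tracked points have zero FB error, while the drifting point has a nonzero error and is rejected by the mask. A real tracker would replace the toy functions with, for example, optical-flow tracking between consecutive frames.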