Tools for visual tracking may be divided into four classes:
Although the highway setting is clearly very structured, we avoid using explicit scene models in the low-level tracking algorithms; this distinguishes our work from that of Dickmanns's group [4], for instance. We have therefore rejected model-based trackers (class 1). Instead we draw on the large body of work on scene reconstruction from multiple images of unstructured scenes, in particular the work on robust motion segmentation [20] and affine reconstruction [18]. These approaches can take advantage of the redundant information in images because they latch onto whatever features are available, whereas model-based methods are restricted to the features associated with the chosen model. Redundancy is a vital issue here: vision is a massively redundant sensor, and approaches that neglect this aspect are likely to be discarded in the long term. Previous experience of vehicle tracking using features without explicit models [1] leads us to believe that this approach is valid. Because we reconstruct the geometry of the lead vehicle, the algorithms generalize naturally to different vehicle types.
In our highway scenario we can expect the apparent motion of the background to be large, so we cannot use methods in class 2. Velocity-based trackers (class 3) have had some success in motion tracking, but they have a basic problem that restricts their usefulness. To track an object over an extended time it is necessary to compute the position of the object, and in this context the position can only be obtained by integrating the velocity over time. Since the computed velocities inevitably contain errors, these errors accumulate in the integral, and we can expect the computed position to drift over time.
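The drift argument can be illustrated numerically. The following is a minimal sketch, not part of the system described here, and it rests on assumptions of our own: a one-dimensional target moving at constant velocity, independent zero-mean Gaussian noise on each velocity measurement, and, for comparison, a hypothetical tracker that measures position directly with independent Gaussian noise. All function names, parameter names and default values (`simulate_drift`, `velocity_noise`, `position_noise`, and so on) are illustrative only. Under these assumptions the integrated position error behaves as a random walk whose standard deviation grows as the square root of the number of frames, while the direct position error stays bounded.

```python
import math
import random


def simulate_drift(n_frames, dt=0.04, true_velocity=2.0,
                   velocity_noise=0.5, position_noise=0.1, seed=0):
    """Return per-frame position errors for two hypothetical trackers.

    integrated_errors: position obtained by summing noisy velocities.
    direct_errors:     position measured directly in every frame.
    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    integrated_position = 0.0
    integrated_errors, direct_errors = [], []
    for k in range(1, n_frames + 1):
        true_position = true_velocity * k * dt
        # Velocity-based tracker: integrate a noisy velocity estimate.
        measured_velocity = true_velocity + rng.gauss(0.0, velocity_noise)
        integrated_position += measured_velocity * dt
        # Position-based tracker: noisy but independent in each frame.
        measured_position = true_position + rng.gauss(0.0, position_noise)
        integrated_errors.append(integrated_position - true_position)
        direct_errors.append(measured_position - true_position)
    return integrated_errors, direct_errors


def expected_drift_std(n_frames, dt, velocity_noise):
    """Random-walk prediction for the integrated error after n_frames."""
    return velocity_noise * dt * math.sqrt(n_frames)


def rms_error_at_frames(frames, trials=200, **kwargs):
    """Monte Carlo RMS error (integrated, direct) at the given frame indices."""
    n = max(frames)
    sq_int = {f: 0.0 for f in frames}
    sq_dir = {f: 0.0 for f in frames}
    for trial in range(trials):
        integrated, direct = simulate_drift(n, seed=trial, **kwargs)
        for f in frames:
            sq_int[f] += integrated[f - 1] ** 2
            sq_dir[f] += direct[f - 1] ** 2
    return {f: (math.sqrt(sq_int[f] / trials), math.sqrt(sq_dir[f] / trials))
            for f in frames}


if __name__ == "__main__":
    for frame, (rms_int, rms_dir) in rms_error_at_frames([10, 100, 1000]).items():
        print(f"frame {frame:5d}: integrated RMS error {rms_int:.3f}, "
              f"direct RMS error {rms_dir:.3f}")
```

The zero-mean noise model is the most favourable case for a velocity-based tracker. A systematic bias in the velocity estimates would make the integrated error grow linearly with time instead of as its square root.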
We are therefore drawn to feature tracking algorithms, which allow the position of an object to be estimated accurately over an extended time. This is of vital importance in the context of sensing for control, where the sensor is required to return accurate error feedback throughout the control task. Moreover, we have demonstrated in previous work [1] that vehicle tracking using features can be made robust both to partial occlusion of the vehicles and to lighting changes in the environment. The major problems to be overcome when tracking features are the fragmentary nature of the data (features appear, disappear and change shape) and the integration of feature data from multiple images in a statistically valid manner. We now turn to these problems.