The experimental setup is illustrated in figure 2.
Figure 2: The experimental setup.
We logged approximately 20 minutes of synchronized video and laser radar [8] data at the HPCC (Honda Proving Center of California) near Mojave. We
then digitized a sequence of 2000 stereo images from the video tapes at
3 frames per second, i.e. every tenth image was digitized. We selected
an initial window surrounding the lead vehicle, although subsequent
processing was completely automatic. In the real-time implementation we
intend to drive visual focus of attention from the output of the laser
radar. Figure 3 shows some example images, with the tracking results superimposed. The
corner features are shown as small crosses, white for those matched over
time or in stereo, and black for unmatched features. The black and white
circle indicates the position of the fixation point, which ideally
should remain at the same point on the lead car throughout the sequence.
The white rectangle shows the latest estimate of the bounding box for the vehicle, whose size is updated using the diagonal entries of the left- and right-image affine motion matrices to estimate the change of scale. We have attempted here to
summarize significant aspects of our data. Images 1 and 2 show the first
stereo-pairs in the sequence, where the vehicle is close (17m) to the
camera and range estimates from stereo disparity may be expected to be
accurate. By contrast images 121-123 and 421-423 were taken when the
vehicle was 40m and 60m respectively from the camera (the greatest
distances achieved during the sequence). Here we can predict that depth
estimates from stereo will be unreliable, since the disparity relative
to infinity is only a few pixels and so difficult to measure, whereas it
will still be feasible to use the change in apparent size measured by
the motion processing to obtain reasonable range estimates.
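As a concrete illustration of the scale update described above, the sketch below rescales a tracked bounding box from the diagonal entries of an affine image-motion matrix. This is a minimal sketch under our own assumptions: the matrix name M, its 2x2 shape, and the box representation are illustrative, not the notation used in this paper.

```python
import numpy as np

def update_bounding_box(box, M):
    """Rescale a tracked bounding box using the diagonal of a
    (hypothetical) 2x2 affine image-motion matrix M, whose diagonal
    entries give the horizontal and vertical change of scale between
    frames. box is (cx, cy, width, height) in pixels."""
    cx, cy, w, h = box
    sx, sy = M[0, 0], M[1, 1]  # horizontal and vertical scale factors
    return (cx, cy, w * sx, h * sy)

# Example: a diagonal of 1.05 means the target's image has grown
# by 5% in each dimension since the previous frame.
M = np.array([[1.05, 0.0],
              [0.0, 1.05]])
print(update_bounding_box((320.0, 240.0, 80.0, 60.0), M))
# -> (320.0, 240.0, 84.0, 63.0)
```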
Figure 3: Example stereo-pairs from the tracking sequence.
Images 165 to 169 illustrate a bumpy section of road where the image position of the car jumps by ten pixels or more between frames. The tracker ran for approximately four minutes before failure, tracking the vehicle and reconstructing its shape and motion over 760 images. Failure was due to a gradual drift in the motion estimates, most likely caused by the constant interchange of tracked features, which eventually causes the fixation transfer algorithm to drift. This effect can be seen in the later images from the sequence, notably from image 642 onwards, where the fixation point in the left image has shifted downwards from its original position on the car; in the final image, 761, the fixation point is about to leave the tracking window, at which point failure occurs. This drift may impose a limit on how long the tracker can run continuously before being reset.
We computed range and bearing estimates from the laser range finder and plotted them, together with the corresponding data from the vision algorithms, in figures 4 and 5.
Figure 4: Comparison of range estimates from laser radar and vision.
Figure 5: Comparison of bearing estimates from laser radar and vision.
Depth from stereo is computed by inverting the projection of the
fixation point at each image pair and finding the closest point of
intersection of the two resulting space rays. The cameras are very
roughly calibrated. The depth/scale ambiguity in the motion estimates is
removed by fixing the depth estimated by motion to be identical to the
stereo measurement at the first frame. We then obtain independent
estimates of depth from motion from the left and right image sequences
as the inverse of the top-left (horizontal) diagonal element of the affine motion matrix, scaled by the initial stereo depth. This
procedure explains why there is such good agreement between motion and
stereo early in the sequence. As predicted, the stereo depth estimates
become more noisy when the range is large, whereas depth from motion
remains fairly smooth.
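The midpoint triangulation described above has a simple closed form: minimising the distance between the two back-projected rays reduces to a 2x2 linear system. Below is a minimal sketch, assuming each ray is given as a camera centre plus a direction obtained by inverting the fixation point's projection; the function and variable names are ours, not the paper's.

```python
import numpy as np

def midpoint_triangulation(c1, d1, c2, d2):
    """Closest point of 'intersection' of two space rays
    x = c1 + s*d1 and x = c2 + t*d2. Minimising
    |(c1 + s*d1) - (c2 + t*d2)|^2 over s, t gives a 2x2 linear
    system; the returned point is the midpoint of the shortest
    segment joining the two rays."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = c2 - c1
    k = d1 @ d2
    # Normal equations for the ray parameters s, t.
    A = np.array([[1.0, -k],
                  [k, -1.0]])
    s, t = np.linalg.solve(A, np.array([b @ d1, b @ d2]))
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

# Example: stereo cameras 0.2m apart fixating a point 17m ahead.
c1, c2 = np.array([-0.1, 0.0, 0.0]), np.array([0.1, 0.0, 0.0])
p = np.array([0.0, 0.0, 17.0])
print(midpoint_triangulation(c1, p - c1, c2, p - c2))  # ~[0, 0, 17]
```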
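Anchoring the motion estimates to the first stereo depth is then a one-line scaling. A minimal sketch, under our assumption that scales[t] holds the top-left diagonal element of the affine motion matrix relating frame t to the first frame:

```python
def depth_from_motion(scales, z0):
    """Depth series from apparent-size change: depth is inversely
    proportional to image scale, so Z_t = z0 / scales[t], where z0
    is the stereo depth measured at the first frame."""
    return [z0 / s for s in scales]

# If the target's image has shrunk to half its initial size
# (scale 0.5), it is twice as far away: 17m -> 34m.
print(depth_from_motion([1.0, 0.85, 0.5], 17.0))  # [17.0, 20.0, 34.0]
```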
The distance to the lead vehicle fluctuates widely, in the range from 6m to 60m. Any significant depth change causes the whole feature set to be replaced, and we suggest that maintaining track and consistent fixation point transfer across several complete feature-set replacements constitutes a significant achievement. However, as the sequence progresses the tracker gradually degrades in performance, as should be expected; in practice one would want to reset the tracker from time to time. It should be noted that the bearing estimates from vision remain good until towards the end of the sequence, when the left-image tracker estimate starts to break down.