Next: 3 Detecting features for Up: Document Mosaicing Previous: 1 Introduction

2 Obtaining suitable images

In many mosaicing systems the images to be stitched together are selected by hand [10, etc.]. For this application we have automated this part of the process. The document is moved about by hand under a stationary, over-the-desk camera until every part of it has passed through the camera's field of view. Throughout, the vision system coarsely tracks the motion of the document, and snapshots are taken periodically so that successive snapshots overlap by about 50%. We have found this to give a good compromise between registration accuracy (which benefits from larger overlaps) and mosaicing efficiency (which favours tiling the document with as few images as possible).
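To make the efficiency side of this trade-off concrete, the sketch below counts how many snapshots an idealised sweep needs along one axis of the document for a given overlap fraction. It assumes a straight, evenly spaced sweep and a fixed field-of-view length, which a hand-moved document only approximates; the function name and parameters are hypothetical.

```python
import math


def snapshots_along_axis(doc_len: float, view_len: float, overlap: float) -> int:
    """Snapshots needed to cover doc_len with a field of view of view_len,
    when successive snapshots overlap by the fraction `overlap` (0 <= overlap < 1).
    Idealised: assumes an evenly spaced, straight sweep along one axis."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if doc_len <= view_len:
        return 1  # a single view already covers the whole length
    step = view_len * (1.0 - overlap)  # new document exposed by each extra snapshot
    return 1 + math.ceil((doc_len - view_len) / step)
```

For a 300 mm strip seen through a 100 mm field of view, 50% overlap needs 5 snapshots, whereas 25% needs 4 and 75% needs 9; for a raster sweep in two dimensions the per-axis counts multiply.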

The tracking is performed using a simple correlation process (see Figure 1). In the first frame, a snapshot is taken and a small patch is extracted from the centre of the image for use as a correlation template. In the next frame, the template is correlated over a search area four times the size of the patch, and the peak in the correlation function indicates the motion of the paper. The template is then resampled from this frame at the location of the peak, and tracking continues. When the template reaches the edge of the field of view, another snapshot is taken and a fresh template is sampled from the centre of the current frame.
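A single tracking step can be sketched in plain Python, with images held as lists of rows of grey levels. The paper says only that a "simple correlation" is used; this sketch assumes zero-mean normalised cross-correlation (which, unlike a raw product sum, is not biased towards bright regions) and interprets "four times the size of the patch" as a window twice the template's width and height, centred on its previous position. All names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grey levels, indexed as img[row][col]


def extract_patch(img: Image, top: int, left: int, size: int) -> Image:
    """Square size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalised cross-correlation of two equal-sized patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den > 0 else 0.0  # flat patches carry no information


def track_template(frame: Image, template: Image, top: int, left: int) -> Tuple[int, int]:
    """Search a window twice the template's width and height (four times its
    area) centred on its previous position; return the best-matching corner."""
    size = len(template)
    half = size // 2  # window extends about size/2 beyond the template on each side
    h, w = len(frame), len(frame[0])
    best_score, best_pos = -math.inf, (top, left)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - size and 0 <= x <= w - size:  # stay inside the frame
                score = ncc(extract_patch(frame, y, x, size), template)
                if score > best_score:
                    best_score, best_pos = score, (y, x)
    return best_pos
```

After each step the template would be resampled from the current frame at the returned position, e.g. `template = extract_patch(frame, top, left, len(template))`; because the search is confined to a small window, a poor match can only displace the estimate by a bounded amount.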

Figure 1: Document tracking. A simple correlation approach runs at frame rate on standard hardware. The correlation surface tends to mirror the rows of text, though there is generally a definite peak reflecting the true inter-frame motion. Because the template is continuously updated, an incorrect correlation peak results in mild tracking inaccuracy rather than total failure. The accuracy is sufficient to tell when the centre of the last snapshot has reached the edge of the field of view.

This simple tracking process runs at frame rate on a low-end Silicon Graphics Indy workstation. It occasionally loses track, but this does not matter: the small search window ensures that an erroneous motion estimate will not be too far from the true value, and tracking can resume with the updated template. We are not trying to measure the motion accurately; we only want to know roughly when the centre of the last snapshot has moved to the edge of the field of view.
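The snapshot trigger can be sketched as bookkeeping on top of the tracker's coarse motion estimates: accumulate the displacement of the last snapshot's centre and take a new snapshot once it reaches the edge of the field of view. When that centre sits at the edge, roughly half of the current frame overlaps the previous snapshot along the direction of motion, which is where the ~50% overlap comes from. This is a sketch under stated assumptions: the per-frame (dx, dy) pixel displacements are supplied as a hypothetical list, and the hypothetical `margin` parameter keeps the template from running off the frame.

```python
from typing import Iterable, List, Tuple


def snapshot_frames(displacements: Iterable[Tuple[float, float]],
                    width: int, height: int, margin: float = 0.0) -> List[int]:
    """Indices of frames at which a snapshot is taken (frame 0 always is).

    displacements yields the estimated (dx, dy) motion from frame k-1 to
    frame k, for k = 1, 2, ...  (hypothetical input format)."""
    cx, cy = width / 2.0, height / 2.0
    x, y = cx, cy  # tracked position of the last snapshot's centre
    snaps = [0]
    for k, (dx, dy) in enumerate(displacements, start=1):
        x += dx
        y += dy
        # Edge of the field of view reached: snapshot, then re-centre the template.
        if x <= margin or x >= width - margin or y <= margin or y >= height - margin:
            snaps.append(k)
            x, y = cx, cy
    return snaps
```

For a 320 x 240 view and a steady motion of 20 pixels per frame to the right, snapshots fall at frames 0, 8 and 16 over 20 frames. An occasional wrong estimate only perturbs the accumulated position slightly, so a snapshot is taken a little early or late rather than missed.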

The snapshots are stored in an ordered list until the user is satisfied that the whole document has been imaged. The images are then stitched together pair-wise, using the algorithms described below.


A.H. Gee
Wed Jun 25 11:02:12 BST 1997