
5 Mosaicing the images

Given the list of corresponding points between the two images, it remains to find the transformation which registers the overlapping portions of the images. Assuming a pinhole camera model, the transformation between pixels (u, v) in image 1 and pixels in image 2 is described by a plane-to-plane projectivity [8], referred to below as (1).
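For reference, a plane-to-plane projectivity with eight free parameters is conventionally written in homogeneous coordinates as follows. The notation is an assumption made here rather than taken from the text: $(u', v')$ denotes the pixel in image 2, $s$ is an arbitrary non-zero scale, and $p_{ij}$ are the eight parameters, with the overall scale fixed by setting the last matrix entry to 1.

```latex
\[
s \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & 1
\end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix},
\qquad
u' = \frac{p_{11} u + p_{12} v + p_{13}}{p_{31} u + p_{32} v + 1},
\quad
v' = \frac{p_{21} u + p_{22} v + p_{23}}{p_{31} u + p_{32} v + 1}.
\]
```

Each matched pair of points supplies two such equations, which is why four pairs suffice to fix the eight parameters. Setting $p_{31} = p_{32} = 0$ reduces the mapping to an affine transformation.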

The eight parameters of the projectivity can be found from four pairs of matching points, provided no three of the points are collinear. Since we typically have many more than four matches, we use RANSAC regression [3] to reject outlying matches and estimate the projectivity from the remaining, good matches. The projectivity is then fine-tuned using correlation at the corners of the overlap region, which yields four correspondences to sub-pixel accuracy.

Image 1 is then transformed into image 2's coordinate system using (1), and the two are displayed together in image 2's coordinate system: Figure 5(a) shows a typical result. For comparison, Figure 5(b) shows the result obtained using the optimal affine transformation in place of the projectivity. Since there is little depth variation in the image, one might expect an affine model to suffice. However, even though some parts of the overlap region are well registered, other parts are clearly blurred. The projectivity is necessary to achieve the high accuracy required for document mosaicing, where even single-pixel registration errors are noticeable.
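The RANSAC estimation step described above might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the direct linear transform used for fitting, the 1-pixel inlier tolerance, and the iteration count are all assumptions, and the sub-pixel correlation refinement is not shown.

```python
import numpy as np

def fit_projectivity(src, dst):
    """Least-squares 3x3 projectivity mapping src -> dst (needs >= 4 matches)."""
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        # Each match contributes two linear constraints on the nine entries.
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_projectivity(H, pts):
    """Map an (n, 2) array of pixel coordinates through H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return homog[:, :2] / homog[:, 2:3]

def ransac_projectivity(src, dst, n_iters=500, tol=1.0, seed=0):
    """Return (H, inlier_mask); tol is the inlier distance in pixels (assumed)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Hypothesise a projectivity from a random minimal set of four matches.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_projectivity(src[sample], dst[sample])
        err = np.linalg.norm(apply_projectivity(H, src) - dst, axis=1)
        inliers = err < tol  # non-finite errors from degenerate samples compare False
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("too few consistent matches")
    # Re-estimate from all matches that agree with the best hypothesis.
    H = fit_projectivity(src[best], dst[best])
    return H / H[2, 2], best  # assumes H[2, 2] is not close to zero
```

For noisy real matches, normalising the point coordinates before the fit generally improves the conditioning of the linear system; that step is omitted here for brevity.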

Figure 5: Mosaicing of two document images. In the overlap region the pixels are blended, using an unweighted mean at the centre of the region and increasingly weighted means towards the edges. This blending operation eliminates any abrupt seams in the mosaic, but will result in a blurred composite if the registration is not accurate. Blurring is evident in the affine mosaic (b), but not in the mosaic constructed using a plane-to-plane projectivity (a). Close-ups of typical seams from (a) and (b) are shown in (c) and (d) respectively. Note the system's ability to cope with mixtures of fonts and unaligned columns.
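The blending described in the caption to Figure 5 could look something like the sketch below. The caption specifies only that the mean is unweighted at the centre of the overlap and increasingly weighted towards the edges; the linear ramp, the function name `feather_blend`, and the assumption that the overlap is a rectangular patch blended along one axis are all illustrative choices made here.

```python
import numpy as np

def feather_blend(patch1, patch2, axis=1):
    """Blend two registered, equally sized overlap patches along `axis`.

    Image 1 is assumed to continue beyond index 0 of the overlap and
    image 2 beyond the last index, so image 1's weight falls from 1 to 0.
    """
    patch1 = np.asarray(patch1, dtype=float)
    patch2 = np.asarray(patch2, dtype=float)
    n = patch1.shape[axis]
    w1 = np.linspace(1.0, 0.0, n)  # 0.5 (an unweighted mean) at the centre
    shape = [1] * patch1.ndim
    shape[axis] = n
    w1 = w1.reshape(shape)         # broadcast the ramp across the other axes
    return w1 * patch1 + (1.0 - w1) * patch2
```

Because each output pixel is an average of the two inputs, any residual misregistration shows up as blur and not as a visible seam, which is the effect contrasted in panels (a) and (b).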

Coping with many images

Overlapping images are registered pair-wise in the order in which they are acquired. A final composite of the whole page is then constructed by mapping all the images into the coordinate system of an "anchor" image, usually chosen to be the one nearest the page centre. The transformations to the anchor frame are calculated by concatenating the pair-wise transformations found earlier. Care must be taken to ensure that images acquired late in the sequence do not overlap images acquired much earlier on. Without such precautions, errors accumulated along the chain of concatenated transformations could become a problem, though such errors could be eliminated using hierarchical sub-mosaics [7]. A typical whole-page mosaic (shown rotated) is given in Figure 6. Its resolution is about 150 dpi.
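Concatenating the pair-wise transformations amounts to multiplying 3x3 matrices along the acquisition sequence. The sketch below assumes, for illustration only, that `pairwise[k]` holds the projectivity mapping image k into image k+1, so that images acquired after the anchor need the inverse mappings; the function name `to_anchor` and this indexing convention are hypothetical.

```python
import numpy as np

def to_anchor(pairwise, anchor):
    """Return one 3x3 matrix per image, mapping that image into the anchor frame.

    pairwise[k] is assumed to map image k into image k + 1.
    """
    n = len(pairwise) + 1
    T = [None] * n
    T[anchor] = np.eye(3)
    # Images acquired before the anchor: step forwards along the chain.
    for i in range(anchor - 1, -1, -1):
        T[i] = T[i + 1] @ pairwise[i]
    # Images acquired after the anchor: step backwards using the inverses.
    for i in range(anchor + 1, n):
        T[i] = T[i - 1] @ np.linalg.inv(pairwise[i - 1])
    # Fix the arbitrary homogeneous scale (assumes T[i][2, 2] is non-zero).
    return [t / t[2, 2] for t in T]
```

Every product in the chain carries the estimation error of its factors, so images far from the anchor in the sequence are placed less accurately than its immediate neighbours.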

Figure 6: A whole page mosaic. To construct the mosaic all images were transformed to the anchor frame by concatenating the pair-wise projectivities. Where images overlap a weighted blending operation was performed, as described in the caption to Figure 5. Blending was strictly pair-wise: at any pixel where more than two images overlap, only the two intensities with the largest weights were blended. Note how the system copes with multiple fonts, including mathematics.
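The strictly pair-wise rule in the caption to Figure 6 can be illustrated as below. The sketch assumes that every image has already been warped into the anchor frame and comes with a per-pixel weight map that is zero wherever the image does not cover the pixel; how those weights are computed is not shown, and the name `blend_top_two` is hypothetical.

```python
import numpy as np

def blend_top_two(images, weights):
    """Blend a stack of warped images, using only the two largest weights per pixel.

    images, weights: arrays of shape (n, height, width) with n >= 2.
    """
    images = np.asarray(images, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Indices of the two images with the largest weights at each pixel.
    top = np.argsort(weights, axis=0)[-2:]
    w = np.take_along_axis(weights, top, axis=0)
    v = np.take_along_axis(images, top, axis=0)
    total = w.sum(axis=0)
    out = np.zeros(total.shape)
    # Weighted mean of the two selected intensities; uncovered pixels stay 0.
    np.divide((w * v).sum(axis=0), total, out=out, where=total > 0)
    return out
```

Where only one image covers a pixel, the second selected weight is zero and the output is simply that image's intensity.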

Note that, typically, the camera is not perfectly parallel to the tabletop. In the example in Figure 6, the bottom of each image is slightly more distant than the top. The effect is barely noticeable in the individual images, but more evident in the mosaic. While it is straightforward to rectify the mosaic using a single plane-to-plane projectivity, we chose to display the raw mosaic to illustrate this point. Likewise, shading variations could be removed by histogram equalisation.
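As an illustration of the last remark, a plain global histogram equalisation of an 8-bit greyscale image might be written as follows. The text does not say which variant was intended or to which images it would be applied, so the function below is only one assumed possibility.

```python
import numpy as np

def equalise(img):
    """Global histogram equalisation of an 8-bit greyscale image."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest level present
    span = cdf[-1] - cdf_min
    if span == 0:              # constant image: nothing to equalise
        return img.copy()
    # Look-up table stretching the cumulative distribution over 0..255.
    lut = np.clip(np.round((cdf - cdf_min) / span * 255), 0, 255).astype(np.uint8)
    return lut[img]
```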


A.H. Gee
Wed Jun 25 11:02:12 BST 1997