Given the list of corresponding points between the two images, it
remains to find the transformation which registers the overlapping
portions of the images. Assuming a pinhole camera model, the
transformation between pixels (u, v) in image 1 and pixels (u', v')
in image 2 is described by a plane-to-plane projectivity [8]:
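In the standard homogeneous form, such a projectivity can be written as below. The notation is assumed rather than taken from the source: (u', v') denotes the pixel in image 2, s is an arbitrary scale factor, and p_ij are the entries of a 3 x 3 matrix whose bottom-right entry is fixed to 1, leaving eight free parameters.

```latex
\begin{equation}
\begin{pmatrix} s u' \\ s v' \\ s \end{pmatrix}
=
\begin{pmatrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & 1
\end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
\end{equation}
```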
The 8 parameters of the projectivity can be found from four pairs of matching points. Since we typically have many more than four matches, we use RANSAC regression [3] to reject outlying matches and estimate the projectivity from the remaining, good matches. The projectivity is fine-tuned using correlation at the corners of the overlap region to obtain four correspondences to sub-pixel accuracy. Image 1 is then transformed into image 2's coordinate system using (1), and the two are displayed together in that coordinate system: Figure 5(a) shows a typical result. For comparison, we show in Figure 5(b) the result obtained using the optimal affine transformation in place of the projectivity. Since there is little depth variation in the image, one might expect an affine model to suffice; however, even though some parts of the overlap region are well registered, other parts are clearly blurred. The projectivity is necessary to achieve the high accuracy required for document mosaicing, where even single-pixel registration errors are noticeable.
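The estimation step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 1-pixel inlier tolerance and the iteration count are all illustrative assumptions, and the correlation-based sub-pixel refinement is not shown. The 3 x 3 matrix P is assumed to have its bottom-right entry fixed to 1.

```python
import numpy as np

def fit_projectivity(src, dst):
    """Estimate the 8 parameters from >= 4 point pairs (least squares if more)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # With P[2, 2] = 1:  x * (P20*u + P21*v + 1) = P00*u + P01*v + P02,
        # and likewise for y; each pair gives two linear equations.
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        rhs.extend([x, y])
    p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(p, 1.0).reshape(3, 3)

def apply_projectivity(P, pts):
    """Map an (N, 2) array of pixels through the 3 x 3 projectivity P."""
    pts = np.asarray(pts, dtype=float)
    h = np.hstack([pts, np.ones((len(pts), 1))]) @ P.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return h[:, :2] / h[:, 2:3]

def ransac_projectivity(src, dst, n_iters=500, tol=1.0, seed=0):
    """Return (P, inlier_mask) using a basic RANSAC loop (assumed parameters)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: four pairs determine the eight parameters.
        idx = rng.choice(len(src), size=4, replace=False)
        P = fit_projectivity(src[idx], dst[idx])
        err = np.linalg.norm(apply_projectivity(P, src) - dst, axis=1)
        inliers = err < tol  # NaN errors from degenerate samples count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("too few inliers to estimate a projectivity")
    # Re-estimate from all matches judged to be good.
    return fit_projectivity(src[best], dst[best]), best
```

Calling `ransac_projectivity(src, dst)` with two `(N, 2)` arrays of matched pixel coordinates returns the estimated matrix and a boolean mask marking the matches that were kept.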
Figure 5:
Mosaicing of two document images.
In the overlap region the pixels are blended, using an unweighted mean
at the centre of the region and increasingly weighted means towards the
edges. This blending operation eliminates any abrupt seams in the
mosaic, but will result in a blurred composite if the registration is
not accurate. Blurring is evident in the affine mosaic (b), but not in
the mosaic constructed using a plane-to-plane projectivity (a).
Close-ups of typical seams from (a) and (b) are shown in (c) and (d)
respectively. Note the system's ability to cope with mixtures of fonts
and unaligned columns.
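The blending described in this caption can be sketched as follows. The caption does not give the weighting function, so a linear ramp across a horizontally adjacent overlap is assumed here; the function name `blend_overlap` and the left/right layout are hypothetical.

```python
import numpy as np

def blend_overlap(strip1, strip2):
    """Blend two registered strips that cover the same overlap region.

    strip1 is assumed to come from the image on the left of the overlap,
    strip2 from the image on the right; both have shape (rows, cols).
    """
    strip1 = np.asarray(strip1, dtype=float)
    strip2 = np.asarray(strip2, dtype=float)
    width = strip1.shape[1]
    # Weight of strip1 falls linearly from 1 at the left edge to 0 at the
    # right edge, so it equals 0.5 (an unweighted mean) at the centre.
    w1 = np.linspace(1.0, 0.0, width)
    return w1 * strip1 + (1.0 - w1) * strip2
```

With this weighting each image dominates near its own side of the overlap, so no abrupt seam appears, while any misregistration shows up as blur where the weights are comparable.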
Overlapping images are registered pair-wise in the order that they are
acquired. A final composite of the whole page is then constructed by
mapping all the images into the coordinate system of an ``anchor''
image, usually chosen to be the one nearest the page centre. The
transformations to the anchor frame are calculated by concatenating the
pair-wise transformations found earlier. Care must be taken to ensure
that images acquired late in the sequence do not overlap images acquired
much earlier on. Without such precautions, error accumulation could be a
problem, though such errors could be eliminated using hierarchical
sub-mosaics [7]. A typical whole-page mosaic (rotated through ) is shown
in Figure 6. The mosaic is approximately pixels, giving a resolution of
about 150 dpi.
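Concatenating the pair-wise transformations into anchor-frame transformations can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `pairwise[i]` is taken to be the 3 x 3 projectivity mapping image i into image i+1 (images indexed from 0 in acquisition order), and the function name is hypothetical.

```python
import numpy as np

def transforms_to_anchor(pairwise, anchor):
    """Return a list T where T[k] maps image k into the anchor image's frame.

    pairwise[i] is assumed to map image i coordinates to image i+1 coordinates.
    """
    n = len(pairwise) + 1
    to_anchor = [None] * n
    to_anchor[anchor] = np.eye(3)
    # Images acquired before the anchor: chain forwards, i -> i+1 -> ... -> anchor.
    for i in range(anchor - 1, -1, -1):
        to_anchor[i] = to_anchor[i + 1] @ pairwise[i]
    # Images acquired after the anchor: chain the inverse maps, i -> i-1 -> ... -> anchor.
    for i in range(anchor + 1, n):
        to_anchor[i] = to_anchor[i - 1] @ np.linalg.inv(pairwise[i - 1])
    return to_anchor
```

Because each result is a product of estimated matrices, the error in `T[k]` grows with the number of steps between image k and the anchor, which is why an anchor near the page centre keeps the chains short.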
Figure 6:
A whole page mosaic.
To construct the mosaic all images were transformed to the anchor frame
by concatenating the pair-wise projectivities. Where images overlap, a
weighted blending operation was performed, as described in the caption
to Figure 5. Blending was strictly pair-wise: at any pixel where more than two
images overlap, only the two intensities with the largest weights were
blended. Note how the system copes with multiple fonts, including
mathematics.
Note that, typically, the camera is not perfectly parallel to the tabletop. In the example in Figure 6, the bottom of each image is slightly more distant than the top. The effect is barely noticeable in the individual images, but more evident in the mosaic. While it is straightforward to rectify the mosaic using a single plane-to-plane projectivity, we chose to display the raw mosaic to illustrate this point. Likewise, shading variations could be removed by histogram equalisation.
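As an illustration of the last point, plain global histogram equalisation of an 8-bit greyscale image can be sketched as below. Which variant the authors had in mind is not stated, so this is only one possible reading; the function name is hypothetical.

```python
import numpy as np

def equalise_histogram(image):
    """Globally histogram-equalise a 2-D uint8 greyscale image."""
    image = np.asarray(image, dtype=np.uint8)
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest occupied grey level
    denom = image.size - cdf_min
    if denom == 0:
        return image.copy()  # constant image: nothing to spread out
    # Map each grey level so that the output CDF is approximately linear.
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]
```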
A.H. Gee