Rigid Motion Recovery from less

than Eight Feature Point Matches

 

Miroslav Trajkovic and Mark Hedley

Department of Electrical Engineering

University of Sydney, NSW 2006, Australia

miroslav@ee.usyd.edu.au

 

Abstract

In this paper we propose a new method for the estimation of rigid motion from two monocular images when less than eight point correspondences are available. The motion parameters are found using the essential matrix. By employing previously unused constraints on essential matrix, we show that it can be estimated through the minimisation of a two-dimensional cost function defined over the space of all possible directions of translation. The essential matrix and the direction of translation are estimated simultaneously. Once they have been found the rotation and depths are readily computed. The new formulation is easier to understand and implement than previously proposed approaches, and has a low computational cost. The algorithm has been evaluated on synthetic data. Our experiments show that that the new method is capable of finding all solutions and that choice of initial state is not critical.

1. Introduction

Motion is a powerful clue for an observer to explore and interact with the environment. It is well known that given two images of the static environment taken by a moving camera, it is possible to recover the motion of the camera (rotation and direction of translation) between two acquired images, and the relative depths of the scene points in a reference frame.

At least five pairs of matched feature points between two images are needed to determine the motion between images [18]. Furthermore, if eight or more matched points are available then a linear algorithm may be used to recover the motion [9]. While the case of eight or more matched points has been widely studied [1, 9, 14, 18], considerably less research has been done for the case where there are between five and seven matched points. The case of five point matches has been investigated by ([4], [13]), and they have shown that up to ten solutions may exist in general. Netravali et al have also given a generalisation of their algorithm for the case of six and seven point matches. When six or more point matches are available there is, in general, a unique solution for camera orientation [13], and in the case of seven point matches a closed form for the motion may be found (which requires solving a cubic equation).

In this paper, we present a simple algorithm for recovering 3D motion (rotational and translational) and relative depths when less than eight point matches are available. We formulate the problem in a new way that leads to a simpler algorithm for recovering the motion ( cf . [4, 13]). As a consequence our algorithm is easier to implement and understand, and it appears to be more stable.

The paper is organised as follows. In section two we give the necessary background to the projective transformation and introduce the essential matrix and some of its properties. In section three we give a brief review of the related work. Section four presents methods for recovering rigid motion from five, six and seven point matches. Experimental results are presented in section five. A discussion of results and possible extensions are given in section six and concluding remarks are given in the last section.

2. Background

2.1 Notation

In this paper we assume a pinhole camera model. The coordinate system is fixed on the camera with the origin coinciding with the projection centre of the camera and the z -axes coinciding with the optical axes and pointing toward the scene. The focal length is assumed known, and without loss of generality set to unity. Therefore, the equation of the image plane is z = 1 .

Let us consider an arbitrary point before and after the motion. The following notation is introduced and will be used throughout the paper.

spatial vector of the point before the motion;

spatial vector of the point after the motion;

image vector of the point before the motion; and

image vector of the point after the motion.

A mapping from a 3-dimensional vector to a 3 ´ 3 matrix is defined as follows:

Using this mapping, we can express the cross product operation of two vectors by the matrix vector multiplication:

A function svd that maps a 3 ´ 3 matrix to a 3-dimensional vector is defined as

where and are singular values of A in decreasing order.

2.2 Motion Model

The motion model that is assumed in this paper is a rigid motion model which can be decomposed into a rotation and a translation and is described by the following equation:

(1)

and after some rearrangement (see [19]) the following equations is obtained:

(2)

where T 0 denotes direction of translation and . The matrix E is called the essential matrix and has been found (along with the fundamental matrix which is corresponding matrix for an uncalibrated camera) to be very useful in Computer Vision for motion recovery ([9], [18]) and motion segmentation [16]. The essential matrix has the following properties that we will use throughout the paper.

Lemma 1. T 0 is a unit eigenvector of corresponding to the zero eigenvalue.

Proof: .

Lemma 2. or equivalently, eigenvalues of are 1, 1 and 0. For a proof see [7, 19].

Lemma 3. . See [18] for a proof.

Given n point correspondences, equation (3) can be rewritten as linear equations in the elements of E and we obtain

A E = 0

(3)

where

and

, for .

If eight or more ideal matches are available the rank of A is exactly eight (except for some degenerate cases, see [10]) and where h is the unit eigenvector asso-ciated with the zero eigenvalue of . In the presence of noise h is found as a unit vector that minimises || A h ||, and this is a unit eigenvector of associated with the smallest eigenvalue. The matrix E is then determined as

(4)

and R and T can be computed using method described in [19]. The methods that are based on solving system (3) when more than eight matches are present are known as so-called E –matrix algorithms. Their advantage is that they are linear and simple to compute, but they can not be used if the number of points is between five and seven, in which case a solution still can be found.

3. Prior Research

The motion recovery problem has been extensively studied by the computer vision community for the last fifteen years. There are two types of commonly used motion recovery algorithms – linear (described in the previous section) and non-linear. Non-linear algorithms may be broadly divided into two classes: 1) General methods based on non-linear estimation which simultaneously solve for camera orientation and relative depths through the minimisation of a non-linear cost function ( e.g . [15]); and 2) Subspace methods ( e.g. [5]), which subdivide the problem and solve separately for rotation, direction of translation and relative depths. The second class of methods is more efficient, as it usually first involves the minimisation in a two or three dimensional space to find the direction of translation or rotation, then the computation of other parameters by solving linear equations.

While both linear and non-linear algorithms have been proposed for the situation where eight or more point matches exist, the case for less than eight point matches has been investigated only in two papers ([4] and [13]). In both a complicated polynomial system needed to be solved to recover the motion.

Faugeras and Maybank [4] used two different approaches in their paper. The first one was based on the properties of essential matrix, which was introduced by [9]. They employed the fact that the set of essential matrices can be described by polynomial equations and investigated the multiplicity of solutions using algebraic geometry. A similar approach was taken by Natravaly et al [13], a set of three cubic and five linear equations need to be solved. This is even more computationally demanding to implement. The second approach of [4] employed projective geometry and was based on ideas developed by Chasles [3] and Kruppa [8]. Using this approach a high order polynomial needed to be solved. The problems with this approach are that it is difficult to understand and it is difficult to implement, as it requires extensive symbolic algebra.

Heeger and Jepson [5] (and before them [2]) assumed small rotation between two frames and have shown that in this case motion equation is bilinear what enabled them to find direction of translation through the optimisation of the cost function in the two-dimensional space (unit sphere). After that, they found rotation and relative depths by solving linear system. The main disadvantage of this method is that it assumes a small rotation, so it can not be applied in general. When the rotation is not small, the motion equation is no longer bilinear and subspace methods, at best, can be reduced to a three-dimensional optimisation.

In this paper, we propose a subspace method, which is specially designed for the case when less than eight motion correspondences are available and is based on evaluation of the essential matrix E given the constraints described in the previous section. The advantages of this method are: 1) it is simple; and 2) it requires only a minimisation in the compact two-dimensional space.

4. Finding motion from less than eight points

In this section we describe how to compute motion parameters when fewer than eight point matches are available. We will separately consider three cases: five, six and seven point matches. We will show that when seven point matches are available, the unique solution can be found by solving a cubic equation, and that in case of five or six point matches the camera motion can be recovered through the minimisation of two-dimensional functions defined on the unit sphere.

4.1 Computation of motion from five point matches

Our method is based on solving equation (4) with the constraints that E satisfies the properties of an essential matrix. For clarity, we will rewrite this here:

A h = 0 , subject to svd( E )=(1 1 0) T

(5)

where E is related to h by equation (5).

As A has rank 5 (A is 5 ´ 9 matrix), the solution space of equation A h = 0 will be a hyperplane spaned by eigenvectors and of the matrix associated with its four zero eigenvalues. Hence, the solution of equation (5) may be written as

(6)

where are the coefficients to be found. The essential matrix corresponding to h is given by

(7)

where are matrices associated with eigenvectors , and our task is to find which are solution to equation

.

This may also be formulated as a minimisation problem

.

The space of solutions for a may be further reduced in the following way. If E is an essential matrix, then according to Lemma 1 for the direction of translation T (unit vector) we have:

i.e . the search space for a can be reduced by noting that a must lie in the null space of B . The rank of B is 3 (it is a 3 ´ 4 matrix), so has a single eigenvector with zero eigenvalue.

The cost function with the reduced search space can be defined as a function of T (hence defined over a two dimensional space) as follows:

This cost function is always positive and will have minimum for those values of T for which . The search space is small (the unit hemisphere which is compact two-dimensional space) and the computational complexity of the evaluation of the cost function is low. In practice, to find all solutions, we performed the minimisation several times (the Simplex method, implemented in MATLAB, was used) with randomly chosen initial values. Usually all solutions were found. An example of typical cost function is shown in Figure 1.

4.2 Six point matches

When six point matches are available, matrix A has rank six and has three zero eigenvalues. The zero space of matrix is three-dimensional and the solution of equation (5) can be written as

,

where a= [ a 1 a 2 a 1 ] T are coefficients to be found and are eigenvectors of associated with its zero eigenvalues. As with 5 point matches, the essential matrix cor-responding to h is given by where are matrices associated to eigenvectors . Again, using Lemma 1 we obtain

The necessary condition for above system to have a non-zero solution is rank( B ) = 2, and we can employ another constraint on T , e.g . . This equation may be written after some rearrangement as a homogenous third order polynomial in T , but is very difficult to use in practice. Therefore, for this case, we employ the same minimisation procedure as for five points. The only difference is that vector a is three-dimensional and that it is computed as the eigenvalue of B associated with its smallest eigenvalue.

4.3 Seven point matches

With seven point matches, matrix A has rank seven and has two zero eigenvalues. The null space of matrix is two-dimensional and the solution of equation (5) can be written as

The essential matrix corresponding to h is given by where and are matrices associated to eigenvectors and . The equation to be solved is:

.

Using the restriction that det( E ) = 0, coefficients a 1 and a 2 may be found by solving a cubic polynomial. As this polynomial can have three solutions, it is necessary to check above equation and choose the solution that satisfies it. This method has been previously described and was used for estimation of the fundamental matrix [16, 11].

5. Experimental Results

In this section we present results of our algorithm applied to synthetic motion fields consisting of five or six point matches. Synthetic motion fields were generated as in [5], using a random depth map. The focal length was set to unity and the image was 9 ´ 9 pixels. The pixel width was set to 0.1 and all the points in first frame have integer (in unit of pixels) positions. Experiments were performed for various values for translation and rotation, and the motion fields used here were generated by rotation around axis (80 ° , 10 ° ) (in spheric coordinates) for the rotation angle of 16 ° and various translation vectors.

5.1 Five point matches

When five point matches are available the number of solutions corresponding to each set of point matches is up to 10. A solution is feasible if it yields a 3D reconstruction in which all the points are on the same side of the camera for both camera positions (if all the depths are negative, than we simply change the sign of translation vector and the sign of depths).

We give two examples of results using our algorithm. In the first, we applied rotation [80 ° 10 ° 16 ° ] and translation [1 1 2] and obtained the set of point matches , , , and .

 

Solution Number

Rotation Vector

Axis of Translation

Feasibility

1

116.351 16.714 17.601

0.2890 0.1383 0.9473

Yes

2

40.331 80.880 87.725

-0.3479 0.4074 0.8444

Yes

3

-143.215 106.410 74.660

0.6674 0.6086 0.4292

Yes

4

80.000 10.000 6.000

0.4082 0.4082 0.8165

Yes

Table 1 . Four solutions found for the given set of motion correspondences.

 

Figure 1 . Cost function corresponding to the above set of point matches. Solutions are marked with white squares with a black point in the middle. The solutions are shown in one quadrant only. Note that each solution has three counterparts in other quadrants but they all yield the same motion and structure parameters (up to a sign). The axes the are polar coordinates of the direction of translation ( T 0 ).

For this configuration we found all four solutions, with all of them being feasible. The results are shown in Table 1, while the cost function is shown in Figure 1. Light areas indicate a high value for the cost function, while dark areas indicate low cost. Solutions (minima) are marked as white squares with a black dot in the centre.

In most of our experiments we found four, or less frequently six, solutions. To check how the algorithm works when all ten solutions exist, we tested it using data for which we know that ten solutions exist. Let us consider a set of image correspondences a « a ¢ , b « b ¢ , c « c ¢ , d « d ¢ , e « e ¢ . It is shown in [4] that if a , b , c and a ¢ , b ¢ , c ¢ are two triads of mutually orthogonal vectors and d = e ¢ , e = d ¢ , these image correspondences yield ten solutions. If this set is exposed to a small perturbation it should still yield ten real solutions. One such set of data was generated, and all ten solutions were found. For further details see [17].

5.2 Six Point Matches

With six point matches, a unique solution exists in general, and in all our experiments, we were able to find this unique solution.

One set of six points is given in Table 2. The cost function associated with this set of points is shown in Figure 2. As can be seen on Figure 2. (and as obtained by our algorithm), the cost function has four minima. However, when the second criteria ( A h=0 ) is applied, only one of these solutions remains valid.

 

Points in the first image

Points in the second image

3.0326 -1.5163 15.1629

4.1587 -0.5626 15.1833

1.4777 -4.4332 14.7773

3.4366 -3.7915 14.8263

-3.8609 -7.7217 19.3044

-0.5848 -8.4147 19.5448

-1.3282 -1.3282 13.2823

-0.1752 -1.5628 13.5082

-4.7669 6.3559 15.8898

-5.4392 4.8936 16.3847

0 3.7694 18.8471

-0.0124 3.6940 19.0810

Table 2. Coordinates of the six corresponding points on the image plane.

Solution

Rotation Vector

Axis of Translation

|| A h || ´ 10 5

89.1358 11.1697 16.0629

0.0106 -0.1058 0.9943

14.4913

81.7507 10.3269 16.0037

0.3330 0.3345 0.8816

4.4721

80.0000 10.0000 16.0000

0.4082 0.4082 0.8165

0

59.7177 27.6787 17.7157

-0.7516 0.5870 0.3008

191.7811

Table 3 . Motion parameters that minimise the cost function. Only one of them satisfies the condition A h = 0 , and this is a true solution .

Figure 2. Cost function associated to the six point correspondences given in the Table 2. The white arrow point to the solution for which A h = 0 .

 

6. Discussion

The main characteristic of our algorithm is that it employs the fact that the essential matrix corresponding to the five point matches must lie on a known four-dimensional hyperplane. Thus, if the direction of translation is known, the essential matrix (and accordingly rotation and relative depths) can be recovered using a linear algorithm. Consequently, the problem of finding camera orientation is solved through the optimi-sation of the cost function in the space of all possible directions (on the unit sphere).

If this fact were not employed, then even if the direction of translation were known, finding the rotation vector would be still complicated, and not possible in closed form. Therefore, optimisation in a higher dimensional space would have to be employed.

One immediate application of our algorithm is rigidity checking. In [12], rigidity checking was performed using six point matches by non-linear optimisation in an 11-dimensional parameter space (motion parameters and relative depths of the scene points in the reference frame). It seems that our algorithm would produce a much simpler solution to this problem. Namely, it is enough to find minima of the cost function described in Section 3.2 and check if minimum is below some threshold. This can be done using the two dimensional optimisation. Moreover, it can be readily visualised ( e.g . see Fig. 2).

Another potential application is in motion segmentation from a sequence of images taken from a calibrated camera. For motion segmentation the robust estimators such as RANSAC or Least Median of Squares can usually be successfully used [16]. These algorithms require a certain number of p -tuples ( p motion correspondences) to be randomly chosen. In the case of an uncalibrated camera p is usually seven ([16, 11]) or eight [19]. For a calibrated camera, however, this number can be five or six. This is important, as the required number of samples is exponentially related to p .

7. Conclusion

In this paper we have presented a new approach to the problem of determining the motion between two images where the number of matched points is less than eight. The main advantages of our algorithm are simplicity and low computational cost . Not only that, but the formulation of the problem is considerably simpler than previous presentations.

The algorithm was implemented and tested using simulated data. When five point correspondences were given, we usually found four solutions which is in line with observations made in [4, 6]. For those configurations that yield ten decomposable solutions, the algorithm found all solutions. For six and seven point matches, the unique solution was always found.

References

[1] Aggarwal J.K. and Nandhakumar N., "On the computation of Motion from Sequences of Images – A review ", Proceedings of the IEEE , vol. 76, pp. 917-935, 1988.

[2] Bruss A. and Horn B., "Passive navigation", Comp. Vision, Graph. And Image Processing, vol. 21 pp. 3-20, 1983.

[3] Chasles M., "Question No. 296", Nouv. Ann. Math. 14:50, 1855.

[4] Faugeras O. and Maybank S., "Motion from Point Matches: Multiplicity of Solutions", Intl. Journal Computer Vision, vol. 4, pp. 225-246, 1990.

[5] Heeger D. and Jepson A., "Subspace Methods for Recovering Rigid Motion I: Algorithm and Implementation", Intl. Journal Computer Vision, vol. 7, pp. 95-117, 1992.

[6] Horn B. "Closed-form solution of absolute orientation using unit quaternions", J. Opt. Soc. Am. A , vol. 4, pp. 629-642, 1987.

[7] Huang T. and Faugeras O., "Some Properties of the E Matrix in Two-View Motion Estimation", IEEE PAMI, vol. 11, 1310-1312, December 1989.

[8] Kruppa E., "Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientirung", Sitz.-Ber. Akad. Wiss., Wien, math. Naturw. Kl. Abt. Iia ., 122: 1939-1948, 1913.

[9] Longuet-Higgins, H.C., "A computer algorithm for reconstructing a scene from two projections", Nature 293, pp.133-135, 1981.

[10] Longuet-Higgins, H.C., The reconstruction of a scene from two projections – configurations that defeat the 8-point algorithm", Proc. IEEE 1 st Conf. On Artif. Intell. Applications , pp. 395-397, 1984.

[11] Luong Q.T. et al, "On determining the fundamental matrix: analysis of different methods and experimental results", INRIA TR-1894 , INRIA-Sophia Antipolos.

[12] McReynolds D. and Lowe D., "Rigidity Checking of 3D Point Correspondences Under Perspective Projection", IEEE Trans. PAMI, vol. 18, pp. 1174-1185, 1996.

[13] Netravali A. et. al . "Algebraic Methods in 3-D Motion estimation from Two-View Point Correspondences", Intl. Journal of Imaging Systems and Technology , vol. 1., pp. 78-99, 1989.

[14] Philip J.,"Estimation of three-dimensional motion of rigid objects from noisy observations", IEEE Trans. PAMI, vol. 13, pp. 61-66, 1991.

[15] Szeliski R. and Kang S. B., "Recovering 3D shape and motion from image streams using nonlinear least squares", Journal of Visual Communication and Image Representation , vol. 5, pp.10-28, 1994.

[16] Torr P. and Murray D., "A review of Robust Methods to Estimate the Fundamental Matrix", to appear in Intl. Journal Computer Vision.

[17] Trajkovi } M. and Hedley M. "The eigenspace method for motion recovery from less than eight point correspondences", (manuscript in preparation).

[18] Ullman S., "The interpretation of visual motion", MIT Press , Cambridge, MA, 1979.

[19] Weng J., Huang. T and Ahuja N., "Motion and Structure from Image Sequences", Springer-Verlag, 1993.