Precise calibration of binocular vision system used for vision measurement

Yi Cui,¹ Fuqiang Zhou,¹,* Yexin Wang,¹ Liu Liu,¹ and He Gao¹,²

¹Key Laboratory of Precision Opto-mechatronics Technology, Ministry of Education, Beihang University, Beijing 100191, China
²School of Information & Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
*[email protected]

Abstract: Binocular vision calibration is of great importance in 3D machine vision measurement. For binocular vision calibration, nonlinear optimization is a crucial step in improving accuracy. Existing optimization methods mostly aim at minimizing the sum of the reprojection errors of the two cameras in their respective 2D image pixel coordinates. However, the subsequent measurement process is conducted in a 3D coordinate system, which is not consistent with the optimization coordinate system. Moreover, the error criteria of optimization and measurement differ: equal pixel distance errors on the 2D image plane lead to different 3D metric distance errors at different positions in front of the camera. To address these issues, we propose a precise calibration method for a binocular vision system that minimizes the metric distance error between the point reconstructed through optimal triangulation and the ground truth in the 3D measurement coordinate system. In addition, the inherent epipolar constraint and a constant distance constraint are combined to enhance the optimization process. To evaluate the performance of the proposed method, both simulated and real experiments have been carried out, and the results show that the proposed method reliably and efficiently improves measurement accuracy compared with the conventional method. ©2014 Optical Society of America

OCIS codes: (150.0150) Machine vision; (150.1488) Calibration; (330.1400) Vision - binocular and stereopsis.

References and links

1. S. Zhu and Y. Gao, "Noncontact 3-d coordinate measurement of cross-cutting feature points on the surface of a large-scale workpiece based on the machine vision method," IEEE Trans. Instrum. Meas. 59(7), 1874–1887 (2010).
2. F. Zhou, Y. Wang, B. Peng, and Y. Cui, "A novel way of understanding for calibrating stereo vision sensor constructed by a single camera and mirrors," Meas. J. Int. Meas. Confed. 46(3), 1147–1160 (2013).
3. Z. Ren, J. Liao, and L. Cai, "Three-dimensional measurement of small mechanical parts under a complicated background based on stereo vision," Appl. Opt. 49(10), 1789–1801 (2010).
4. J. Wang, X. Wang, F. Liu, Y. Gong, H. Wang, and Z. Qin, "Modeling of binocular stereo vision for remote coordinate measurement and fast calibration," Opt. Lasers Eng. 54, 269–274 (2014).
5. T. Xue, L. Qu, Z. Cao, and T. Zhang, "Three-dimensional feature parameters measurement of bubbles in gas-liquid two-phase flow based on virtual stereo vision," Flow Meas. Instrum. 27, 29–36 (2012).
6. M. Xie, Z. Wei, G. Zhang, and X. Wei, "A flexible technique for calibrating relative position and orientation of two cameras with no-overlapping FOV," Meas. J. Int. Meas. Confed. 46(1), 34–44 (2013).
7. M. Machacek, M. Sauter, and T. Rosgen, "Two-step calibration of a stereo camera system for measurements in large volumes," Meas. Sci. Technol. 14(9), 1631–1639 (2003).
8. Z. Y. Zhang, "A flexible new technique for camera calibration," IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000).
9. J. Heikkila, "Geometric camera calibration using circular control points," IEEE Trans. Pattern Anal. Mach. Intell. 22(10), 1066–1077 (2000).
10. R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE J. Robot. Autom. 3(4), 323–344 (1987).
11. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. (Cambridge University, 2003).


12. T. Xue, B. Wu, J. G. Zhu, and S. H. Ye, "Complete calibration of a structure-uniform stereovision sensor with free-position planar pattern," Sens. Actuators A Phys. 135(1), 185–191 (2007).
13. P. F. Luo and J. Wu, "Easy calibration technique for stereo vision using a circle grid," Opt. Eng. 47(3), 033607 (2008).
14. J. Sun, Q. Liu, Z. Liu, and G. Zhang, "A calibration method for stereo vision sensor with large FOV based on 1D targets," Opt. Lasers Eng. 49(11), 1245–1250 (2011).
15. Y. Zhao, X. Li, and W. Li, "Binocular vision system calibration based on a one-dimensional target," Appl. Opt. 51(16), 3338–3345 (2012).
16. H. Habe and Y. Nakamura, "Appearance-based parameter optimization for accurate stereo camera calibration," Mach. Vis. Appl. 23(2), 313–325 (2012).
17. Y. Furukawa and J. Ponce, "Accurate camera calibration from multi-view stereo and bundle adjustment," Int. J. Comput. Vis. 84(3), 257–268 (2009).
18. T. Dang, C. Hoffmann, and C. Stiller, "Continuous stereo self-calibration by camera parameter tracking," IEEE Trans. Image Process. 18(7), 1536–1550 (2009).
19. F. Dornaika, "Self-calibration of a stereo rig using monocular epipolar geometries," Pattern Recognit. 40(10), 2716–2729 (2007).
20. F. Zhou, Y. Cui, B. Peng, and Y. Wang, "A novel optimization method of camera parameters used for vision measurement," Opt. Laser Technol. 44(6), 1840–1849 (2012).
21. R. I. Hartley and P. Sturm, "Triangulation," Comput. Vis. Image Underst. 68(2), 146–157 (1997).
22. D. C. Brown, "Decentering distortion of lenses," Photogramm. Eng. Remote Sensing 32, 444–462 (1966).
23. D. C. Brown, "Close-range camera calibration," Photogramm. Eng. Remote Sensing 37, 855–866 (1971).
24. Z. Zhang, "Determining the epipolar geometry and its uncertainty: A review," Int. J. Comput. Vis. 27(2), 161–195 (1998).
25. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University, 1992).
26. J. Heikkila and O. Silven, "A four-step camera calibration procedure with implicit image correction," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Institute of Electrical and Electronics Engineers, San Juan, PR, USA, 1997), pp. 1106–1112.
27. H. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature 293(5828), 133–135 (1981).
28. Y. Wan, Z. Miao, and Z. Tang, "Robust and accurate fundamental matrix estimation with propagated matches," Opt. Eng. 49(10), 107002 (2010).
29. G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision With the OpenCV Library (O'Reilly, 2008).
30. http://opencv.org/
31. F. Zhou, Y. Cui, Y. Wang, L. Liu, and H. Gao, "Accurate and robust estimation of camera parameters using RANSAC," Opt. Lasers Eng. 51(3), 197–212 (2013).
32. F. Zhou, Y. Cui, H. Gao, and Y. Wang, "Line-based camera calibration with lens distortion correction from a single image," Opt. Lasers Eng. 51(12), 1332–1343 (2013).

1. Introduction

Binocular vision technology has been widely used in the field of vision measurement, for example in precision measurement for advanced manufacturing [1–3], remote coordinate measurement [4], 3D measurement of bubbles in gas-liquid two-phase flow [5], rail gauge measurement [6], and so on. Before any measurement can be conducted, the a priori unknown parameters of the camera model, which describe the geometric relation between the scene and the camera images, have to be determined in some way [7]. Once the stereo correspondence between two image points has been established, the calibrated camera parameters can be applied to reconstruct the 3D space coordinates by triangulation. Since it is a significant factor in measurement accuracy, accurate calibration of the binocular vision system plays an important role in the 3D metrological tasks mentioned above.

Generally, the calibration process of a binocular vision system includes two parts: one is the single camera calibration, which has been widely discussed [8–10]; the other is the determination of the structure parameters, which describe the spatial relationship between the two cameras. Since true Euclidean reconstruction (including determination of the overall scale) is not possible without extraneous information, a target with points of accurately known positions in the scene is needed during the photogrammetric calibration used for vision measurement. In this case, the metric properties, such as angles between lines and ratios of lengths, can be measured on the reconstruction and have their veridical values [11].

Nowadays, there exist several methods to solve the stereo vision calibration problem using geometrical targets, and the camera parameters obtained by initial estimation are usually refined through iteration under the constraint of minimizing an objective function. For


example, Xue et al. [12] proposed a flexible approach to estimate all the primitive parameters of a stereo vision sensor with a 2D free-position planar pattern. After the initial estimation, the structure parameters can be optimized with a feature point constraint algorithm; in fact, the so-called feature point constraint in that reference is another form of the famous epipolar geometry relationship. Luo et al. [13] presented an easy calibration technique for stereo vision using a circle grid. A nonlinear optimization procedure was also implemented to obtain all camera parameters in such a way that the computed positions of the image points were near their actual positions. Machacek et al. [7] calibrated the structure parameters by utilizing a stick with two LEDs to provide the corresponding point pairs and a reference length. The scale ambiguity of the translation vector between the two cameras was resolved by minimizing the error in the distance between the two marker points of the calibration bar. Similar to Machacek's work, Sun et al. [14] presented a calibration technique for stereo vision based on two types of 1D targets: two 1D targets with multiple features were used to calibrate the intrinsic parameters, and another 1D target with only two feature points was used to compute the structure parameters; the epipolar constraint and distance constraint were also combined to refine the structure parameters. Recently, Zhao et al. [15] proposed an adjustment method for binocular vision measurement to calibrate all the parameters based on a 1D target with two known points in the field of view. The computed distance between the two feature points was compared with the actual value as the error criterion for parameter optimization. Unlike the conventional point-based calibration methods, which take into consideration only a limited number of feature points, the method proposed by Habe [16] takes into account all points (pixels) observed in the images by comparing the intensity values at each pixel. Although this is the first appearance-based optimization method for stereo vision calibration, the objective functions for optimization were still constructed on the image plane to compare the intensity values of the two images.

The literature listed above is devoted to solving the stereo vision calibration problem mainly for the field of vision measurement, so the primary consideration is the accuracy of the acquired parameters, which matters more here than in the self-calibration solutions [17–19] used in robot navigation, video tracking, etc. One of the crucial steps to improve the calibration accuracy is the parameter optimization procedure, which is usually carried out with the Levenberg–Marquardt or Gauss–Newton iterative algorithm. As described previously, the current optimization methods for binocular calibration are almost always carried out on the 2D image plane, and the most frequently adopted approach is to refine the estimated parameters by minimizing the sum of squared reprojection errors in the image coordinate system. However, the practical vision measurement is implemented in a 3D metric coordinate system, so the different reference coordinates between optimization and measurement may lead to a loss of accuracy. In our previous work [20], we proposed a novel optimization method for single camera parameters that minimizes the metric distance between actual and computed object points in the camera coordinate system.
The drawback of that method is that the computed object point is calculated as the intersection of the target plane and the corresponding ray, which is in fact only an approximation to the reconstructed point; besides, the method is suitable only for single camera calibration. In this paper, we transfer this idea to the binocular vision system to improve the measurement accuracy. Thanks to the existence of two views of the same scene, we can find the computed point in space given its position in two images taken with cameras of known calibration and pose, a process commonly called triangulation [21]. Hence, the objective function over all the binocular system parameters is constructed by minimizing the metric distance error between the reconstructed point and the real point in the 3D measurement coordinate system. In this way, the error criterion during the optimization process is in accordance with the vision measurement process. In addition, we consider the epipolar constraint, which constrains the location of one point relative to its corresponding point in the other image. A distance constraint on adjacent points in 3D space is also taken into account, since the real 3D reconstruction of each point is available. Since the epipolar constraint is a native characteristic of the binocular vision system and the distance


constraint is a standard length reference for the reconstructed results, adding the two constraints is consistent with the 3D error function and therefore benefits the iterative convergence. In short, the method proposed in this paper combines different geometric constraints (i.e., the 3D error constraint, the epipolar constraint, and the distance constraint) in a common framework to update the calibration parameters through a nonlinear optimization process. To the best of our knowledge, a 3D error cost function established in the measurement coordinate system is put forward for the binocular vision system for the first time, in contrast to the conventional 2D reprojection error criterion based on the image plane. Theoretically, it makes the camera parameter optimization consistent with the final measurement process, so the refined parameters should be more accurate. By comparison with the traditional method, our experiments verify the effectiveness of the proposed method.

The rest of the paper is organized as follows. Section 2 gives some preliminaries about the binocular vision model. The detailed procedure of the calibration method based on 3D optimization with multiple constraints is described in Section 3. Section 4 provides two kinds of accuracy evaluation functions to assess calibration results. In Section 5, both computer-simulated and real data are used to validate the proposed method in comparison with the traditional one. The paper ends with some concluding remarks in Section 6.

2. Preliminaries

2.1. Camera pin-hole model with lens distortion

A camera is modeled by the usual pin-hole, and the relationship between a 3D point M and its image projections $m_l$ and $m_r$ in the left and right cameras is:

λl m l = Al ( Rl tl ) M ,

 f xl Al =  0  0

 f xr  λr m r = Ar ( Rr t r ) M , Ar =  0  0

0 f yl 0 0 f yr 0

u0 l  v0l  1 

(1)

u0 r  v0 r  1 

(2)

where $\lambda_l$ and $\lambda_r$ are arbitrary scale factors; $\tilde{m}_l$, $\tilde{m}_r$ and $\tilde{M}$ are the homogeneous coordinates of the image points and their corresponding space point; $(R_l, t_l)$ and $(R_r, t_r)$, called the extrinsic parameters, are the rotations and translations relating the local world coordinate system to each camera coordinate system; and $A_l$, $A_r$ are the camera intrinsic matrices, consisting of the following parameters: $(f_{xl}, f_{yl})$ and $(f_{xr}, f_{yr})$, the effective focal lengths, and $(u_{0l}, v_{0l})$ and $(u_{0r}, v_{0r})$, the coordinates of the principal points. Considering the first two terms of radial distortion, we choose the most commonly used model to handle lens distortion effects [22, 23]:

mld = 1 + k1l rl 2 + k2l rl 4  ml

(3)

mrd = 1 + k1r rr 2 + k2 r rr 4  mr

(4)

where $r_l$, $r_r$ are the distances from the undistorted image points $m_l$, $m_r$ to the respective principal points, $(k_{1l}, k_{2l})$ and $(k_{1r}, k_{2r})$ are the radial distortion coefficients, and $m_{ld}$, $m_{rd}$ denote the coordinates of the distorted image points. Equations (1)–(4) completely describe the real perspective projection model of the two cameras, including lens distortion effects.
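As a concrete illustration of Eqs. (1)–(4), the following Python sketch projects a 3D point through one camera. The function name is ours, and the distortion is applied on the normalized image plane (the common convention in camera models); treat it as a sketch of the model, not the authors' implementation.

```python
import numpy as np

def project_point(M, A, R, t, k1, k2):
    """Sketch of the pinhole model of Eqs. (1)-(2) followed by the two-term
    radial distortion of Eqs. (3)-(4). A, R, t, k1, k2 follow the paper's
    notation for one camera."""
    # Rigid transform into the camera frame, then perspective division.
    Mc = R @ M + t
    x, y = Mc[0] / Mc[2], Mc[1] / Mc[2]      # normalized image coordinates
    # Radial distortion factor 1 + k1*r^2 + k2*r^4.
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    xd, yd = factor * x, factor * y
    # Intrinsic matrix A maps normalized coordinates to pixel coordinates.
    u = A[0, 0] * xd + A[0, 2]
    v = A[1, 1] * yd + A[1, 2]
    return np.array([u, v])
```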


2.2. Epipolar geometry

Two perspective images of a single rigid object/scene are related by the so-called epipolar geometry, which can be described by a 3 × 3 singular matrix. It contains all the geometric information necessary for establishing correspondences between two images, from which the three-dimensional structure of the perceived scene can be inferred [24]. In a stereo vision system where the camera geometry is calibrated, this matrix can be computed from the camera perspective projection matrices obtained through calibration. Consider the case of two cameras as shown in Fig. 1. Let $o_{cl}$ and $o_{cr}$ be the optical centers of the left and right cameras, respectively. Given a point $m_l$ in the left image, its corresponding point $m_r$ in the right image is constrained to lie on a line called the epipolar line of $m_l$, denoted by $l_r$. Moreover, all epipolar lines of the points in the left image pass through a common point $e_l$, which is called the epipole; $e_l$ is the intersection of the line $o_{cl} o_{cr}$ with the image plane $\pi_{ul}$. Algebraically, the following equation (the epipolar constraint) must be satisfied for corresponding image points:

$$\tilde{m}_r^T F \tilde{m}_l = 0, \quad \text{with} \quad F = A_r^{-T} [t]_\times R A_l^{-1} \qquad (5)$$

where R and t are the structure parameters (rotation and translation) which bring points expressed in the left camera coordinate system to the right one, and $[t]_\times$ is the antisymmetric matrix defined by t such that $[t]_\times x = t \times x$ for any 3D vector x. The 3 × 3 matrix F is called the fundamental matrix; it is independent of the scene structure and depends only on the cameras' internal parameters and relative pose [11]. The structure parameters can be computed from the calibrated extrinsic parameters of each camera:

$$R = R_r R_l^{-1}, \qquad t = t_r - R_r R_l^{-1} t_l \qquad (6)$$

Fig. 1. The epipolar geometry.
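The computation of Eqs. (5) and (6) is mechanical; a minimal sketch, assuming rotation matrices and translation vectors given as NumPy arrays (the function names are ours), might read:

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix [t]x such that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def structure_and_fundamental(Al, Ar, Rl, tl, Rr, tr):
    """Structure parameters from Eq. (6) and fundamental matrix from Eq. (5)."""
    R = Rr @ Rl.T                 # Rl is orthonormal, so Rl^-1 == Rl.T
    t = tr - R @ tl               # equals tr - Rr Rl^-1 tl
    F = np.linalg.inv(Ar).T @ skew(t) @ R @ np.linalg.inv(Al)
    return R, t, F
```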

2.3. Triangulation

After we have calibrated all the parameters and established the stereo correspondence between $m_l$ and $m_r$, the 3D space coordinate M can be solved according to Eq. (7):

$$\lambda_l \tilde{m}_l = A_l \left( I \;\; 0 \right) \tilde{M}, \qquad \lambda_r \tilde{m}_r = A_r \left( R \;\; t \right) \tilde{M} \qquad (7)$$

where I is the 3 × 3 identity matrix. Equation (7) is derived from Eqs. (1) and (2), since we take the left camera coordinate system as the measurement coordinate system in this paper. Geometrically, the triangulation problem is to find the intersection in space of the two rays corresponding to the two image points, which is expressed algebraically by Eq. (7). However, naive triangulation by back-projecting rays from the measured image points will fail, because in general the rays do not intersect [11]. It is thus necessary to


estimate a best solution for the point in 3D space, which requires the definition and minimization of a suitable cost function, described in the next paragraph. For each given point correspondence $m_l \leftrightarrow m_r$ and fundamental matrix F, we seek the corrected correspondences $m_l' \leftrightarrow m_r'$ that minimize the geometric error

$$d(m_l, m_l')^2 + d(m_r, m_r')^2 \qquad (8)$$

(where $d(\cdot,\cdot)$ is the geometric distance between two points) subject to the epipolar constraint $\tilde{m}_r'^T F \tilde{m}_l' = 0$. Here we adopt the optimal method of Hartley and Sturm [21], which finds the global minimum of the cost function with a non-iterative algorithm; the minimization problem reduces to finding the real roots of a polynomial of degree 6 [11]. Assuming a Gaussian error distribution of the image points, the points $m_l'$ and $m_r'$ are the most likely values for the true image point correspondences. Once $m_l'$ and $m_r'$ are found, the point M may be found by any triangulation method, since the corresponding rays meet precisely in space [21].
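For illustration, OpenCV [29, 30] exposes this optimal correction as cv2.correctMatches, which implements the Hartley-Sturm method; a sketch combining it with linear triangulation (the wrapper function and its interface are ours) could read:

```python
import cv2
import numpy as np

def optimal_triangulate(F, Pl, Pr, ml, mr):
    """Correct the matches so they satisfy the epipolar constraint exactly
    (minimizing Eq. (8)), then intersect the rays. ml, mr: (N, 2) arrays of
    corresponding image points; Pl, Pr: 3x4 projection matrices; F: the
    fundamental matrix consistent with Pl and Pr."""
    ml = ml.reshape(1, -1, 2).astype(np.float64)
    mr = mr.reshape(1, -1, 2).astype(np.float64)
    # Hartley-Sturm optimal correction (degree-6 polynomial method [21]).
    ml_c, mr_c = cv2.correctMatches(F, ml, mr)
    # After correction the rays meet precisely; linear triangulation suffices.
    Mh = cv2.triangulatePoints(Pl, Pr,
                               ml_c.reshape(-1, 2).T, mr_c.reshape(-1, 2).T)
    return (Mh[:3] / Mh[3]).T     # N x 3 Euclidean points
```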

3. Binocular vision calibration

In this section, the procedure of the precise binocular vision calibration method is described in detail. The whole process can roughly be divided into two steps. In the first stage, the single camera calibration is conducted by means of the well-known Zhang's method [8] to obtain the intrinsic parameters, extrinsic parameters and distortion coefficients. After that, the structure parameters, which represent the relative position and rotation of the two cameras, are determined by a novel optimization method. The second step is the innovative part of this paper: to the best of our knowledge, a 3D optimization method combined with multiple constraints for a binocular vision system is presented here for the first time.

3.1 Single camera calibration

First, by moving the target freely in the measurement range of the cameras, we obtain multiple views of the planar pattern at different orientations. The 2D image coordinates of the feature points are detected to localize the chessboard corners with sub-pixel accuracy in the digital images. Then the correspondence between the detected 2D image points and the 3D object points is established, and we use the famous Zhang's method to calibrate the left and right cameras, respectively. It is worth noting that the left and right cameras must photograph the complete target simultaneously, so that the captured image pairs can be used to estimate the structure parameters in the subsequent step. Note that all the views of the planar target are acquired by each camera at different positions, so each view has different extrinsic parameters but common intrinsic parameters and structure parameters. The detailed algorithm is described in [8].

3.2 Structure parameters calibration based on 3D optimization

The intrinsic parameters $A_l$ and $A_r$, the extrinsic parameters of each view $(R_l, t_l)$ and $(R_r, t_r)$, and the distortion coefficients $D_l$ ($k_{1l}$ and $k_{2l}$) and $D_r$ ($k_{1r}$ and $k_{2r}$) are obtained as in Section 3.1. After that, an initial estimate of the structure parameters R and t is computed according to Eq. (6) from one of the corresponding image pairs. Note that the rotation matrix R is parameterized by a vector of 3 parameters to ensure its orthogonality. Since the estimation of the extrinsic parameters from Section 3.1 is probably not accurate enough, the initial calculation of the rotation matrix R and translation vector t differs from one image pair to another. In order to give a better initial estimate, we take the median values of $\{R_i\}$ and $\{t_i\}$ (the subscript i denotes the sequence number of the image pairs, ranging from 1 to N) instead of a single arbitrary value as the initial approximation of the structure parameters. Since no iterations are required, this initial estimation of the structure parameters is computationally fast.
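A minimal sketch of this initial estimation (the function name and list-based interface are our own assumptions) follows; the rotation is kept as a 3-vector via the Rodrigues parameterization, as the paper prescribes:

```python
import cv2
import numpy as np

def initial_structure(Rl_list, tl_list, Rr_list, tr_list):
    """Initial (R, t) estimate: apply Eq. (6) to each of the N image pairs,
    then take the element-wise median over views (Section 3.2)."""
    rvecs, tvecs = [], []
    for Rl, tl, Rr, tr in zip(Rl_list, tl_list, Rr_list, tr_list):
        R_i = Rr @ Rl.T                           # Eq. (6), rotation part
        t_i = tr - R_i @ tl                       # Eq. (6), translation part
        rvecs.append(cv2.Rodrigues(R_i)[0].ravel())  # 3-vector rotation
        tvecs.append(t_i.ravel())
    r0 = np.median(np.array(rvecs), axis=0)
    t0 = np.median(np.array(tvecs), axis=0)
    return r0, t0
```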


However, the rotation matrix R and the translation vector t, which relate the left camera to the right camera geometrically, are not equal when different image pairs are used for the calculation, whereas they should be constant, since the relative rotation and position between the two cameras are fixed. That is because the intermediate results do not satisfy the geometrical constraints and the accuracy of the closed-form solution is relatively poor. Therefore, it is necessary to refine the initial parameters through maximum-likelihood estimation, and the simultaneous estimation of the parameters involves an iterative algorithm such as the Levenberg-Marquardt algorithm, which has been shown to provide the fastest convergence [25].

Typically, parameter optimization methods operate on the 2D image plane, with the aim of minimizing the pixel distance between the calculated image points and the real ones. For a binocular vision system, the general method is to establish an objective function that is a simple sum of the reprojection errors of both cameras:

$$\sum_{i=1}^{N}\sum_{j=1}^{L} \left\| m_{ld,i,j} - \hat{m}_{ld,i,j}(A_l, D_l, R_{l,i}, t_{l,i}) \right\|^2 + \sum_{i=1}^{N}\sum_{j=1}^{L} \left\| m_{rd,i,j} - \hat{m}_{rd,i,j}(A_r, D_r, R_{l,i}, t_{l,i}, R, t) \right\|^2 \qquad (9)$$

where the subscript i denotes the sequence number of the images and the subscript j denotes the sequence number of the points in each image. Note that the extrinsic parameters of the right camera can be calculated from the following equation, which is transformed from Eq. (6):

$$R_r = R R_l, \qquad t_r = R_r R_l^{-1} t_l + t \qquad (10)$$

In this case, the structure parameters remain invariant for all views during the optimization procedure, which is in accordance with the practical situation.

As described previously, the conventional 2D optimization method addresses the issue of minimizing pixel distances on the image plane. Nevertheless, the vision measurement process requires back-projecting the extracted image points to 3D space, since the final goal is to determine the accurate 3D coordinates of the measured object. Even though the image distance errors are equal, the spatial back-projection distance errors are not identical at different distances in front of the camera [20]. Therefore, to satisfy the special requirements of vision measurement systems, we propose a novel 3D optimization strategy for binocular vision calibration based on the measurement coordinate system, i.e., the left camera coordinate system in this article. The detailed explanation follows.

At first, according to Eqs. (3) and (4), the detected image points are corrected by means of the radial distortion coefficients. Note that calculating the corrected point from the distorted one is a nonlinear problem, because Eqs. (3) and (4) define a forward projection model that cannot be inverted directly. We can overcome this difficulty by the recursion provided by Heikkila and Silven [9, 26] to obtain the undistorted image points:

$$m_{ld} \xrightarrow{D_l(k_{1l},\, k_{2l})} m_l, \qquad m_{rd} \xrightarrow{D_r(k_{1r},\, k_{2r})} m_r \qquad (11)$$

Subsequently, since the intrinsic parameter matrices are already known, the undistorted image points are transformed to the normalized image plane:

m nl = Al−1m l ,

m nr = Ar−1m r

(12)
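A numerical sketch of the recursion in Eq. (11) is given below; the fixed-point form and iteration count are our own assumptions, standing in for the scheme of [9, 26], and the coordinates are taken on the normalized plane of Eq. (12):

```python
import numpy as np

def undistort_recursive(xd, yd, k1, k2, iters=10):
    """Invert the forward distortion model of Eqs. (3)-(4) by fixed-point
    recursion: start from the distorted point and repeatedly divide by the
    distortion factor evaluated at the current estimate."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y
```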

The application of the normalized image coordinate systems $o_{nl}\text{-}x_{nl}y_{nl}$ and $o_{nr}\text{-}x_{nr}y_{nr}$ is equivalent to assuming that the focal length of each camera is one unit. This transformation eliminates the influence of the intrinsic parameters in order to simplify the following derivation. Combining Eq. (5) and Eq. (12), the epipolar constraint relationship is expressed as:

$$\tilde{m}_{nr}^T E \tilde{m}_{nl} = 0, \quad \text{with} \quad E = [t]_\times R \qquad (13)$$

where the matrix E is called the essential matrix, first introduced to the computer vision community by Longuet-Higgins [27]. It is the specialization of the fundamental matrix F to the case of normalized image coordinates; conversely, the fundamental matrix F may be thought of as the generalization of the essential matrix in which the assumption of calibrated cameras is removed. The normalized image points $m_{nl}$ and $m_{nr}$ may then be regarded as images of the points taken by cameras having the identity matrix I as intrinsic matrix. With the effect of the known camera intrinsic matrices removed, Eq. (7) becomes:

$$\lambda_l \tilde{m}_{nl} = \left( I \;\; 0 \right) \tilde{M}, \qquad \lambda_r \tilde{m}_{nr} = \left( R \;\; t \right) \tilde{M} \qquad (14)$$

According to Eq. (14), the 3D coordinate of the object point in the measurement coordinate system, i.e. the left camera coordinate system in this paper, can be calculated through the optimal triangulation method [21]. We denote the calculated object point as $\hat{M}$. It is worth noting that we have used the normalized image points instead of the real image points for triangulation, so the cost function for seeking the corrected correspondences $m_{nl}' \leftrightarrow m_{nr}'$ mentioned in Section 2.3 becomes:

$$d(m_{nl}, m_{nl}')^2 + d(m_{nr}, m_{nr}')^2 \quad \text{subject to} \quad \tilde{m}_{nr}'^T E \tilde{m}_{nl}' = 0 \qquad (15)$$

The final calculated object point $\hat{M}$ is obtained from the corrected matches $m_{nl}' \leftrightarrow m_{nr}'$. On the other hand, the coordinate of the object point in the measurement coordinate system $o_{cl}\text{-}x_{cl}y_{cl}z_{cl}$, denoted by M, is given by:

$$\tilde{M} = \begin{pmatrix} R_l & t_l \\ 0^T & 1 \end{pmatrix} \tilde{M}_w \qquad (16)$$

where $M_w$ is the known feature point in the local world coordinate system $o_w\text{-}x_wy_wz_w$ for each view, and $\tilde{M}$ and $\tilde{M}_w$ are the homogeneous coordinates of M and $M_w$, respectively.

Note that identifying the calculated object point $\hat{M}$ is quite involved, since several transformations and all the calibration parameters take part; in addition, the image processing used to extract the observed points inevitably introduces extra errors. For these reasons, the deviation of the calculated point $\hat{M}$ from the hypothetical ideal point is relatively large. On the other hand, the calculation of M involves only a simple rigid transformation, and the original feature point $M_w$ in $o_w\text{-}x_wy_wz_w$ is precisely known, its accuracy depending primarily on the calibration target. As a result, we take M as the true value and establish a novel objective function which minimizes the metric distance between M and $\hat{M}$ in the measurement coordinate system. Supplied with an appropriate starting point, accurate calibration parameters are computed by a nonlinear optimization algorithm.

To sum up, the proposed 3D optimization procedure for the binocular vision system, as illustrated in Fig. 2, is mainly composed of six steps:

(1) Calculate the undistorted image points $m_l$ and $m_r$ from the detected counterparts $m_{ld}$ and $m_{rd}$ according to the distortion coefficients $D_l$, $D_r$ and Eqs. (3), (4), and (11).


(2) Transform the corrected image points $m_l$ and $m_r$ to the normalized image plane by applying the intrinsic parameters $A_l$, $A_r$ and Eq. (12).

(3) Calculate the intersection of the rays $o_{cl} m_{nl}$ and $o_{cr} m_{nr}$ in the left camera coordinate system by applying the structure parameters R, t and Eq. (14). The optimal triangulation method [21] is adopted to compute the intersection of the two rays, represented as $\hat{M}$.

(4) Transform the known feature point $M_w$ from the local world coordinate system to the left camera coordinate system by applying the extrinsic parameters $R_l$, $t_l$ and Eq. (16), denoted by M.

(5) Compare the calculated object points $\hat{M}$ with the assumed true values M one by one and construct the objective function which minimizes the difference between the two sets of feature points in the 3D measurement coordinate system:

$$J_{3D} = \sum_{i=1}^{N}\sum_{j=1}^{L} \left\| M(R_{l,i}, t_{l,i}) - \hat{M}(A_l, A_r, D_l, D_r, R, t) \right\|^2 \qquad (17)$$

The meaning of the subscripts i and j is the same as described for Eq. (9).

(6) Minimize the objective function iteratively over all the calibration parameters.

Fig. 2. 3D optimization steps for binocular vision calibration.
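To make the construction of $J_{3D}$ concrete, the following sketch (our own helper, assuming steps (1)–(2) have already been applied to the input points) evaluates the residuals of Eq. (17) for one view:

```python
import cv2
import numpy as np

def j3d_residuals(r, t, mnl, mnr, Rl, tl, Mw):
    """Residuals of Eq. (17) for one view, following steps (3)-(5).
    mnl, mnr: (N, 2) normalized, undistorted image points; Rl, tl: this
    view's extrinsic parameters; Mw: (N, 3) known target points in the
    world frame; r: 3-vector rotation, t: 3-vector translation."""
    R = cv2.Rodrigues(np.asarray(r, dtype=np.float64))[0]
    # Step (3): triangulate M_hat in the left (measurement) camera frame;
    # identity intrinsics, since the points are already normalized.
    Pl = np.hstack([np.eye(3), np.zeros((3, 1))])
    Pr = np.hstack([R, np.asarray(t, dtype=np.float64).reshape(3, 1)])
    Mh = cv2.triangulatePoints(Pl, Pr, mnl.T.astype(np.float64),
                               mnr.T.astype(np.float64))
    M_hat = (Mh[:3] / Mh[3]).T
    # Step (4): assumed true points, rigidly transformed by Eq. (16).
    M_true = (Rl @ Mw.T).T + tl.reshape(1, 3)
    # Step (5): metric distance errors in the measurement coordinate system.
    return np.linalg.norm(M_true - M_hat, axis=1)
```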

It should be emphasized that the rotation matrix R and translation vector t always remain constant across different views under the constraint $J_{3D}$, and the implicit extrinsic parameters of the right camera can be calculated from Eq. (10); they are redundant and need not be optimized. Thus, the cost function is closer to the practical situation of a binocular vision system with a fixed spatial structure. Moreover, in order to accelerate the convergence of the nonlinear optimization, we also consider the epipolar constraint $J_e$ and the distance constraint $J_{dis}$:


$$J_e = \sum_{i=1}^{N}\sum_{j=1}^{L} \left( d(m_{l,i,j}, l_{l,i,j})^2 + d(m_{r,i,j}, l_{r,i,j})^2 \right) \qquad (18)$$

$$J_{dis} = \sum_{p}\sum_{q} \left| L_{dis} - d(\hat{M}_p, \hat{M}_q) \right|^2 \qquad (19)$$

Where lr = Fm l and ll = F T m r are the corresponding epipolar lines, Ldis is the constant ˆ and M ˆ are the adjacent feature points minimum interval of the chessboard target, M p

p

calculated by triangulation in both horizontal and vertical directions on the chessboard target. We want to indicate that the adding of the epipolar constraint and distance constraint is not conflict with the 3D error constraint since J e is an inherent property belongs to any binocular vision system and J dis is an important criterion to test the length measurement result. Joining the geometrical constraints J 3D , J e and J dis together, the following complete objective function arises: J = J 3D + J e + J dis

(20)
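The way the constraint blocks of Eq. (20) can be fed to a Levenberg-Marquardt solver is sketched below. The residual terms here are illustrative stand-ins, not the paper's actual expressions; the sketch only shows the stacking pattern, in which the solver minimizes the total sum of squares of the concatenated residual vectors.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, terms):
    """Stack residual blocks playing the roles of J_3D, J_e and J_dis
    (Eqs. (17)-(19)); least_squares then minimizes their total sum of
    squares, i.e. J = J_3D + J_e + J_dis of Eq. (20)."""
    r, t = params[:3], params[3:6]   # shared structure parameters
    return np.concatenate([f(r, t) for f in terms])

# Illustrative stand-in residual terms (NOT the paper's expressions):
terms = [lambda r, t: r - 0.01,            # stands in for J_3D
         lambda r, t: t + 80.0,            # stands in for J_e
         lambda r, t: np.array([r @ t])]   # stands in for J_dis
fit = least_squares(residuals, x0=np.zeros(6), args=(terms,), method="lm")
print(fit.x)   # refined parameter vector
```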

Supplied with an appropriate initial guess, Eq. (20) can be minimized iteratively by employing the Levenberg-Marquardt algorithm.

4. Accuracy evaluation

It is necessary to define suitable error criteria for evaluating accuracy in terms of the camera parameters, so that we can immediately know whether the corresponding calibration is well done. Since the so-called reprojection error is based on the calibration of a single camera and does not give any information about the accuracy of a multiple-camera set-up [7], an error criterion for a multiple-camera set-up can be defined as the averaged Euclidean distance between the given global coordinates $M_i$ and the reconstructed global coordinates $\hat{M}_i$:

$$E_{pt} = \frac{1}{n}\sum_{i=1}^{n} \left\| M_i - \hat{M}_i \right\|_2 \qquad (21)$$

where n is the total number of feature points. Another criterion for a two-camera set-up is based on the definition of the fundamental matrix and serves as an indicator of the projective reconstruction quality:

$$E_F = \frac{1}{2n}\sum_{i=1}^{n} \left( d(\tilde{m}_{l,i}, F^T \tilde{m}_{r,i})^2 + d(\tilde{m}_{r,i}, F \tilde{m}_{l,i})^2 \right) \qquad (22)$$

This average epipolar distance is also used to evaluate the accuracy of the fundamental matrix [11, 28].
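Both criteria are straightforward to evaluate; a sketch under the definitions of Eqs. (21) and (22) (the helper names are ours) follows:

```python
import numpy as np

def point_to_line_dist(m, l):
    """Distance from a homogeneous pixel m = (u, v, 1) to line l = (a, b, c)."""
    return abs(l[0] * m[0] + l[1] * m[1] + l[2]) / np.hypot(l[0], l[1])

def accuracy(M, M_hat, F, ml, mr):
    """E_pt of Eq. (21) and E_F of Eq. (22). M, M_hat: (n, 3) true and
    reconstructed points; ml, mr: (n, 2) matched pixels; F: fundamental matrix."""
    E_pt = np.mean(np.linalg.norm(M - M_hat, axis=1))
    ml_h = np.hstack([ml, np.ones((len(ml), 1))])
    mr_h = np.hstack([mr, np.ones((len(mr), 1))])
    d_l = [point_to_line_dist(p, F.T @ q) for p, q in zip(ml_h, mr_h)]
    d_r = [point_to_line_dist(q, F @ p) for p, q in zip(ml_h, mr_h)]
    E_F = np.mean(np.array(d_l) ** 2 + np.array(d_r) ** 2) / 2.0
    return E_pt, E_F
```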

5. Experimental results

In this section, the proposed calibration method for the binocular vision system based on 3D optimization is tested on both computer-simulated data and real data. A comparison with the conventional 2D optimization method is also carried out. Results are shown by comparing the computed camera parameter values and the calibration or measurement accuracy in a comprehensive way.

5.1 Computer simulation

In this experiment, no simulated images are synthesized; only the point correspondences are simulated. This approach is reasonable because the aim is to assess


the performance of the calibration technique rather than that of the feature extraction algorithm used to localize the distorted image points. Each simulated camera's image size is 800 × 600 pixels with the principal point at $(u_0, v_0) = (400, 300)$ pixels. The skew factor is set to zero. The effective focal lengths along the x and y directions are $f_x = 800$ pixels and $f_y = 800$ pixels, respectively. Second-order radial distortion is simulated with the coefficients $k_1 = -0.1\,\text{mm}^{-2}$ and $k_2 = 0.08\,\text{mm}^{-4}$, and for simplicity the distortion center coincides with the principal point. The rotation vector $r = (r_x, r_y, r_z)^T$ and translation vector $t = (t_x, t_y, t_z)^T$ between the coordinate frames of the two cameras are fixed as $r = (0.01, 0.005, -0.003)^T$ and $t = (-80, 0, 0)^T$, respectively. The model plane is a checkerboard target with 54 corners (9 × 6) uniformly distributed, and the minimum point interval is 30 mm. The images are taken from 8 different orientations in front of the virtual cameras. All the images are captured randomly within the following ranges: the object-to-camera distance is 150~400 mm and the angle between the target plane and the image plane is 0°~60°. Gaussian noise of zero mean and standard deviation ranging from 0 pixel to 1 pixel is added to the image point coordinates. For each noise level we perform 25 independent trials, and the averaged results are shown in Figs. 3 and 4. The following gives a quantitative analysis of how image noise affects the accuracy of the proposed calibration method compared with the traditional 2D optimization method [9, 26], whose objective function is presented in Eq. (9).

Structure parameters with respect to noise level: Fig. 3 shows the influence of noise on the structure parameters of the binocular vision system. The initial solution is first obtained by the process described in Section 3.2. Then both the conventional 2D optimization method based on reprojection error and the presented 3D optimization method with multiple constraints are used to refine the initial estimation. As we can see, the relative errors of the rotation vector r and the absolute errors of the translation vector t are greatly reduced after optimization, and they increase almost linearly with the noise level for both the 2D and 3D optimization processes. It is noticeable that the 3D optimization method with multiple constraints proposed in this article produces better results in most cases than the conventional 2D optimization method, since the calibrated parameters are closer to the ground truth. Even though the 2D optimization method may occasionally achieve slightly better results than 3D optimization at a high noise level (e.g. Fig. 3(d) at a noise of 0.8 pixel), this case happens rarely, because the error of locating feature points in images for a real vision measurement system is usually less than 0.2 pixel. Consequently, from a practical and statistical point of view, the structure parameters acquired by the presented method are more precise than the conventional results in the presence of image noise.

Calibration accuracy with respect to noise level: Fig. 4 displays the effect of the image noise level on the calibration accuracy. The 3D reconstructed point error $E_{pt}$ and the 2D epipolar distance error $E_F$ described in Section 4 are employed to evaluate the accuracy of all the calibrated parameters. By comparison, we can see that the calibration accuracy obtained by 3D optimization exceeds that of 2D optimization when the noise level is below 1 pixel. These experimental results can be ascribed to the presented objective function, which is devoted to minimizing the discrepancy between the 3D reconstructed point and its true value. Additionally, the epipolar constraint also reduces the geometrical distance from each image point to its corresponding epipolar line. Therefore, the 3D optimization method with multiple constraints improves the calibration accuracy compared with the conventional method.

Fig. 3. Effects of pixel coordinate noise on structure parameters using different calibration techniques (initial solution, 2D optimization, 3D optimization): (a) rx; (b) ry; (c) rz; (d) tx; (e) ty; (f) tz.

Fig. 4. Effects of pixel coordinate noise on calibration accuracy using different calibration techniques (initial solution, 2D optimization, 3D optimization): (a) Ept; (b) EF.

5.2 Real data

For the experiment with real data, a set of standard image pairs provided by the well-known OpenCV library [29, 30] is utilized to test our method. The calibration target is a planar chessboard pattern with 9 × 6 corner points evenly distributed; the distance between adjacent points is 30 mm in both the horizontal and vertical directions. The image pairs are taken from 13 different orientations by the left and right cameras simultaneously; 8 of them are taken as training data for calibration and the other 5 as testing data. The image resolution of the given cameras is 640 × 480 pixels. One of the image pairs used for calibration is shown in Fig. 5.

Fig. 5. A sample of image pairs used for calibration [30]: (a) left image; (b) right image.

Tables 1 and 2 show the comparative results for the intrinsic parameters and distortion coefficients obtained by the different techniques. It is clear that the calibration technique

has little influence on these parameters, since they depend on the properties of each single camera's imaging sensor and optical lens and can be precisely determined by Zhang's method. Table 3 shows the comparative results for the structure parameters and calibration accuracy. Note that the differences among the structure parameters obtained by the various techniques are more obvious than for the other parameters in Tables 1 and 2. This indicates that the structure parameters play a more significant role in the nonlinear optimization process and have the primary impact on the calibration results. According to Eqs. (21) and (22), the calibration accuracy is calculated for the training data. We can see that the values of both evaluation functions are significantly reduced after optimization. Moreover, the average calibration accuracy achieved by 2D optimization is Ept = 0.462 mm and EF = 0.0858 pixel, while that achieved by 3D optimization is Ept = 0.364 mm and EF = 0.0795 pixel, respectively. As shown in Figs. 6 and 7, the distributions of the 3D calibration error and the epipolar distance error are displayed in statistical form. As expected, the error distributions and distance error statistics acquired by the proposed 3D optimization method give a more satisfactory outcome. In other words, with the proposed method more reconstructed points show a small deviation from the corresponding 3D ground truth and more image feature points lie close to the corresponding epipolar lines. This indicates a more accurate estimation of the camera parameters of the binocular vision system.

Table 1. Comparative result of intrinsic parameters and distortion coefficients for left camera

Calibration technique   fxl(pixel)  fyl(pixel)  u0l(pixel)  v0l(pixel)  k1l(mm−2)  k2l(mm−4)
Initial solution        533.58      534.19      341.24      235.40      −0.296     0.124
2D optimization         533.68      534.01      343.99      233.55      −0.294     0.116
3D optimization         534.39      533.99      341.99      233.46      −0.298     0.123

Table 2. Comparative result of intrinsic parameters and distortion coefficients for right camera

Calibration technique   fxr(pixel)  fyr(pixel)  u0r(pixel)  v0r(pixel)  k1r(mm−2)  k2r(mm−4)
Initial solution        536.54      535.98      327.47      249.71      −0.291     0.106
2D optimization         535.82      535.02      327.92      251.02      −0.291     0.105
3D optimization         537.50      536.54      327.17      252.71      −0.290     0.105

Table 3. Comparative result of structure parameters and calibration accuracy

Calibration technique   rx        ry        rz         tx(mm)    ty(mm)  tz(mm)  Ept(mm)  EF(pixel)
Initial solution        5.67E-3   2.89E-3   −2.97E-3   −100.27   1.059   −0.998  0.541    0.234
2D optimization         1.18E-2   5.55E-3   −3.58E-3   −99.597   1.154   −1.253  0.462    0.0858
3D optimization         1.53E-2   4.21E-3   −3.14E-3   −100.00   1.170   −0.963  0.364    0.0795

Fig. 6. 3D calibration error distribution and distance error statistics for training data using different techniques: initial solution, 2D optimization and 3D optimization. (a)-(c): the 3D error distribution for all the points; (d)-(f): the 3D distance error statistics for all the points. For the computation of the 3D calibration error, refer to Eq. (21).

Fig. 7. Epipolar distance error statistics for training data using different techniques: (a) initial solution; (b) 2D optimization; (c) 3D optimization. For the computation of the epipolar distance error, refer to Eq. (22).

To further investigate the validity of the proposed 3D optimization method compared with the traditional 2D optimization method, five testing images are employed to assess the precision of the calibrated parameters. Since the testing images are not involved in the calibration process, the evaluation results computed from the testing data are more persuasive than those from the training data [31, 32]. Table 4 lists the comparative results of the measurement accuracy evaluated on the testing data. The average errors of Ept and EF achieved by the two methods are displayed in Fig. 8. Figure 9 shows the 3D calibration error distribution and distance error statistics for the testing data, while the epipolar distance error statistics are presented in Fig. 10. As illustrated by Table 4 and Figs. 8–10, the measurement accuracy obtained by the proposed 3D optimization method is superior to that of the conventional method in terms of both the average accuracy level and the error distribution. The results are consistent with the simulation results and the theoretical analysis, and the merits of 3D optimization for binocular vision measurement are thereby validated. In addition, in order to give a concrete illustration of the proposed method, the epipolar lines of the corresponding feature points and all the reconstructed 3D points are shown in Figs. 11 and 12, respectively.


Table 4. Comparative results of measurement accuracy

Calibration technique           Image1  Image2  Image3  Image4  Image5  Average  Standard error
2D Optimization   Ept(mm)       0.730   0.400   0.380   0.405   0.437   0.470    0.0655
                  EF(pixel)     0.0696  0.0483  0.0517  0.0850  0.147   0.0803   0.0179
3D Optimization   Ept(mm)       0.698   0.383   0.368   0.390   0.317   0.431    0.0679
                  EF(pixel)     0.0318  0.0392  0.0433  0.0830  0.133   0.0661   0.0189

Fig. 8. Comparative results of average measurement accuracy for testing data: (a) Ept; (b) EF.

Fig. 9. 3D calibration error distribution and distance error statistics for testing data using different techniques: 2D optimization and 3D optimization. (a)-(b): the 3D error distribution for all the points; (c)-(d): the 3D distance error statistics for all the points. For the computation of the 3D calibration error, refer to Eq. (21).

Fig. 10. Epipolar distance error statistics for testing data using different techniques: (a) 2D optimization; (b) 3D optimization. For the computation of the epipolar distance error, refer to Eq. (22).


Fig. 11. The feature points’ epipolar lines computed from the proposed method (depicted in the corrected image pair).

Fig. 12. The reconstruction of all the feature points of testing data using the proposed method.

6. Conclusion

In this paper, we propose a precise calibration method for a binocular vision system used for vision measurement. Until now, the conventional parameter optimization methods for stereo vision systems have mostly been carried out on the 2D image plane, leading to a loss of accuracy since the vision measurement is conducted in 3D space in metric units. Moreover, the objective function of the optimization process differs from the measurement error criterion. To overcome these shortcomings, a novel 3D optimization method carried out in the measurement coordinate system is presented for the binocular vision system for the first time, establishing an objective function which minimizes the metric distance between the reconstructed points and the real points in 3D space. In this way, the parameter optimization in calibration and the subsequent measurement process are consistent. Additionally, the inherent epipolar constraint and the constant distance constraint are combined to improve the speed and accuracy of the nonlinear optimization. To validate the effectiveness of the proposed method, we have carried out experiments on synthetic and real data and compared against the conventional method. Results from the computer simulation and the real data demonstrate the advantage of this approach.

Finally, we point out that the proposed 3D optimization takes more steps than the conventional 2D optimization, so its computation time is indeed somewhat higher. However, most binocular calibration for vision measurement is carried out off-line, and current computing speed is high enough for most cases. We therefore consider that computation time is not a major factor, and that calibration accuracy is more important for vision measurement. Reducing the computational cost of the proposed method is one focus of our future work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61372177) and the Innovation Foundation of BUAA for PhD Graduates.

