
Simultaneous Calibration: A Joint Optimization Approach for Multiple Kinect and External Cameras

Yajie Liao 1, Ying Sun 1,2, Gongfa Li 1,2,*, Jianyi Kong 1,2, Guozhang Jiang 1,2, Du Jiang 2, Haibin Cai 3, Zhaojie Ju 3, Hui Yu 3 and Honghai Liu 3

1 Key Laboratory of Metallurgical Equipment and Control Technology, Wuhan University of Science and Technology, Ministry of Education, Wuhan 430081, China; [email protected] (Y.L.); [email protected] (Y.S.); [email protected] (J.K.); [email protected] (G.J.)
2 Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, China; [email protected]
3 School of Computing, University of Portsmouth, Portsmouth PO1 3HE, UK; [email protected] (H.C.); [email protected] (Z.J.); [email protected] (H.Y.); [email protected] (H.L.)
* Correspondence: [email protected]; Tel.: +86-189-0715-9217

Received: 5 April 2017; Accepted: 20 June 2017; Published: 24 June 2017

Abstract: Camera calibration is a crucial problem in many applications, such as 3D reconstruction, structure from motion, object tracking and face alignment. Numerous methods with good performance have been proposed for this problem over the last few decades. However, few methods address the joint calibration of multiple sensors (more than four devices), which is a common practical requirement in real-time systems. In this paper, we propose a novel method and a corresponding workflow framework to simultaneously calibrate the relative poses of a Kinect and three external cameras. By optimizing the final cost function and adding corresponding weights to the external cameras in different locations, an effective joint calibration of multiple devices is constructed. Furthermore, the method is tested on a practical platform, and the experimental results show that the proposed joint calibration method achieves satisfactory performance in a practical real-time system and that its accuracy is higher than the manufacturer's calibration.

Keywords: joint calibration; Kinect; external camera; depth camera

1. Introduction

Camera calibration is the process of estimating the intrinsic parameters (such as focal length, principal point and lens distortion) and extrinsic parameters (such as rotation and translation) of a camera (including color cameras and depth cameras) [1]. It is widely used in computer/machine vision, as it makes it possible to measure distances in the real world from their projections on the image plane [2]. Thus, with the continuous development of computer/machine vision, camera calibration has been widely applied to 3D reconstruction [3,4], structure from motion [5], object tracking [6–8] and gesture recognition [9,10], etc.

On 4 November 2010, with the launch of the low-cost Microsoft Kinect sensor (Redmond, WA, USA), whose image capture device includes a color camera and a depth sensor consisting of an infrared (IR) projector combined with an IR camera, 3D depth cameras began attracting increasing attention from researchers due to their versatile applications in computer vision [11]. However, it is well known that Kinect intrinsics vary from device to device, so the factory presets are not accurate enough for many applications [12]. To deal with this issue, Burrus [13] presented a basic Kinect calibration algorithm using the camera calibration process in OpenCV. However, it only calibrated the intrinsic parameters of the infrared camera. On the other hand, Yamazoe et al. [14] calibrated the intrinsic parameters of the depth sensor and color camera independently, and then registered



both in a common reference frame. Herrera et al. [15] proposed a high-precision color camera calibration method to assist the Kinect calibration, and their approach achieves high accuracy. In addition, Zhang et al. [16] augmented Herrera's work with correspondence matching between the color and depth images, but they did not address distortions in the depth values. Smisek et al. [17] first considered the distortions in the projection and the depth estimation. After calibrating the internal and external parameters of the device, the depth distortion of each pixel was estimated by averaging the metric error. Moreover, focusing on the distortion of depth maps, Herrera et al. [18] proposed a joint depth and color camera calibration, and used the Lambert W function to solve the disparity distortion model. This improved the calibration accuracy and corrected the depth distortion. However, their methods were generally limited to a single external camera and could not be effectively employed with multiple devices. After that, Raposo et al. [19] and Guo et al. [20] improved on Herrera's work: Raposo et al. proposed a metric constraint and used an open-loop post-processing step, while Guo et al. simplified the disparity distortion model with the Taylor formula. Both improved the calibration speed and reduced the number of input pictures. Han et al. [21] used two Kinects to form a depth camera network and accordingly achieved a fast and robust camera calibration process. Nonetheless, they still did not consider a joint calibration for multiple external cameras.

Current research only focuses on the calibration of a single external camera instead of the calibration of multiple external cameras. This paper aims at filling this gap. It introduces a novel method and a corresponding workflow framework which can simultaneously calibrate a Kinect, three external cameras, and their relative positions. By optimizing the final cost function and adding corresponding weights to the external cameras in different locations, the joint calibration of the depth sensor in Kinect and multiple external high-resolution color cameras is realized.

The paper is organized as follows: Section 2 introduces the calibration model; Section 3 proposes the approach to jointly calibrate the multiple sensors; Section 4 discusses the comparative experimental results, and the conclusions are presented in the final section.

2. Calibration Model

2.1. Color Camera Projection Model

In this paper, the intrinsic model of the color camera is similar to that in [22]: a pinhole model with radial and tangential distortion coefficients. It is assumed that a point in the color camera coordinate system is C = [x_c, y_c, z_c]^T, which can be normalized as X_n = [x_n, y_n]^T = [x_c/z_c, y_c/z_c]^T. In the pinhole model, a straight line may bend due to the effect of radial distortion [23], which is corrected by the following formula:

x_cor = x_n (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_cor = y_n (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)    (1)

Similarly, tangential distortion occurs when the camera lens is not perfectly parallel to the image plane, which causes some areas of the image to look closer than expected [24]. It is corrected by the following formula:

x_cor = x_n + 2 p_1 x_n y_n + p_2 (r^2 + 2 x_n^2)
y_cor = y_n + p_1 (r^2 + 2 y_n^2) + 2 p_2 x_n y_n    (2)

where r^2 = x_n^2 + y_n^2 and (x_cor, y_cor) is the corrected coordinate point. k_1, k_2, k_3 and p_1, p_2 are the radial and tangential distortion coefficients, respectively [25]. Therefore, K = [k_1, k_2, p_1, p_2, k_3] is used to represent the distortion coefficients. In addition, K_c = [k_c1, k_c2, p_c1, p_c2, k_c3] and K_d = [k_d1, k_d2, p_d1, p_d2, k_d3] represent the distortion coefficients of the color and depth cameras, respectively. Then, the image coordinates can be obtained by:

[u, v]^T = diag(f_x, f_y) [x_cor, y_cor]^T + [u_0, v_0]^T    (3)

where f = (f_x, f_y) is the focal length and P_0 = (u_0, v_0) is the principal point of the image coordinate P = (u, v). The same model applies to the color and external cameras [26]. In this paper, the subscripts c and d are used to distinguish the same parameters for the color camera and the depth camera, respectively. For example, f_c = (f_cx, f_cy) represents the focal length of the color camera.

2.2. Depth Camera Intrinsics

The transformation between the depth camera coordinates and the depth image coordinates is similar to the model for the color camera. The distortion of the color camera is a forward model (i.e., from the world coordinates to the image coordinates); for ease of calculation, the geometric distortion of the depth camera uses the backward model [18] (i.e., from the image coordinates to the world coordinates). According to the imaging principle of the depth sensor, the relation between the obtained disparity value d_k and the depth value z_k can be expressed as:

z_k = 1 / (c_1 d_k + c_0) = 1 / ( (1 / (f_d b)) d_k + 1 / z_0 )    (4)

where z_0 is the distance from the reference point to the reference plane, f_d is the focal length of the depth camera, and b is the baseline length, i.e., the distance between the infrared camera and the laser emitter. c_1 = 1/(f_d b) and c_0 = 1/z_0 are the intrinsic parameters of the depth camera that need to be calibrated. If the measured disparity d is substituted directly into Equation (4) (i.e., the disparity distortion correction is not performed), the depth information acquired during observation contains a fixed error, which can be corrected by adding a spatially varying offset Z_δ. This effectively reduces the re-projection error [17], and the depth value z_kk can be re-expressed as:

z_kk = z_k + Z_δ(u, v)    (5)

In order to improve the calibration accuracy, the method in [18] is used to directly correct the original disparity d. That method took the errors of all pixels from planes at several distances and normalized them; the normalization error was found to satisfy an exponential decay [19]. Therefore, a distortion model can be constructed that uses an attenuated spatial offset to counteract the increasing disparity error. It can be expressed as:

d_k = d + D_δ(u, v) exp(α_0 − α_1 d)    (6)

where d is the uncorrected disparity value obtained from the Kinect, D_δ eliminates the influence of the distortion and represents the spatial distortion associated with each pixel, α_0 and α_1 describe the decay of the distortion effect, and d_k is the corrected disparity value. Equations (4) and (6) describe the disparity-to-depth transformation, and their inverses can be used to calculate the re-projection error. From the inverse of Equation (4), it is known that:

d_k = 1/(c_1 z_k) − c_0/c_1    (7)

Equation (6) has an exponential relationship, so its inverse is much more complex than the inverse of Equation (4). Therefore, we can use Guo's method, which simplifies Equation (6) with a Taylor expansion [20]:

d_k = d + D_δ(u, v) exp(α_0 − α_1 d) ≈ d + D_δ(u, v)(1 + α_0 − α_1 d)    (8)

Hence,

d = (d_k − D_δ − D_δ α_0) / (1 − D_δ α_1)    (9)
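To make the camera models above concrete, the following minimal Python sketch applies the color-camera projection of Equations (1)–(3) and the disparity-to-depth conversion of Equations (4) and (6). It is only an illustration: function and variable names are ours, the radial and tangential corrections of Equations (1) and (2) are applied together in the usual combined form, and the example values loosely follow Tables 2 and 3 except for D_δ, which is a made-up placeholder.

```python
import numpy as np

def project_color(X_c, f, p0, K):
    """Pinhole projection with distortion (Eqs. 1-3). The radial and tangential
    corrections of Eqs. (1) and (2) are applied together, OpenCV-style."""
    k1, k2, p1, p2, k3 = K
    xn, yn = X_c[0] / X_c[2], X_c[1] / X_c[2]            # normalized coordinates
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_cor = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    y_cor = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn
    return np.array([f[0] * x_cor + p0[0], f[1] * y_cor + p0[1]])   # Eq. (3)

def disparity_to_depth(d, c0, c1, D_delta, alpha0, alpha1):
    """Correct the raw disparity with the attenuated offset (Eq. 6) and convert
    the result to metric depth (Eq. 4)."""
    dk = d + D_delta * np.exp(alpha0 - alpha1 * d)
    return 1.0 / (c1 * dk + c0)

# Toy usage; intrinsics roughly follow Tables 2 and 3, D_delta is a made-up placeholder.
pix = project_color(np.array([0.10, -0.05, 1.50]),
                    f=(518.52, 520.68), p0=(324.31, 243.74),
                    K=(-0.0124, 0.2196, 0.0014, -0.0003, -0.5497))
z = disparity_to_depth(d=600.0, c0=3.42, c1=-0.003162,
                       D_delta=0.8, alpha0=0.8656, alpha1=0.0018)
print(pix, z)
```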

The model for the depth camera is described by L_d = {f_d, P_d0, K_d, c_0, c_1, D_δ, α_0, α_1}, where the first three represent the internal parameters of the depth camera, and the last five are used to transform disparity values to depth values.

3. Joint Calibration for Multi-Sensors

The block diagram of the proposed calibration method is presented in Figure 1. The proposed calibration method consists of three main consecutive steps: (1) selecting all the checkerboard corners by Zhang's method [27] to initially estimate the intrinsic parameters of the cameras, while the four corners of the calibration plane are extracted in a depth map to initially estimate the intrinsic parameters of the depth camera; (2) using Herrera's method [18] to estimate the relative positions (extrinsic parameters) between the devices; and (3) initializing the disparity distortion parameters. Then, all the parameters are substituted into the new proposed cost function, and different weights are attached to iteratively carry out the nonlinear minimization.

In the workflow framework, Steps 1 and 2 contribute to the initialization of the parameters. They introduce the new parameters into the cost function in Step 3 for nonlinear minimization. In Step 3, when the disparity distortion function is calculated with the least squares method, the cost function of the disparity distortion is the same as the corresponding intermediate term of the new cost function and does not interact with the other parameters. Therefore, after providing the corresponding initial values, the nonlinear minimization of the parameters can be achieved by iteratively calculating the new cost function. When all the parameters meet a predefined range, the joint calibration results can be output. Otherwise, the next loop continues until the maximum number of iterations is reached.
Figure 1. The illustration of the workflow of the proposed method.

3.1. Platform Setting and Preprocessing

The experimental platform with multiple sensors is shown in Figure 2. The Kinect is located in front of children with Autism Spectrum Disorders (ASD), and an external color camera, called the Middle Camera (External Camera 0), is placed at the same location. Similarly, the other external color cameras, called the Left Camera (External Camera 1) and the Right Camera (External Camera 2), are placed in the lower left and lower right corners, respectively. They are fixed on the same rigid platform and their relative positions do not change during the course of the experiment, and the color camera of the Kinect is set to coincide with the origin of the experimental frame coordinate system. The direction of the experimental frame coordinate system is also shown in Figure 2.

Figure 2. Relative positions among the Kinect and the three cameras on the framework.

In the process of selecting the checkerboard corners, Zhang's method [27] is used to initialize the parameters of the color camera in the Kinect and of the three external cameras. A standard checkerboard grid with a square width of 0.025 m is used, with nine and six corner points in the x-axis and y-axis directions, respectively. The detection of corners is shown in Figure 3a. When the number of input images is larger than three, the unique solution of Equation (3) can be found by Zhang's method [27]. In this paper, in order to ensure the accuracy of the calibration results, three datasets are recorded at distances of 0.8 m, 1.6 m and 2.4 m from the camera frame plane when acquiring the images. Each dataset consists of five pictures: one picture of the frontal plane, two pictures of the plane rotated about the x-axis and two pictures of the plane rotated about the y-axis. Generally speaking, the corners of the checkerboard cannot be detected in the depth image, and we can only select four corners of the calibration plate in the depth image, as shown in Figure 3b. Although the accuracy of the Kinect depth image is at the millimeter level, there is still a lot of noise in these corners.
Consequently, the plane formed by the four selected corners can only be used to initially estimate the depth data of the calibration plate plane.
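As a rough illustration of this initialization step, the sketch below back-projects four manually selected depth-image corners with the depth intrinsics and fits a plane of the form n^T X − δ = 0 (the form used later in Section 3.2). The pixel coordinates and depths are invented for the example; only the intrinsics follow Table 3, and the helper names are ours.

```python
import numpy as np

def backproject(u, v, z, f, p0):
    """Back-project a depth pixel (u, v) with depth z (metres) to a 3D point in
    the depth-camera frame using the pinhole intrinsics (focal length f, principal point p0)."""
    x = (u - p0[0]) / f[0] * z
    y = (v - p0[1]) / f[1] * z
    return np.array([x, y, z])

def fit_plane(points):
    """Least-squares plane n^T X - delta = 0 through a set of 3D points (via SVD)."""
    pts = np.asarray(points)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]                       # normal = direction of least variance
    return n, float(n @ centroid)    # delta = signed distance of the plane from the origin

# Hypothetical example: four manually selected corners (u, v) and their depths (m);
# the intrinsics follow Table 3.
corners_uv = [(200, 150), (440, 150), (440, 330), (200, 330)]
depths = [1.62, 1.60, 1.58, 1.60]
pts3d = [backproject(u, v, z, f=(573.87, 573.13), p0=(327.10, 234.93))
         for (u, v), z in zip(corners_uv, depths)]
n, delta = fit_plane(pts3d)
print("plane normal:", n, "offset:", delta)
```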

Figure 3. The key steps in the operation of the program. (a) The detection of corners; there are 54 corners on our checkerboard. (b) The four selected corners of the calibration plate; the manually selected plane coincides with the plane of the calibration plate.

3.2. Relative Pose Estimation

In the relative position estimation, the color camera of the Kinect is assumed to be the origin of the experimental frame coordinate system. All of the equipment is fixed on the same rigid frame during the whole experiment. All of the reference frames and transformations are illustrated in Figure 4. {D}, {C}, {W}, {V} and ({E0}, {E1}, {E2}) are the coordinate systems of the depth camera, color camera, checkerboard (world), calibration plate and external cameras, respectively. A point in one coordinate system can be transformed to another coordinate system by T = {R, t}, where R is the rotation matrix and t is the translation vector.


For example, T_WC represents the transformation from the checkerboard to the color camera coordinate system, and a point X_W in {W} can be transformed into {C} by the equation X_C = R_WC X_W + t_WC.

TDC {D} {C}

TVD

TWC

TWE0 {E0}

{V}

TWE1 {E1}

{W}

TWE2

Checkerboard Calibration plate

{E2}

Figure Figure 4. 4. Reference Reference frames frames and and transformations. transformations. {D}, {D},{C} {C}and and ({E0}, ({E0}, {E1}, {E1}, {E2}) {E2}) are are the the coordinate coordinate systems of depth, color, and external cameras, respectively. systems of depth, color, and external cameras, respectively.

These transformations cover the conversion between most coordinate systems, such as T_WC, T_WE and T_VD, but they cannot describe the relationship between {D} and {C}. Here, we use Herrera's method [18]. Since the calibration plate ({V}) and the checkerboard ({W}) are coplanar, and T_WC and T_VD are known, we can obtain T_DC. The specific steps are as follows. We define a plane with Formula (10) in each reference frame ({W}, {V}):

n^T X − δ = 0    (10)

where n is the unit normal and δ is the distance to the origin. In addition, if the rotation matrix is defined as R = (r_1, r_2, r_3), and the parameters of the plane in both frames are chosen as n = [0, 0, 1]^T and δ = 0, then the plane parameters in the color camera coordinate system ({C}) are

n = r_3 and δ = r_3^T t    (11)

where R_WC, t_WC are used for the color camera and R_VD, t_VD for the depth camera [18,20]. The plane parameter vectors for each color image can be concatenated into the matrices M_C = [n_c1, n_c2, ..., n_cn] and b_C = [δ_c1, δ_c2, ..., δ_cn] [28]. Likewise, the plane parameter vectors for the depth camera can be represented by M_D and b_D. Then, the relative transformation T_CD = {R_CD, t_CD} is given by:

R'_CD = M_D M_C^T    (12)

t_CD = (M_C M_C^T)^{-1} M_C (b_C − b_D)^T    (13)

Finally, the rotation matrix R_CD = U V^T is obtained by singular value decomposition (SVD), where U S V^T is the SVD of R'_CD. T_DC can then be obtained from T_CD. Now, the relative position between the three external cameras and the color camera of the Kinect can be obtained directly.


3.3. Nonlinear Minimization

The least squares method is a basic, practical and widely used mathematical model [29]: the best-fit parameters are found by minimizing the sum of squared errors between the samples and their reconstructions. During camera calibration, the core of the calibration method is to minimize the weighted sum of squares of the measurement re-projection errors over all parameters. The re-projection errors for the color camera and the external cameras are the Euclidean distances between the measured corner positions and their re-projected positions. We denote the re-projected positions of the color camera and the external camera by p̂_c and p̂_e, respectively, and their actual measured positions by p_c and p_e. For the depth camera, the re-projection error is the difference between the original disparity measurement d and the re-projected value d̂ (i.e., the estimated value of the original disparity). In Formula (4), c_0 and c_1 are the internal parameters of the depth camera, z_k can be obtained from the depth information, and then the estimated original disparity d̂ can be computed. The method of [30] can be used to obtain the parameter Z_δ in Equation (5), and the original disparity measurement d can also be obtained. At this point, we have a preliminary cost function:

c = Σ ||p̂_c − p_c||^2 / σ_c^2 + Σ (d̂ − d)^2 / σ_d^2 + Σ ||p̂_e − p_e||^2 / σ_e^2    (14)

where σ_c^2, σ_d^2 and σ_e^2 are the variances of the measurement errors of the color camera, depth camera and external camera, respectively. Obviously, Formula (14) does not fully comply with our requirements; for example, some external camera parameters are not used at all. Hence, Equation (14) needs to be modified.

First of all, taking into account the disparity distortion correction of the depth camera, the estimated value d̂ of the original disparity is replaced by d̂_k from Equation (7), and the measured value d of the original disparity is replaced by d_k from Equation (8). In Equation (8), the parameters D_δ and α = {α_0, α_1} are independent of all the other parameters; they only depend on the observed values of the pixel (u, v). Therefore, they can be optimized individually by the least squares method, and the cost function of the disparity distortion can be described as Equation (15). The initial values of D_δ and α are provided, and then the optimal solution is reached by iteration:

c_d = Σ_{u,v} (d̂_k − d_k)^2 = Σ_{u,v} ( 1/(c_1 z_k) − c_0/c_1 − (d + D_δ(u, v) exp(α_0 − α_1 d)) )^2    (15)

Secondly, the cost function (14) cannot achieve the simultaneous calibration of all external cameras. On this basis, we extend the intermediate term in Equation (14) that is associated with the external cameras. Meanwhile, different weights are added to the external cameras, that is, coefficients β_i (i = 0, 1, 2, 3, ...) are attached to their corresponding re-projection errors. It can be found that the additional weights are related to the distance from the external cameras to the Kinect. Figure 5 shows the top view of the experimental framework during the image acquisition process. There are multiple rotation directions of the checkerboard plane, and the frontal plane is selected as the analysis object. The distance between points A and B is the total width of the checkerboard, and the distances between points B, C and points B, D are the widths of the checkerboard as shown in the pictures taken by External Camera 0 and 1, respectively. Apparently, the distance between points B and C is longer than the distance between points B and D [31]. In other words, under the same conditions, the checkerboard area occupies more pixels in the picture taken by External Camera 0; that is, the pictures taken by External Camera 0 contain more calibration information [32]. Therefore, it is believed that it should have a higher weight. That is to say, in the calibration process, when a high weight is attached to Camera 0, which is closer to the Kinect, and low weights are attached to Camera 1 and Camera 2, which are farther from the Kinect, the calibration results are more accurate.
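Because Equation (6) is linear in D_δ once α = {α_0, α_1} is held fixed, the per-pixel least-squares problem in Equation (15) above admits a closed-form update. The following sketch is one possible implementation of that step; the array names and shapes are our own convention, not from the paper's code.

```python
import numpy as np

def update_disparity_distortion(d_meas, d_hat, alpha0, alpha1):
    """Closed-form per-pixel update of D_delta in Eq. (15), holding alpha fixed.
    d_meas, d_hat: arrays of shape (n_frames, H, W) with the raw disparities and
    the disparities predicted from the calibrated plane depths (Eq. 7)."""
    e = np.exp(alpha0 - alpha1 * d_meas)          # attenuation term of Eq. (6)
    num = np.sum(e * (d_hat - d_meas), axis=0)    # per-pixel least-squares numerator
    den = np.sum(e * e, axis=0) + 1e-12           # guard against division by zero
    return num / den                              # D_delta(u, v)
```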


Figure 5. The top view of the experimental framework.

This paper uses I to represent the spatial distance between the color camera and the corresponding external camera on the experimental frame. By analyzing a large number of calibration results, the relationship between the spatial distance I and the correspondence coefficient β can be summed up. When the value of I for all of the external cameras is less than 600 mm, the coefficients β do not vary with I, and β_i = 1. When the value of I for one or more external cameras is greater than 600 mm, we define A = (I − 600)/50, rounded up to a natural number (e.g., 1.1 is calculated as 2), and β = 1 − 0.02 × A. At the same time, in order to reduce the influence of the external cameras on the calibration of the Kinect internal parameters, we specify β_0 + β_1 + ... + β_i = i + 1 [33], and the other external cameras for which the value of I is less than 600 mm share the same value of coefficient; when the value of I for all of the external cameras is greater than 600 mm, all the external camera coefficients are processed according to the same formula A = (I − 600)/50, β = 1 − 0.02 × A. In this paper, the relative position between each corresponding external camera and the color camera has been calculated, and the corresponding external camera coefficients are shown in Table 1. After this analysis of the external cameras, the modified optimized cost function can be obtained:

c = Σ ||p̂_c − p_c||^2 / σ_c^2 + Σ (d̂_k − d_k)^2 / σ_d^2 + β_i Σ ||p̂_ei − p_ei||^2 / σ_ei^2,  (i = 0, 1, 2, 3, ...)    (16)

Table 1. External camera correspondence coefficient.

        X (mm)    Y (mm)    Z (mm)    I (mm)    β
E.C.0   71.58     39.91     33.03     88.36     1.2
E.C.1   482.43    585.62    346.42    834.04    0.9
E.C.2   −505.84   569.28    370.63    846.95    0.9

X, Y and Z are the corresponding coordinates in the experimental frame coordinate system; I is the spatial distance between the color camera and the corresponding external camera in that coordinate system; β is the corresponding coefficient of the external camera.

It is easy to see that Formula (15) is the same as the corresponding intermediate term of the new cost function (16) and does not interact with the other parameters. Therefore, we can directly replace the corresponding initial value in Equation (16). The nonlinear minimization of the parameters can be achieved by iteratively calculating the new cost function. The specific iteration process is as follows: in the first step, D_δ is kept constant while the coefficients β_0, β_1 and β_2 are assigned the values 1.2, 0.9 and 0.9, respectively; then, all the other parameters are substituted into Equation (16) to minimize the value of c. In the second step, the initial values of α_0, α_1 and D_δ in the depth distortion model are set to zero, and then they are taken into Equation (15) to optimize the disparity distortion parameter D_δ for each pixel individually. Once the new value of D_δ is obtained, it replaces the old value of D_δ in the first step. Steps 1 and 2 are repeated as many times as necessary until the residuals converge to a minimum.
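The correspondence coefficients can be reproduced from the camera distances with a few lines of code. The sketch below follows the rules stated above (rounding A up to a natural number and redistributing the leftover weight to the near cameras so that the coefficients sum to the number of cameras, which is our reading of the text) and reproduces the Table 1 values; the function name is ours.

```python
from math import ceil

def correspondence_coefficients(distances_mm):
    """Correspondence coefficients beta_i for the external cameras: cameras farther
    than 600 mm from the Kinect color camera get beta = 1 - 0.02*A with
    A = ceil((I - 600)/50); the remaining (near) cameras share the leftover weight
    so that sum(beta) equals the number of cameras."""
    n = len(distances_mm)
    betas = [None] * n
    far = [i for i, I in enumerate(distances_mm) if I > 600.0]
    near = [i for i in range(n) if i not in far]
    for i in far:
        A = ceil((distances_mm[i] - 600.0) / 50.0)
        betas[i] = 1.0 - 0.02 * A
    if near:
        share = (n - sum(betas[i] for i in far)) / len(near)   # keeps sum(beta) = n
        for i in near:
            betas[i] = share
    return betas

print(correspondence_coefficients([88.36, 834.04, 846.95]))   # ≈ [1.2, 0.9, 0.9], as in Table 1
```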


4. Experiments

In order to demonstrate the performance of the proposed method in a real project, all of the input images in this experiment come from the same database, which was collected and produced with our existing experimental equipment. All pictures were collected in the way described in Section 3.1 and saved in JPG format. For comparison with Herrera's method, all depth images in this experiment are saved in the same PGM format as in Herrera's method. In addition, Herrera's method has a strong dependency on the number of input pictures and its results become random when fewer than 20 pictures are used [19], whereas the joint calibration method proposed in this paper only needs 15 pictures. The devices' intrinsic parameters calculated by our method are shown in Tables 2 and 3, wherein C.C. represents Color Camera and E.C. represents External Camera.

Table 2. Color camera intrinsic parameters.

        fcx             fcy             uc0             vc0             kc1               kc2               pc1               pc2               kc3
C.C.    518.52 ±0.07    520.68 ±0.06    324.31 ±0.10    243.74 ±0.10    −0.0124 ±0.0016   0.2196 ±0.0225    0.0014 ±0.0001    −0.0003 ±0.0001   −0.5497 ±0.0995
E.C.0   1619.83 ±4.66   1626.44 ±4.95   633.85 ±15.13   475.47 ±21.08   −0.0540 ±0.1706   −3.1424 ±4.1635   −0.0011 ±0.0025   −0.0034 ±0.0020   7.4728 ±2.7466
E.C.1   1652.77 ±8.04   1652.86 ±8.06   695.53 ±15.22   477.55 ±13.44   −0.3491 ±0.1422   0.4026 ±2.5525    −0.0004 ±0.0017   0.0047 ±0.0018    6.4104 ±4.2151
E.C.2   1638.09 ±7.15   1637.34 ±7.88   766.41 ±18.46   503.78 ±15.55   0.1555 ±0.0635    0.8307 ±0.7039    −0.0049 ±0.0013   0.0120 ±0.0025    −2.1837 ±2.2819

This table shows the focal length (f_cx, f_cy), the principal point (u_c0, v_c0) and the distortion coefficients K_c = [k_c1, k_c2, p_c1, p_c2, k_c3], respectively, wherein C.C. and E.C. represent Color and External Camera, respectively.

Table 3. Depth sensor intrinsic parameters.

fdx    573.87 ±0.00
fdy    573.13 ±0.00
ud0    327.10 ±0.00
vd0    234.93 ±0.00
kd1    0.0487 ±0.0000
kd2    0.0487 ±0.0000
pd1    −0.0035 ±0.0000
pd2    −0.0042 ±0.0000
kd3    0.0000 ±0.0000
c0     3.42 ±0.001457
c1     −0.003162 ±0.00
α0     0.8656 ±0.0460
α1     0.0018 ±0.0001

This table shows the focal length (f_dx, f_dy), the principal point (u_d0, v_d0), the distortion coefficients K_d = [k_d1, k_d2, p_d1, p_d2, k_d3], the depth parameters (c_0, c_1) and the depth distortion parameters (α_0, α_1), respectively.

4.1. Herrera's Method Results for Comparison

In our results, each device corresponds to a unique set of values. In this paper, the results of Herrera's method are used for comparison with the proposed method. However, Herrera's calibration method is limited to a single external camera and cannot be effectively employed with multiple devices; we can only calibrate each external camera one by one. Therefore, in the actual calibration process, each of the different external cameras corresponds to a new set of Kinect data, and how to choose from the multiple sets of Kinect parameters becomes a problem. In the actual comparison process, Herrera's method is still used to calibrate External Cameras 0, 1 and 2, which yields three different sets of Kinect parameter values. When selecting the Kinect parameters for Herrera's method, the re-projection errors of the color camera and the depth camera are an important reference: the smaller the value, the more preferable that set of Kinect parameters. We then select the single set of Kinect parameters for which the re-projection error summed over all three external cameras is lowest. Alternatively, each set of Kinect parameter values can be fed into the 3D reconstruction module, and the best group of values for Herrera's method can be chosen by observing the 3D reconstruction results. However, the randomness of this approach is too large, and the choice of Kinect parameters


may be affected by the observation error. Therefore, this paper selects the Kinect parameters by the first method described above.

In order to visually present the difference between the two methods, the corresponding rotation, translation and distortion correction are applied to the original depth maps, which are then overlaid on the corresponding color images [34]. The overlaid depth maps and the corresponding 3D colored point cloud images obtained by the proposed method and by Herrera's method are shown in Figures 6 and 7, respectively.
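Conceptually, the overlay step back-projects every valid depth pixel, moves the point into the color frame with T_DC and re-projects it with the color intrinsics. The following simplified sketch illustrates the idea; lens distortion and the disparity model are omitted, and all names are ours.

```python
import numpy as np

def overlay_depth_on_color(depth_m, fd, pd0, R, t, fc, pc0, color_shape):
    """Render the depth map into the color view: back-project every valid depth
    pixel, move it into the color frame with T_DC = (R, t), and re-project it
    with the color-camera intrinsics (distortion omitted for brevity)."""
    H, W = depth_m.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    z = depth_m.ravel()
    valid = z > 0
    x = (us.ravel() - pd0[0]) / fd[0] * z
    y = (vs.ravel() - pd0[1]) / fd[1] * z
    P_d = np.stack([x, y, z])[:, valid]           # 3 x N points in the depth frame
    P_c = R @ P_d + np.asarray(t).reshape(3, 1)   # points in the color frame
    u_c = fc[0] * P_c[0] / P_c[2] + pc0[0]
    v_c = fc[1] * P_c[1] / P_c[2] + pc0[1]
    overlay = np.zeros(color_shape, dtype=np.float32)
    inside = (u_c >= 0) & (u_c < color_shape[1] - 1) & (v_c >= 0) & (v_c < color_shape[0] - 1)
    overlay[v_c[inside].astype(int), u_c[inside].astype(int)] = P_c[2, inside]
    return overlay
```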

Figure 6. The overlaid depth maps and the corresponding 3D colored point cloud images obtained by the proposed method. (a) The color image captured by the color camera in the Kinect; (c,e,g) the color images captured by External Cameras 0, 1 and 2, respectively; (b,d,f,h) the corresponding 3D colored point cloud images, captured from (a,c,e,g), respectively.


Figure 7. The overlaid depth maps and the corresponding 3D colored point cloud images obtained by Herrera's method. (a) The color image captured by the color camera in the Kinect; (c,e,g) the color images captured by External Cameras 0, 1 and 2, respectively; (b,d,f,h) the corresponding 3D colored point cloud images, captured from (a,c,e,g), respectively.

It can be clearly observed that the proposed method shows very accurate results in the corresponding overlaid depth maps and 3D colored point cloud images, while Herrera's method achieves only partial accuracy in the corresponding overlaid depth maps and 3D colored point cloud images. For example, in Figure 7f, a large black point cloud appears on the white desktop, which is clearly incorrect.

By analyzing the calibration results of the two methods on the same dataset, the standard deviations of the re-projection errors are compared, as shown in Table 4. Here, the standard deviation of each re-projection error can be regarded as the actual value of the corresponding intermediate term after the nonlinear minimization by Formula (16). Therefore, the actual value of c in Equation (16) indirectly reflects the accuracy of the calibration, and it can serve as a reference to evaluate the accuracy of the calibration,


but it is by no means a direct standard [35]. The smaller the value, the higher the calibration accuracy of the corresponding device. In Herrera's method, the minimum standard deviations of the color camera and the depth camera are found to be 0.1272 and 0.7343, respectively, and each of the three external cameras has a unique standard deviation. With these values, the actual value of c for Herrera's method is c_Her = 6.04436. Similarly, the c value of the proposed method is c_Pro = 5.93022. It can be seen that the results of the two methods are very close: both methods achieve accurate calibration, and the data show that the proposed joint calibration method is slightly more accurate. Therefore, our method not only realizes the joint calibration of the depth sensor and multiple external cameras, but also improves the accuracy of the calibration and reduces the dependence on the number of input images.

Table 4. Standard deviation of re-projection error.

        Herrera's Method               Proposed Method
C.C.    0.1367 [−0.0054, +0.0058]      0.1423 [−0.0056, +0.0061]
        0.1272 [−0.0050, +0.0054]
        0.1390 [−0.0055, +0.0059]
D.C.    0.7343 [−0.0010, +0.0010]      0.8567 [−0.0011, +0.0012]
        0.8455 [−0.0012, +0.0012]
        0.7829 [−0.0011, +0.0011]
E.C.0   1.7242 [−0.0658, +0.0709]      1.6580 [−0.0674, +0.0731]
E.C.1   1.6984 [−0.0648, +0.0699]      1.6429 [−0.0646, +0.0699]
E.C.2   1.8169 [−0.0739, +0.0801]      1.5566 [−0.0612, +0.0662]
c       6.04436                        5.93022

Wherein C.C. represents Color Camera, E.C. represents External Camera and D.C. represents Depth Camera; c is the parameter in Equation (16); the three C.C. and D.C. values under Herrera's method correspond to the three separate external-camera calibration runs. To compare the data sets, the variances were kept constant (σc = 0.02 px, σd = 0.75 kud, σei = 0.40 px).

4.2. 3D Reconstruction

In addition, in order to provide data support for the 3D reconstruction module, the results of the two methods are also applied in a real project platform [36]. The overlaid depth maps and the corresponding joint 3D reconstruction results of the proposed method and Herrera's method are shown in Figure 8. Color images captured by different cameras are superimposed in the same space: the color image views come from the color camera in the Kinect and from External Camera 0, and they are mapped onto the same 3D point cloud space. The completeness of the reconstruction between them reflects the accuracy of the joint calibration results.

By observing the overlaid depth maps and the corresponding jointly reconstructed 3D images, it can be found that both methods preserve the integrity of the depth information, and the details of the scene are also reflected in the reconstructed 3D images. Comparing the details of these two image sets, the proposed method works better on the overlaid depth maps and the corresponding jointly reconstructed 3D images. For example, in Figure 8a,c, comparing the left palm edge of the observed subject, it is clear that the color and depth information are superimposed more accurately by the proposed method; in Figure 8b,d, comparing the right shoulder of the observed subject, the contours produced by the proposed method are clearer, whereas Herrera's method projects a larger area of the clothing pattern onto the surface of the brown storage locker due to the data deviation.


Figure 8. Joint 3D reconstruction. (a,b) The overlaid depth map and the corresponding joint reconstructed 3D image by the proposed method, respectively; (c,d) the overlaid depth map and the corresponding joint reconstructed 3D image by Herrera's method, respectively.

4.3. reconstructed 3D Ground Truth 3D image by proposed method, respectively; (c) and (d) are the overlaid depth map and theTruth corresponding joint reconstructed 3D image by Herrera’s method, respectively. 4.3. 3D Ground In order to visually demonstrate the calibration result of the two methods, we also collected a

set of data as a test set. As shown in Figure 9, the test set contains six sets of standard chessboard images taken at different angles, and each set contains a checkerboard image under the Kinect view and a corresponding depth image. First of all, the coordinates of the checkerboard corners of the test set are determined; the corner numbering follows Figure 3a. The actual distance between adjacent corners of the checkerboard is 25 mm. Then, the Kinect intrinsics obtained by each of the two methods are used to reconstruct the test set. The calibration accuracy is evaluated by analyzing the distance error between the reconstructed points: in theory, the closer the calculated distance between adjacent checkerboard corners is to the actual distance, the higher the calibration accuracy of the corresponding calibration method [37]. In order to reduce the relative error, the maximum known distances along the x-axis and the y-axis are measured separately; in other words, the distances between the checkerboard corners numbered 1 and 9 and between the corners numbered 1 and 46 are calculated, respectively. Table 5 shows the distance errors of the reconstructed points in the x-axis and y-axis directions. It is clear that the proposed method is closer to the true distance, with a higher calibration accuracy.
(a)

(b)

Figure 9. One of the test data sets. (a) the checkerboard image under Kinect view; (b) the corresponding depth image.

(a)

(b)

Figure 9. One of the test data sets. (a) the checkerboard image under Kinect view; (b) the

Figure 9. One of the test data sets. (a) the checkerboard image under Kinect view; (b) the corresponding corresponding depth image. depth image.


Table 5. Distance errors between the reconstructed points.

        Herrera's Method               Proposed Method
        Lx-25 (mm)    Ly-25 (mm)       Lx-25 (mm)    Ly-25 (mm)
1       0.16988       0.07660          0.16475       0.06380
2       0.10438       0.10360          0.09775       0.09040
3       0.19263       0.08220          0.18025       0.06960
4       0.20350       0.25660          0.18088       0.24160
5       −0.05288      0.20440          −0.04725      0.19200
6       0.03600       0.07500          0.03288       0.06520
M       0.12655       0.13307          0.11729       0.12043

Lx and Ly represent the calculated distances of the adjacent checkerboard corners in the x-axis and y-axis directions, respectively. Lx-25 and Ly-25 represent the error between the calculated distance and the actual distance in the x-axis and y-axis directions, respectively. M represents the arithmetic mean of the absolute values of the distance errors.
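For reference, the sketch below shows how the Lx−25 and Ly−25 values could be computed from a set of reconstructed corners, assuming the corners are ordered row by row (nine per row, six rows) as in Figure 3a. The function name and the ordering convention are our own assumptions.

```python
import numpy as np

def table5_errors(corners_3d_m, square_mm=25.0):
    """Per-square distance errors Lx-25 and Ly-25 (mm) as used for Table 5.
    corners_3d_m: (54, 3) reconstructed corner positions in metres, ordered
    row by row (9 corners per row, 6 rows), so corners 1-9 span the x-axis
    and corners 1-46 span the y-axis of the board."""
    C = np.asarray(corners_3d_m)
    Lx = np.linalg.norm(C[8] - C[0]) / 8.0 * 1000.0    # corners 1 and 9: 8 squares apart
    Ly = np.linalg.norm(C[45] - C[0]) / 5.0 * 1000.0   # corners 1 and 46: 5 squares apart
    return Lx - square_mm, Ly - square_mm
```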

5. Conclusions

Considering that current research focuses only on the calibration of a single external camera rather than multiple external cameras, we present a novel method and a corresponding workflow framework that can simultaneously calibrate the relative poses of a Kinect and three external cameras. By optimizing the final cost function and adding corresponding weights to the external cameras at different locations, the joint calibration of multiple devices is constructed efficiently. The validity and accuracy of the method are verified through comparative experiments. Experimental results show that the proposed method improves calibration accuracy: it not only reduces the dependence on the number of input pictures, but also improves the accuracy of joint 3D reconstruction. The camera calibration technique presented in this paper provides the underlying data support for, and has been successfully applied in, a practical real-time project, which demonstrates its practical value.

Acknowledgments: This work was supported by grants from the National Natural Science Foundation of China (Grant No. 51575407, 51575338, 61273106, 51575412) and the EU Seventh Framework Programme (Grant No. 611391). This paper is funded by the Wuhan University of Science and Technology graduate students' short-term study abroad special funds.

Author Contributions: Y.L., Y.S., G.L. and H.C. conceived and designed the experiments; Y.L. performed the experiments; Y.L., Y.S., G.L. and H.C. analyzed the data; Y.L., J.K., G.J. and D.J. contributed reagents/materials/analysis tools; Y.L. wrote the paper; and H.L., Z.J. and H.Y. edited the language.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Zhang, Z. Flexible Camera Calibration by Viewing a Plane from Unknown Orientations. In Proceedings of the IEEE 1999 Seventh International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 666–673.
2. Salvi, J.; Armangué, X.; Batlle, J. A comparative review of camera calibrating methods with accuracy evaluation. Pattern Recognit. 2002, 35, 1617–1635. [CrossRef]
3. Gong, X.; Lin, Y.; Liu, J. 3D LIDAR-camera extrinsic calibration using an arbitrary trihedron. Sensors 2013, 13, 1902–1918. [CrossRef] [PubMed]
4. Canessa, A.; Chessa, M.; Gibaldi, A.; Sabatini, S.P.; Solari, F. Calibrated depth and color cameras for accurate 3D interaction in a stereoscopic augmented reality environment. J. Vis. Commun. Image Represent. 2014, 25, 227–237. [CrossRef]
5. Choi, D.-G.; Bok, Y.; Kim, J.-S.; Shim, I.; Kweon, I.S. Structure-From-Motion in 3D Space Using 2D Lidars. Sensors 2017, 17, 242. [CrossRef] [PubMed]
6. Li, N.; Zhao, X.; Liu, Y.; Li, D.; Wu, S.Q.; Zhao, F. Object tracking based on bit-planes. J. Electron. Imaging 2016, 25, 013032. [CrossRef]
7. Lv, F.; Zhao, T.; Nevatia, R. Camera calibration from video of a walking human. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1513–1518. [PubMed]
8. Xiang, X.Q.; Pan, Z.G.; Tong, J. Depth camera in computer vision and computer graphics: An overview. J. Front. Comput. Sci. Technol. 2011, 5, 481–492.
9. Chen, D.; Li, G.; Sun, Y.; Kong, J.; Jiang, G.; Tang, H.; Ju, Z.; Yu, H.; Liu, H. An Interactive Image Segmentation Method in Hand Gesture Recognition. Sensors 2017, 17, 253. [CrossRef] [PubMed]
10. Miao, W.; Li, G.F.; Jiang, G.Z.; Fang, Y.; Ju, Z.J.; Liu, H.H. Optimal grasp planning of multi-fingered robotic hands: A review. Appl. Comput. Math. 2015, 14, 238–247.
11. Zhang, Z. Microsoft kinect sensor and its effect. IEEE Multimed. 2012, 19, 4–10. [CrossRef]
12. Henry, P.; Krainin, M.; Herbst, E.; Ren, X.; Fox, D. RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments. In The 12th International Symposium on Experimental Robotics (ISER); Springer: Berlin/Heidelberg, Germany, 2010; pp. 22–25.
13. Burrus, N. Kinect Calibration. Available online: http://nicolas.burrus.name/index.php/Research/KinectCalibration (accessed on 10 November 2011).
14. Yamazoe, H.; Habe, H.; Mitsugami, I.; Yagi, Y. Easy Depth Sensor Calibration. In Proceedings of the 2012 21st International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, 11–15 November 2012; pp. 465–468.
15. Herrera, D.; Kannala, J.; Heikkilä, J. Accurate and Practical Calibration of a Depth and Color Camera Pair. In International Conference on Computer Analysis of Images and Patterns; Springer: Berlin, Germany, 2011; pp. 437–445.
16. Zhang, C.; Zhang, Z. Calibration between depth and color sensors for commodity depth cameras. In Computer Vision and Machine Learning with RGB-D Sensors; Springer: Berlin, Germany, 2014; pp. 47–64.
17. Smisek, J.; Jancosek, M.; Pajdla, T. 3D with Kinect. In Consumer Depth Cameras for Computer Vision; Springer: Berlin, Germany, 2013; pp. 3–25.
18. Herrera, D.; Kannala, J.; Heikkilä, J. Joint depth and color camera calibration with distortion correction. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2058–2064. [CrossRef] [PubMed]
19. Raposo, C.; Barreto, J.P.; Nunes, U. Fast and Accurate Calibration of a Kinect Sensor. In Proceedings of the 2013 International Conference on 3DTV-Conference, Washington, DC, USA, 29 June–1 July 2013; pp. 342–349.
20. Guo, L.P.; Chen, X.N.; Liu, B. Calibration of Kinect sensor with depth and color camera. J. Image Graph. 2014, 19, 1584–1590.
21. Han, Y.; Chung, S.L.; Yeh, J.S.; Chen, Q.J. Calibration of D-RGB camera networks by skeleton-based viewpoint invariance transformation. Acta Phys. Sin. 2014, 63, 074211.
22. Heikkila, J. Geometric camera calibration using circular control points. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1066–1077. [CrossRef]
23. Weng, J.; Cohen, P.; Herniou, M. Camera calibration with distortion models and accuracy evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 965–980. [CrossRef]
24. Wang, J.; Shi, F.; Zhang, J.; Liu, Y. A new calibration model of camera lens distortion. Pattern Recognit. 2008, 41, 607–615. [CrossRef]
25. Yin, Q.; Li, G.F.; Zhu, J.G. Research on the method of step feature extraction for EOD robot based on 2D laser radar. Discrete Cont Dyn-S 2015, 8, 1415–1421.
26. Fang, Y.F.; Liu, H.H.; Li, G.F.; Zhu, X.Y. A multichannel surface emg system for hand motion recognition. Int. J. Hum. Robot. 2015, 12, 1550011. [CrossRef]
27. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [CrossRef]
28. Unnikrishnan, R.; Hebert, M. Fast Extrinsic Calibration of a Laser Rangefinder to a Camera; Carnegie Mellon University: Pittsburgh, PA, USA, 2005.
29. Lee, H.; Rhee, H.; Oh, J.H.; Park, J.H. Measurement of 3-D Vibrational Motion by Dynamic Photogrammetry Using Least-Square Image Matching for Sub-Pixel Targeting to Improve Accuracy. Sensors 2016, 16, 359. [CrossRef] [PubMed]
30. Cheng, K.L.; Ju, X.; Tong, R.F.; Tang, M.; Chang, J.; Zhang, J.J. A Linear Approach for Depth and Colour Camera Calibration Using Hybrid Parameters. J. Comput. Sci. Technol. 2016, 31, 479–488. [CrossRef]
31. Li, G.F.; Gu, Y.S.; Kong, J.Y.; Jiang, G.Z.; Xie, L.X.; Wu, Z.H. Intelligent control of air compressor production process. Appl. Math. Inform. Sci. 2013, 7, 1051. [CrossRef]
32. Li, G.F.; Miao, W.; Jiang, G.Z.; Fang, Y.F.; Ju, Z.J.; Liu, H.H. Intelligent control model and its simulation of flue temperature in coke oven. Discrete Cont Dyn-S 2015, 8, 1223–1237. [CrossRef]
33. Li, Z.; Zheng, J.; Zhu, Z.; Yao, W.; Wu, S. Weighted guided image filtering. IEEE Trans. Image Proc. 2015, 24, 120–129.
34. Chen, L.; Tian, J. Depth image enlargement using an evolutionary approach. Signal Proc. Soc. Image Commun. 2013, 28, 745–752. [CrossRef]
35. Liu, A.; Marschner, S.; Snavely, N. Caliber: Camera Localization and Calibration Using Rigidity Constraints. Int. J. Comput. Vis. 2016, 118, 1–21. [CrossRef]
36. Li, G.F.; Kong, J.Y.; Jiang, G.Z.; Xie, L.X.; Jiang, Z.G.; Zhao, G. Air-fuel ratio intelligent control in coke oven combustion process. INF Int. J. 2012, 15, 4487–4494.
37. Li, G.F.; Qu, P.X.; Kong, J.Y.; Jiang, G.Z.; Xie, L.X.; Gao, P. Coke oven intelligent integrated control system. Appl. Math. Inform. Sci. 2013, 7, 1043–1050. [CrossRef]

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
