Article

A High Spatial Resolution Depth Sensing Method Based on Binocular Structured Light

Huimin Yao 1, Chenyang Ge 1,2,*, Jianru Xue 1,3 and Nanning Zheng 1,2

1 The Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China; [email protected] (H.Y.); [email protected] (J.X.); [email protected] (N.Z.)
2 The National Engineering Laboratory for Visual Information Processing and Applications, Xi'an Jiaotong University, Xi'an 710049, China
3 Shaanxi Provincial Key Laboratory of Digital Technology and Intelligent Systems, Xi'an 710049, China
* Correspondence: [email protected]; Tel.: +86-29-8266-5270

Academic Editor: Vittorio M. N. Passaro Received: 17 January 2017; Accepted: 5 April 2017; Published: 8 April 2017

Abstract: Since the Microsoft Kinect was released, depth information has been used in many fields because of its low cost and easy availability. However, the Kinect and Kinect-like RGB-D sensors show limited performance in applications that place high demands on the accuracy and robustness of depth information. In this paper, we propose a depth sensing system that contains a laser projector similar to that used in the Kinect, and two infrared cameras located on both sides of the laser projector, to obtain higher spatial resolution depth information. We apply the block-matching algorithm to estimate the disparity. To improve the spatial resolution, we reduce the size of the matching blocks, but smaller matching blocks generate lower matching precision. To address this problem, we combine two matching modes (binocular mode and monocular mode) in the disparity estimation process. Experimental results show that our method can obtain higher spatial resolution depth without loss of quality in the range image, compared with the Kinect. Furthermore, our algorithm is implemented on a low-cost hardware platform, and the system supports a resolution of 1280 × 960, at speeds up to 60 frames per second, for depth image sequences.

Keywords: depth sensing; binocular structured light; spatial resolution; Kinect; speckle pattern

1. Introduction

At present, human computer interaction based on three-dimensional (3D) depth information has become highly attractive in the areas of image processing and computer vision, which further promotes the development of 3D depth acquisition technology. In addition, the recently developed 3D depth sensors, such as the Microsoft Kinect (Microsoft Corporation, Redmond, WA, USA) [1], have been applied in more fields, such as gesture recognition [2–5], intelligent driving [6,7], surveillance [8], 3D reconstruction [9,10], and so on.
3D depth acquisition technology measures the distance between objects and a depth sensor. It represents a non-contact, non-destructive measurement technology. The well-studied methods for acquiring depth information can be classified into two categories: passive methods and active methods. Binocular stereo vision is an active research topic among the passive methods. Some authors [11,12] have implemented an entire stereo vision process on a hardware platform, accompanied by progress in the algorithms and field programmable gate array (FPGA) technology. However, the considerable computational expense still limits its industrialization, as some algorithms have to be realized on a custom-built FPGA. On the other hand, stereo vision still does not have a good method to acquire a dense disparity map for a large textureless area, such as a white wall. Therefore, no related electronic products have yet been released.


Time-of-Flight (ToF) and structured light [13,14] are mainly used in active depth sensing methods. The ToF technology acquires the distance information by measuring the flight time along the light path. Samsung released the first complementary metal oxide semiconductor (CMOS)-based ToF sensor in 2012, where synchronized depth (480 × 360) and RGB images (1920 × 720) can be obtained by a single image sensor [15]. The Kinect 2 (Kinect for Xbox One) issued in 2013 by Microsoft is also based on the ToF principle [16]. However, in terms of miniaturization and integration, the advantages of technology based on structured light cannot be ignored. In addition to the Microsoft Kinect, other companies have also researched and released their own depth-sensing cameras, or intend to integrate them into their electronic products. For example, in 2013 Apple purchased PrimeSense, claimed an important invention patent, "Depth Perception Device and System" [17], and intends to employ it as an input device for human–machine interfaces in their products. In early 2014, Intel announced a three-dimensional depth imaging device, the "RealSense 3D Camera," and the products RealSense R200 and SR300 have been on the market since 2016. In February 2014, Google announced a new project, "Tango," which intends to make a smartphone with 3D visual recognition capabilities. In 2015, the Microsoft holographic visor "HoloLens" appeared on the market.
However, the depth-sensing systems based on structured light are not yet mature when compared to two-dimensional (2D) imaging systems. The resolution and accuracy of the depth image are lower, and the performance is not very reliable for moving objects or dynamic scenes. In [18,19], the authors combine color images to restore the corresponding depth image. These methods can improve the image quality and reduce the noise of the acquired depth map, but the improvement of the measurement precision is limited. Moreover, the hardware implementation of depth sensors based on structured light remains, to a large extent, a black box in the literature. In [20], we proposed a full very large scale integration (VLSI) implementation method to obtain a high-resolution and high-accuracy depth map based on randomized speckle patterns. It was then found that the spatial resolution of the depth map has the potential to be further improved. Thus, in this paper, we add an infrared camera to the right of the laser projector and employ a local binocular stereo matching algorithm to improve the performance of the depth map.
Several binocular structured light approaches have been proposed for acquiring a disparity map, employing different projected patterns and disparity estimation methods. Ma et al. combined a color-coded pattern based on vertical stripes and the semi-global stereo matching (SGM) algorithm to obtain a dense facial disparity map [21]. An et al. provided a comparative analysis of various structured lighting techniques with a view to facial reconstruction [22]. Nguyen et al. employed a dot-coded pattern to enhance textures on plants, and five pairs of RGB cameras to reconstruct whole plants [23]. In this system, stereo block matching is applied to calculate the matching results. The gray or color-coded patterns mentioned above are sensitive to ambient light and the surface color of objects. Yang et al. used light stripes and a corresponding decoding method to measure 3D surfaces [24]. However, this method is not suitable for moving objects. The experimental setup proposed by Dekiff et al. also used a speckle pattern and digital image correlation [25]. There is a triangulation angle of about 30° between the two cameras, and it is a short-range measurement system: the measuring distance is under 1 m and the size of the measuring field is approximately 24 cm × 18 cm. The Intel RealSense Camera SR300 includes two infrared cameras and has similar performance.
Our key contribution in this paper, beyond [20], is that an infrared camera is added to the right of the laser projector, and the binocular matching mode, performed between the captured left and right patterns, is employed to improve the spatial resolution in the X-Y direction. Mismatching and occlusion are unavoidable in the binocular matching process, so the monocular matching mode, performed between the left and the reference patterns, is also employed to revise the matching results.


Finally, the complete system is implemented and verified on a low-cost hardware platform with a laser projector and two-camera setup.
The rest of this paper is organized as follows: In Section 2, the basic principles mentioned in our depth-sensing method are briefly introduced. The acquisition steps of the depth map from projected speckle patterns are described in Section 3. Section 4 gives the full-pipeline architecture of our proposed method. Section 5 discusses the utility of our method and presents some simulation results. The paper ends with some conclusions in Section 6.

2. Related Ranging Principles

In [20], a consistency enhancement algorithm is proposed to make the intensity of the pattern formed at different distances as consistent as possible so as to ensure the matching accuracy in the disparity estimation process. However, the surface material of the objects also affects the projected pattern, so we have to select a larger block, compared with the smallest size in theory, to perform the block-matching step. Therefore, we add an infrared camera at the right of the laser projector in order to capture a pattern that is nearly consistent with the captured left pattern. Then the smallest image block can be employed in the binocular matching mode, performed between the captured left and right patterns, so as to improve the spatial resolution of the depth map.
Figure 1 shows the position relationship between the projector and the cameras. One camera is added to the right of the laser projector compared with the traditional depth-sensing method. The experimental platform contains a projector, which projects a speckle pattern similar to that used in the Kinect, and two cameras with the same performance parameters. The projector is between the two cameras. The three units are located on the same straight line and the optical axes are parallel. In our depth-sensing method, the triangulation principle and digital image correlation (DIC) are related. The triangulation principle is to transform the disparity to distance through the geometrical relationship. Generally, the digital image correlation (DIC) is performed to identify the corresponding points in the individual images.


Figure 1. Front view of the experiment platform.

2.1. Triangulation Principle

In the binocular stereo system, the problem of occlusion is unavoidable. Thus, we combine the monocular ranging method to compensate for the disparity of the occluded areas. In this section, the ranging principle and the transformation relationship of the two displacement vectors from the two modes are introduced. The ranging method is based on the triangulation principle. As shown in Figure 2, point At is an arbitrary point in the 3D space marked by the speckle pattern. The two points, Pl and Pr, are, respectively, the projections of At on the left and right camera imaging planes, so they are the corresponding image points of At in the left and right images. According to photogrammetry, if we obtain the displacement of the test point At in the two images, we can obtain the vertical distance from point At to the ranging sensor. The two cameras are located as shown in Figure 1, and their coordinates only differ by the translation distance in the X direction. Thus, the image planes of the two cameras lie on the same plane, and the epipolar lines are aligned with the scanlines of the image. Therefore, the displacement ∆y in the Y direction can be ignored, and we only require an estimate of the displacement ∆x in the X direction between the projections of At on the left and right camera imaging planes. We assume that the optical focal length of the image sensor is f, µ is the physical pixel pitch, and s is the baseline distance between the laser projector and the image sensor.


According to the relationship between similar triangles, we can obtain the proportional relationship (d − f)/d = (2s − ll − lr)/(2s), where the parameter d is the distance between the sensor and the point At. Then the depth value of point At can be calculated as

d = 2sf/(ll + lr) = 2sf/(µ∆xbi)    (1)

where ∆xbi represents the displacement of the corresponding image points Pl, Pr, respectively, on the left and right images.


Figure 2. Schematic diagram of the triangulation principle.

Next, the depth-sensing technology with only the left-side camera is discussed. In this process, we need to acquire a reference image from the left camera at a known distance. As shown in Figure 2, the point Aref is the position of point At at the reference distance. According to the relationship between similar triangles, we can obtain the proportional relationships (d − f)/d = (s − ll − le − µ∆xleft)/s and (dref − f)/dref = (s − ll − le)/s. Then the depth value of point At can be calculated as

d = 1/((µ/(sf)) ∆xleft + 1/dref)    (2)

where ∆xleft represents the displacement of the corresponding image points Pl, Pref, respectively, on the left image and its reference image.
From Equations (1) and (2), we can observe that the inverse of the distance d follows a linear relationship with the displacement ∆xbi or ∆xleft; thus, there is a simple linear relationship between the two disparities. The point At can be located in front of, or behind, the reference point Aref, so the value of the displacement ∆xleft can be positive or negative. In this paper, we set a direction for the displacement: the projection point Pl is assigned as the end point of the vector, and the projection point Pr or Pref as the start point of the vector, so that ∆xbi corresponds to the vector from Pr to Pl and ∆xleft to the vector from Pref to Pl. Combining Equations (1) and (2), we can obtain the relationship between the two displacements as follows:

∆xbi = 2∆xleft + 2sf/(µdref)    (3)
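As a concrete reading of Equations (1)–(3), the short Python sketch below converts the two kinds of displacement into depth and checks that the two modes stay consistent. The calibration constants are the ones reported later in Section 5.1, and the function names are ours, introduced only for illustration; this is not the authors' implementation.

```python
# Illustrative sketch of Equations (1)-(3); not the authors' FPGA implementation.
# Calibration constants taken from Section 5.1 of the paper.
F = 4.387e-3      # focal length f (m)
MU = 3.75e-6      # physical pixel pitch mu (m)
S = 74.6e-3       # baseline s between projector and each camera (m)
D_REF = 2.0       # reference-plane distance d_ref (m)

def depth_from_binocular(dx_bi):
    """Equation (1): d = 2sf / (mu * dx_bi), with dx_bi in pixels."""
    return 2 * S * F / (MU * dx_bi)

def depth_from_monocular(dx_left):
    """Equation (2): 1/d = (mu/(s*f)) * dx_left + 1/d_ref, with dx_left in pixels."""
    return 1.0 / ((MU / (S * F)) * dx_left + 1.0 / D_REF)

def binocular_equivalent(dx_left):
    """Equation (3): convert a monocular displacement to its binocular equivalent."""
    return 2 * dx_left + 2 * S * F / (MU * D_REF)

if __name__ == "__main__":
    # A binocular displacement of about 83 pixels corresponds to roughly 2.1 m (cf. Table 1).
    print(depth_from_binocular(83.12))     # ~2.1
    print(depth_from_monocular(-2.08))     # ~2.1 (cf. Table 2)
    print(binocular_equivalent(-2.08))     # ~83.1, consistent with Equation (1)
```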


2.2. Digital Image Correlation

The digital image correlation (DIC) algorithm was developed at the end of the last century and has been widely used to analyze deformation and identify the corresponding points between different images [25,26]. Its principle is to match subsets from different digital images by an appropriate similarity calculation method. The process is to determine the parameters of the displacement shape function through a correlation function. The displacement mentioned in Section 2.1 is contained in these parameters. Assume that the coordinate of point Pl in the first image (left speckle image) is (x0, y0). In order to determine the corresponding point (x′0, y′0) in the second image (right or reference speckle image), a square subset around the central point (x0, y0) is extracted. Each point in the subset is represented by (x, y). If the surface represented by the subset is perpendicular to the optical axis, it only brings in a rigid displacement; the mapping position (x′, y′) in the second image of each point in this subset is calculated by

x′ = x + u    (4)

where u is the displacement of the subset center point in the X direction (the displacement in the Y direction can be ignored in our paper). However, in most situations, the speckle pattern projected on the tested object may be deformed. Thus we introduce the first-order displacement shape function to approximate the mapping position:

x′ = x + u + ux(x − x0) + uy(y − y0)    (5)

where the parameters ux, uy are the components of the first-order displacement gradient. Lu and Cary [27] propose a DIC algorithm using a second-order displacement shape function for large deformations, but it has twelve parameters, which increases the computational difficulty and complexity. Therefore, the displacement is estimated according to Equation (5). The block-matching algorithm is employed to estimate the displacement u in Equations (4) and (5), which is also ∆xbi or ∆xleft in Section 2.1. In the implementation process, first we assign different initial values to the parameters ux and uy, and then extract the different subsets with the size of m × m from the second image. Finally, Equation (6) is employed as the similarity criterion to find the optimal matching subsets:

C(u, ux, uy) = ΣΣ f(x, y) g(x′, y′) / √(ΣΣ f(x, y) f(x, y) · ΣΣ g(x′, y′) g(x′, y′))    (6)

where f(x, y), g(x′, y′) are, respectively, the discrete gray values of the first and second patterns.
The digital image correlation (DIC) algorithm is a local stereo matching algorithm. In addition to this algorithm, there are other stereo matching algorithms, such as semi-global block matching [28], belief propagation, graph cuts [29], and so on. However, these algorithms are not well suited to our system, for the following reasons. Firstly, in structured light ranging, the projected pattern is coded following a certain principle, and the decoding method, that is, the ranging algorithm, is designed according to the coding method. In this paper, the speckle pattern is binary and composed of randomly distributed isolated speckles. Every image block extracted from the pattern is unique, so the block-matching algorithm is enough to decode the pattern. Secondly, the other stereo matching algorithms are very complex and are utilized to optimize the depth map against problems such as texturelessness and occlusion. In our paper, the infrared camera can only capture the projected pattern itself without additional information from the surrounding environment, such as color or gray information. The edge information and other features of objects are very difficult to calculate from the captured image, so the optimization offered by these stereo-matching algorithms is of limited benefit to our result. On the other hand, the scene has been marked by the projected spots, so there is no textureless area, and the occlusion area that exists in binocular matching can be corrected by the monocular matching results. Thirdly, our system is implemented and verified on a hardware platform. An iterative algorithm is always used in the other stereo matching algorithms to find the optimal solution of an established energy function, while the number of iterations differs for every pixel or image, so the hardware costs are uncertain, which does not conform to the hardware design principle. Finally, considering the trade-off between accuracy and hardware costs, we employ the block-matching algorithm in our system design.


3. The Depth-Sensing Method from Two Infrared Cameras Based on Structured Light

Figure 3 shows the flowchart of our method. The method can be divided into seven steps and the details are described as follows:


Figure 3. Flow diagram of the depth-sensing method.

Step 1: Encode the space using the speckle-encoded pattern.
Based on the active mode, the projector projects the speckle-encoded pattern to encode or mark the space or object. The pattern is fixed and composed of randomly distributed spots formed by the interference of coherent laser beams. The basic design principle of the projected pattern is that the pattern in any sub-window is unique and can be fully identified.
Step 2: Capture and solidify the reference pattern Rl from the left camera.
As shown in Figure 1, the two identical cameras are symmetrically arrayed on both sides of the pattern projector, and the optical axes of the cameras and the projector are parallel to each other. Narrow band-pass filters are pasted on the lenses of the two cameras so that only the infrared light within a certain range of wavelengths, which is adopted by the projector, can be captured. This design mainly eliminates the disturbance of other wavelengths of light or sources and obtains a clear and stable encoded pattern projected by the projector.
Before working, we need to capture and solidify the left-side reference pattern for disparity estimation. The reference pattern is generated by capturing the speckle pattern projected on a standard plane perpendicular to the optical axis (Z-axis) of the laser projector and cameras. The vertical distance dref between the standard plane and the depth sensor is known. The selection must ensure that a majority part, or the whole speckle pattern, can be projected onto the standard plane and can also occupy the entire field of view of the left image sensor. Then the reference speckle pattern is processed by the speckle pattern preprocessing module described in Step 3, and the fixed image is stored in memory. In the normal processing stage, the system only needs to read the reference speckle pattern from memory for the disparity estimation of the captured speckle pattern image.


Step 3: Preprocess the captured or input speckle patterns Il, Ir from the two cameras.
For the original speckle patterns taken directly from the cameras, the intensity and the size of the spots formed by laser interference decrease with the increment of the projection distance and may be uneven over the whole image. Hence, a consistency enhancement algorithm proposed in [20] is used to enhance the input speckle patterns to make them more discriminative so as to improve the matching accuracy. The algorithm combines grayscale transformation and histogram equalization, is suitable for hardware implementation, and can be described as

f*(x, y) = β × (f(x, y) − f̄(x, y)) if f(x, y) ≥ f̄(x, y), and f*(x, y) = 0 otherwise    (7)

where f̄(x, y) is the average gray value of the subset with the center pixel (x, y), and β = grayref / f̄(x, y) is a scale factor.
Step 4: Detect the shadow area of the enhanced patterns Il, Ir.
As shown in Figure 4, because of the distance difference between the foreground and background, a blank (no-projection) area is formed, that is, the shadow area. A subset in the shadow area cannot use Equation (6) to calculate the correlation coefficient, which will affect the subsequent disparity estimation step. Hence, in the preprocessing module for the images Il and Ir, the shadow area is detected and marked, and the disparity estimation is not performed in the marked area. The shadow area detection method counts the number of speckle-spot pixels within a subset of a certain size. If the number is less than a threshold, the center pixel of the subset is in the shadow area; otherwise the pixel is in the projection area.
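To make Steps 3 and 4 concrete, a small software sketch of the two preprocessing operations is given below. The window size, the constant gray_ref, and the spot-count threshold are placeholder values, not the ones used in the authors' FPGA design, and the loops are written for clarity rather than speed.

```python
import numpy as np

# Illustrative sketch of the preprocessing in Steps 3 and 4 (Equation (7) plus
# shadow detection by counting speckle pixels); parameter values are placeholders.

def enhance(img, win=15, gray_ref=128.0):
    """Consistency enhancement, Equation (7): keep pixels above the local mean,
    scaled by beta = gray_ref / local_mean; suppress the rest to zero."""
    img = img.astype(np.float32)
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            local = padded[y:y + win, x:x + win]
            mean = local.mean() + 1e-6          # avoid division by zero
            beta = gray_ref / mean
            diff = img[y, x] - mean
            out[y, x] = beta * diff if diff >= 0 else 0.0
    return out

def shadow_mask(enhanced, win=15, min_spot_pixels=10):
    """Shadow detection: mark a pixel as shadow when too few speckle (non-zero)
    pixels fall inside its local subset."""
    pad = win // 2
    padded = np.pad(enhanced > 0, pad, mode="constant")
    mask = np.zeros(enhanced.shape, dtype=bool)
    for y in range(enhanced.shape[0]):
        for x in range(enhanced.shape[1]):
            count = padded[y:y + win, x:x + win].sum()
            mask[y, x] = count < min_spot_pixels
    return mask
```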


Figure 4. Schematic diagram of the principle of shadow formation.

For the left camera, the shadow area shl is located at the left side of the projection prol of the foreground while, for the right camera, the shadow shr is at the right side of pror. It is clear that the edge of the shadow area is correlated with the edge of the foreground. This characteristic can be used to segment the object located in the foreground. The results after preprocessing for a captured speckle pattern are shown in Figure 5. It is obvious that, in Figure 5b, after consistency enhancement, the intensity of the spots in the whole speckle pattern is more consistent. The captured speckle pattern is from the left camera; thus, in Figure 5c, the shadow of the object is at the left side of the projection.


Figure 5. The speckle pattern and the results after preprocessing: (a) the speckle pattern; (b) the consistency enhancement result; and (c) the shadow detection result.


Step 5: Estimate the disparity based on the two block-matching modes.
In binocular matching, occlusion is unavoidable. In the matching between the left and right patterns, there are mismatching areas caused by occlusion. In order to solve this problem and improve the matching accuracy, we employ two matching modes in our paper. One is to estimate the disparity between the left and right patterns, called the binocular mode; the other is between the left pattern and its reference pattern fixed in memory, called the monocular mode.
For the monocular mode, Equation (4) is employed to estimate the displacement. As shown in Figure 6, we extract a square image block (or subset) of size m × m from the left speckle pattern. Due to the position of the image sensors and the laser projector, the block-matching algorithm is confined along the X-axis only. Thus, a search window of size m × M is extracted from the left reference speckle pattern, and the central pixel of the image block and the searching window share the same Y coordinate. The parameters m, M are odd numbers, and m ≪ M. Then the full-search block-matching method and the correlation Equation (6) are used to find the optimum matching block and estimate the displacement ∆xleft of the center pixel of the image block. Finally, we use Equation (3) to convert the displacement ∆xleft to ∆x′bi.


Figure 6. Schematic diagram of the block-matching-based disparity estimation.
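The full-search matching of the monocular mode can be sketched in a few lines of Python. The block size m, the search-window width M, and the sign convention of the returned displacement are illustrative choices on our part (the paper only requires m and M to be odd with m ≪ M), and sub-pixel handling and the hardware pipeline of Section 4 are omitted.

```python
import numpy as np

# Illustrative sketch of the monocular full-search matching in Step 5, using the
# normalized correlation of Equation (6). m and M are placeholders (odd, m << M).

def ncc(a, b):
    """Correlation coefficient of Equation (6) for two equally sized blocks."""
    num = np.sum(a * b)
    den = np.sqrt(np.sum(a * a) * np.sum(b * b)) + 1e-12   # guard against empty blocks
    return num / den

def match_pixel(left, reference, x0, y0, m=17, M=65):
    """Estimate the displacement dx_left of the m x m block centered at (x0, y0)
    by scanning an m x M search window of the reference pattern on the same scanline."""
    h = m // 2
    block = left[y0 - h:y0 + h + 1, x0 - h:x0 + h + 1].astype(np.float32)
    best_c, best_dx = -1.0, 0
    for xc in range(x0 - M // 2 + h, x0 + M // 2 - h + 1):   # M - m + 1 candidates
        cand = reference[y0 - h:y0 + h + 1, xc - h:xc + h + 1].astype(np.float32)
        c = ncc(block, cand)
        if c > best_c:
            # One possible sign convention: displacement measured from the
            # reference position toward the left-image position (cf. Section 2.1).
            best_c, best_dx = c, x0 - xc
    return best_dx, best_c
```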

For the binocular mode, the baseline length is twice that of the monocular mode; therefore, the deformation is larger and needs to be considered. In the matching process, Equation (5) is employed to estimate the displacement. The parameter ux is set to 0, and uy is set to −3/8, 0, and 3/8, respectively. We round the calculated pixel positions to extract a square block from the search window. The remaining process is the same as in the monocular mode, and then we acquire the displacement ∆xbi for the center pixel of the image block.


Step 6: The integration of the two displacements and the depth mapping.
As shown in Figure 7, the field of view (FoV) of the left image sensor can be divided into two parts. One is the crossed part, the other is the uncrossed part. For a pixel in the crossed part, if |∆x′bi − ∆xbi| < th, then ∆xbi is selected as the final displacement ∆x; otherwise, we compare the maximum correlation coefficients of the two matching modes and the displacement with the larger coefficient is selected as the final displacement ∆x. For a pixel in the uncrossed part, we cannot obtain the displacement ∆xbi; therefore, ∆x′bi is selected as the final displacement ∆x. Based on the obtained final displacement ∆x, we can calculate the distance information for the target object by triangulation according to Equation (1).

Figure 7. The field of view (FoV) of the output depth image for the two working modes.
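The per-pixel integration rule of Step 6 can be summarized as in the following sketch. The threshold value is a placeholder, and the helper assumes the monocular result has already been converted to its binocular equivalent by Equation (3).

```python
# Per-pixel integration of the two matching modes (Step 6); the threshold is a placeholder.

def fuse(dx_bi, c_bi, dx_bi_from_left, c_left, in_crossed_fov, th=1.0):
    """Select the final displacement dx.
    dx_bi, c_bi:             binocular displacement and its best correlation
    dx_bi_from_left, c_left: monocular result converted by Equation (3), and its correlation
    in_crossed_fov:          whether the pixel lies in the crossed field of view (Figure 7)
    """
    if not in_crossed_fov:
        # The binocular displacement is unavailable outside the crossed FoV.
        return dx_bi_from_left
    if abs(dx_bi_from_left - dx_bi) < th:
        return dx_bi
    # Otherwise keep the displacement from the mode with the larger correlation.
    return dx_bi if c_bi >= c_left else dx_bi_from_left
```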

Step 7: Obtain the depth map.
Move the center pixel of the image block to the next neighboring pixel in the same row, and repeat Steps 5 and 6 to obtain the distance information for the neighboring pixel. Then the whole depth image can be obtained on a pixel-by-pixel and line-by-line basis.

4. Hardware Architecture and Implementation

In this paper, our method is implemented on a hardware platform. Figure 8 shows the architecture of our depth-sensing method. There are three main modules in the FPGA, i.e., consistency enhancement, shadow detection, and disparity estimation, and two external modules, the microcontroller unit (MCU) and flash memory. On the one hand, the MCU controls the reading and writing of the flash memory. Before processing, the reference image Rl needs to be solidified in flash memory in advance, so the MCU sends a writing signal to control the flash memory through the inter-integrated circuit (I2C) bus. In the processing stage, the FPGA reads the reference data from flash memory; thus, the writing signal needs to be changed to a reading signal. On the other hand, there are some thresholds in our method, such as the gray value grayref in the consistency enhancement and the threshold th in the displacement integration step. The MCU helps us adjust these thresholds outside the FPGA.


Figure 8. Architecture of the depth sensing method.

The pipeline framework is the heart of the hardware implementation. For example, the input image is scanned from the upper-left pixel to the lower-right pixel to generate a fixed data stream. However, every pixel is processed together with its surrounding pixels in Steps 3, 4 and 5, so the block extraction submodule is essential in the three main modules. As shown in Figure 9, m − 1 line buffers and an m × m D-trigger array are used in this submodule to extract an m × m image block. The length of each line buffer is equal to the image width and each D-trigger represents a clock-cycle delay. The extraction of the searching window uses the same architecture, except that the length of the D-trigger array is changed to M.

Figure 9. Block extraction submodule.
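As a software analogy of the block extraction submodule in Figure 9, the sketch below models how m − 1 line buffers and an m × m register array expose one m × m window per pixel of a raster-scanned stream. It is a behavioral illustration only, not the RTL used on the FPGA, and border handling is simplified.

```python
from collections import deque

# Behavioral sketch of the block extraction submodule (Figure 9): m - 1 line buffers
# and an m x m shift-register array deliver one m x m window per input pixel.

def stream_windows(pixels, width, m=5):
    """pixels: raster-scanned image (row by row); yields m x m windows as nested lists."""
    line_buffers = [deque([0] * width, maxlen=width) for _ in range(m - 1)]
    window = [[0] * m for _ in range(m)]          # models the m x m D-trigger array
    for idx, p in enumerate(pixels):
        # Push the new pixel through the chain of line buffers; each buffer
        # delays its input by exactly one image line.
        taps, carry = [], p
        for buf in reversed(line_buffers):
            out = buf[0]                          # value delayed by one line
            buf.append(carry)
            taps.append(out)
            carry = out
        column = list(reversed(taps)) + [p]       # oldest row at the top
        # Shift the register array one column to the left and insert the new column.
        for r in range(m):
            window[r] = window[r][1:] + [column[r]]
        if idx >= (m - 1) * width + (m - 1):      # window now filled with valid data
            yield [row[:] for row in window]
```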

In the consistency enhancement module, after the block extraction submodule, there is an average submodule that calculates the average gray value of the extracted image block. Then a comparator and a multiplier are employed to enhance the central pixel according to Equation (7). In the shadow detection module, a comparator and a counter are employed on the extracted image block to detect whether the central pixel is in the shadow area.

In the disparity estimation module, after the extraction of the image block and the searching window, the M − m + 1 matching blocks in the searching window and the image block are fed into the correlation submodule to calculate the correlation coefficients according to Equation (6). Then, the max submodule selects the maximum coefficient Cmax and acquires the corresponding displacement. However, there is a slight difference in the binocular mode because the deformation is considered in our paper. The matching block extraction contains an extra submodule, i.e., the deformation block extraction submodule; an example of this submodule is given in Figure 10 with m = 3 and ux = 0, uy = 0, ±1. For every central pixel in the searching window, we extract three matching blocks, marked in red, black, and green. The deformation templates have been stored in advance. Then a 3 × 5 block with central pixel R23 is multiplied with the three templates. If a pixel in the template is one, the corresponding pixel in the block will be retained to form the matching block.

Figure 10. Deformation block extraction submodule.
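A possible software rendering of the deformation block extraction of Figure 10 is given below: following Equation (5) with ux = 0, each row of the matching block is read from columns shifted by round(uy·(y − y0)), which for m = 3 and uy = 0, ±1 picks three 3 × 3 blocks out of a 3 × 5 neighborhood. The function is an illustration of the idea, not the authors' template-ROM implementation.

```python
import numpy as np

# Illustrative sketch of the deformation block extraction submodule (Figure 10):
# per Equation (5) with u_x = 0, row y of the block is read from columns shifted
# horizontally by round(u_y * (y - y0)).

def deformed_block(image, x0, y0, m=3, uy=1.0):
    h = m // 2
    block = np.empty((m, m), dtype=image.dtype)
    for r, dy in enumerate(range(-h, h + 1)):
        shift = int(round(uy * dy))               # horizontal shear of this row
        cols = slice(x0 + shift - h, x0 + shift + h + 1)
        block[r, :] = image[y0 + dy, cols]
    return block

# With m = 3 and uy in {-1, 0, +1} this selects three 3x3 blocks out of a 3x5
# neighborhood, matching the three templates described for Figure 10.
```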

For the last two submodules, displacement conversion and triangulation, we establish two lookup tables (LUTs), which store the calculated results in ROM according to Equations (3) and (1) in Section 2.1, respectively. Every input displacement corresponds to a register address, and the result is read from the corresponding address.

5. Experimental Results and Discussion

We designed an FPGA hardware platform, as shown in Figure 11, to verify the depth-sensing algorithm proposed in this paper. On the platform, we use a near-infrared laser projector (emitting the laser speckle pattern), similar to that used in the Kinect, and two identical IR image sensors (receiving the laser speckle pattern, with output resolution supporting 1280 × 960 at 60 Hz), which are fixed on an aluminum plate. Our proposed depth-sensing method is verified on an Altera FPGA (type: EP4CE115F23C8N, Intel Corporation, Santa Clara, CA, USA).

Figure 11. The hardware test platform.

5.1. The Validation of the Transforming Relationship between Two Displacements

In this part, we validate the transforming relationship between the two displacements ∆xbi and ∆xleft from the two matching modes. We capture a set of left and right speckle patterns projected on a standard plane that is perpendicular to the optical axis (Z-axis) of the laser projector and parallel to the left reference speckle pattern plane. The distance information of every speckle pattern is known, and the range is from 0.7 m to 4.46 m.


The tested displacements ∆xbi and ∆xleft for each pattern are estimated and listed in Tables 1 and 2. In the two tables, the theoretical displacements calculated from Equations (1) and (2) are also listed, where the related camera parameters are f = 4.387 mm, µ = 3.75 µm, the baseline s = 74.6 mm, and dref = 2 m. From Tables 1 and 2, it can be found that the error of ∆xbi is slightly larger than that of ∆xleft. The reason is that the baseline of the binocular mode is twice the length of that of the monocular mode. For the displacements ∆xbi, from 0.7 to 2.5 m, the error of the test value decreases with increasing distance. After 2.5 m, the error fluctuation is stable between 0.47 and 0.61. For the displacements ∆xleft, the error of the test value around the reference distance is at a minimum, and the error between the theoretical and test values is no more than one pixel. In Figure 12, we plot the ranging curves calculated from Equations (1) and (2) and the transforming curve from Equation (3); additionally, the test displacements are plotted as blue points. It intuitively shows that all of the blue points lie basically on the red line.

Table 1. A list of the displacements ∆xbi at different distances.

Distance (m)   Theory Value   Test Value        Distance (m)   Theory Value   Test Value
0.7            250.06         250.79 ± 0.94     2.7            64.67          64.21 ± 0.51
0.9            193.08         193.66 ± 0.75     2.9            60.08          59.60 ± 0.59
1.1            157.96         158.58 ± 0.71     3.1            56.21          55.71 ± 0.57
1.3            134.26         135.05 ± 0.74     3.3            52.88          52.35 ± 0.63
1.5            116.36         116.83 ± 0.65     3.5            49.91          49.33 ± 0.58
1.7            102.55         102.73 ± 0.61     3.7            47.19          46.62 ± 0.58
1.9            92.06          92.06 ± 0.53      3.9            44.80          44.24 ± 0.61
2.1            83.12          83.15 ± 0.56      4.1            42.61          42.02 ± 0.47
2.3            75.76          75.70 ± 0.53      4.3            40.64          40.04 ± 0.50
2.5            70.07          69.99 ± 0.50      4.46           39.11          38.45 ± 0.58

Table 2. A list of the displacements ∆xleft at different distances.

Distance (m)   Theory Value   Test Value        Distance (m)   Theory Value   Test Value
0.7            81.40          81.79 ± 0.67      2.7            −11.30         −11.37 ± 0.80
0.9            52.90          53.20 ± 0.59      2.9            −13.59         −13.75 ± 0.58
1.1            35.34          35.64 ± 0.58      3.1            −15.53         −15.67 ± 0.43
1.3            23.50          23.87 ± 0.58      3.3            −17.20         −17.36 ± 0.43
1.5            14.55          14.75 ± 0.47      3.5            −18.68         −18.89 ± 0.42
1.7            7.64           7.73 ± 0.32       3.7            −20.04         −20.22 ± 0.37
1.9            2.39           2.41 ± 0.28       3.9            −21.24         −21.37 ± 0.44
2.1            −2.08          −2.09 ± 0.30      4.1            −22.33         −22.49 ± 0.50
2.3            −5.76          −5.69 ± 0.51      4.3            −23.32         −23.45 ± 0.45
2.5            −8.60          −8.72 ± 0.68      4.46           −24.08         −24.28 ± 0.50

Figure 12. The test results and theoretical curves from Equations (1), (2), and (3). (a) The distance curve of displacements ∆xleft; (b) the distance curve of displacements ∆xbi; (c) the transforming curve between displacements ∆xleft and ∆xbi.



5.2. The Spatial Resolution in X-Y Direction

In this paper, we divide the speckle pattern into blocks (or subsets) to estimate the disparity of every pixel. According to the triangulation principle, the lateral extent ∆X(Y) resolved in the X-Y direction is proportional to the size of the divided block and grows linearly with distance. The relationship can be expressed as

∆X(Y) ∝ (mµ/f) · d, (8)

where m represents the block size. In our method, we reduce the size of the matching block to improve the spatial resolution of the depth map, but a smaller block also reduces the matching accuracy. Therefore, we analyze the error rate of different block sizes at different distances to select an appropriate block size. The two matching modes are performed from 0.7 m to 4.46 m with block sizes of 25 × 25 and 17 × 17, and ∆x25 and ∆x17 denote the corresponding estimated displacements. In the experimental results, we find that, with the larger block size of 25 × 25, there are no mismatched pixels in either matching mode. Then, when the smaller block size of 17 × 17 is used, a pixel is regarded as a matching error if |∆x25 − ∆x17| > 0.25 pixel. Table 3 lists the error rate of the two matching modes with the 17 × 17 block size. The results show that almost no mismatching appears in binocular mode before 3.5 m, and the error rate remains very low from 3.7 m to 4.1 m; therefore, the 17 × 17 block can be selected to perform the DIC algorithm. Moreover, in monocular mode the matching results are very good around the reference distance, whereas mismatching appears at other distances and is serious at 0.7 m in particular. In addition, the main calculation is performed in the disparity estimation step (step 5); although two matching modes are performed in the present method, adopting the smaller block keeps the amount of calculation no larger than that of a single 25 × 25 block.
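For concreteness, the sketch below (illustrative Python, not the FPGA pipeline) shows the mismatch criterion used to build Table 3: the 17 × 17 displacement map is compared against the mismatch-free 25 × 25 map, and a pixel is counted as an error when the two differ by more than 0.25 pixel.

```python
import numpy as np

# Minimal sketch of the block-size check behind Table 3 (not the FPGA pipeline).
def error_rate_permille(dx_25, dx_17, tol=0.25):
    """Fraction (in permille) of pixels whose 17x17 displacement deviates from
    the trusted 25x25 displacement by more than the 0.25-pixel tolerance."""
    mismatched = np.abs(np.asarray(dx_25) - np.asarray(dx_17)) > tol
    return 1000.0 * mismatched.mean()

# Synthetic example: a 0.4-pixel disagreement on 2 of 10,000 pixels
dx_25 = np.zeros((100, 100))
dx_17 = dx_25.copy()
dx_17[0, :2] += 0.4
print(error_rate_permille(dx_25, dx_17))   # -> 0.2 (per mille)
```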

Table 3. The error rate (‰) of the two matching modes at different distances.

Distance (m)  Binocular Mode  Monocular Mode  |  Distance (m)  Binocular Mode  Monocular Mode
0.7           0               0.48            |  2.7           0               0.03
0.9           0               0.22            |  2.9           0               0.03
1.1           0               0.18            |  3.1           0               0.04
1.3           0               0.06            |  3.3           0               0.03
1.5           0               0               |  3.5           0               0.08
1.7           0               0               |  3.7           0.01            0.07
1.9           0               0               |  3.9           0.01            0.06
2.1           0               0               |  4.1           0.03            0.06
2.3           0               0               |  4.3           0.09            0.06
2.5           0               0               |  4.46          0.20            0.09

5.3. The Analysis and Comparison of Results

Firstly, we employ our method to test people on our hardware platform, and the results are shown in Figure 13. Figure 13a is the output depth image from the monocular mode, with some mismatching on the finger part, as marked by the green circle. Figure 13b is the output depth image from the binocular mode, in which the fingers are clearer; however, the occluded areas marked by red curves impact the quality of the whole image. Thus, we correct the occluded areas using Figure 13a to generate the final depth image, shown in Figure 13c. Comparing Figure 13a with Figure 13b, it is clear that the ability of the monocular mode to recognize tiny objects is lower than that of the binocular mode, and mismatching appears more easily on similar objects in monocular mode after reducing the block size. Conversely, the noise on the body is slightly greater in Figure 13b, because the surface of the body is relatively complex and occlusion also exists on the surface. The black area around the person is the detected shadow area, and the final result (Figure 13c) shows that our method effectively integrates the advantages of Figure 13a,b.
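The combination step behind Figure 13c can be pictured as a simple per-pixel selection. The sketch below (Python/NumPy, a simplified stand-in for the hardware logic; the mask names are hypothetical) fills the occluded pixels of the binocular result with the monocular result and keeps the detected shadow area invalid.

```python
import numpy as np

# Simplified stand-in for the combination step (not the authors' hardware logic).
# `occluded` and `shadow` are assumed to be boolean masks produced by earlier
# pipeline steps (occlusion detection and shadow detection, respectively).
def combine_depth(depth_bi, depth_mono, occluded, shadow, invalid=0):
    fused = depth_bi.copy()
    fused[occluded] = depth_mono[occluded]   # occluded pixels: take the monocular result
    fused[shadow] = invalid                  # projector shadow: no reliable depth
    return fused
```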


Figure 13. The depth image: (a) from monocular mode; (b) from binocular mode; (c) from the combination of the two matching modes.

In order to test the spatial resolution of our system and to compare the results intuitively with other similar devices, such as the Xtion Pro, Kinect, Kinect 2, and Realsense R200, depth maps of several straight sticks with known widths, shown in Figure 14, are obtained and presented in Figure 15. The widths of the straight sticks are 5, 8, 10, 12, 15, 20, and 25 mm, from left to right. As shown in Figure 15, the results from our method are consistently better than those from the Kinect, Xtion Pro, and Realsense. For example, the first stick can still be identified at a testing distance of 1.5 m in Figure 15f, but it can only be identified at 0.7 m in Figure 15b,c,e. At 1.9 m, there are still six sticks in our depth map, but only three sticks in the other three maps. Moreover, the noise in Figure 15e is considerable, and the depth map of the flat object always flickers. The reason is that Intel sacrificed the performance of the Realsense in order to reduce its dimensions; the RealSense R200 measures only 101.56 mm (length) × 9.55 mm (height) × 3.8 mm (width), making it the smallest depth sensor at present. Compared with the Kinect 2, our results are slightly better before 1.5 m, because the Kinect 2 cannot identify the first stick at 1.5 m. Beyond the testing distance of 1.5 m, our method smooths the depth of the thinner sticks that cannot be tested, whereas the Kinect 2 marks these uncertain areas, although it also cannot obtain the depth values of these objects, such as at the location of the first stick from 1.5 m to 3.1 m. Although there are some problems in our method, the results from Figures 13 and 15 still prove that the present method not only improves the spatial resolution of the depth image, but also ensures the quality of the depth image.

Width of sticks (mm): 5, 8, 10, 12, 15, 20, 25

Figure 14. The width of the straight sticks.
Figure 15. The different widths of the straight sticks (5, 8, 10, 12, 15, 20, and 25 mm from left to right) at different distances. (a) Color image; (b) Xtion Pro; (c) Kinect; (d) Kinect 2 (see Appendix A for more details); (e) Realsense R200; (f) our method.

In Figure 15, we simplify the tested scene and objects, measuring the spatial resolution of the different devices under an ideal circumstance, so the results are optimal to a large extent. In practical applications, the scenes and the structure of objects are complicated, and the tested results will be affected. Therefore, in the following, we present more experimental results and discuss them.

As shown in Figure 16, we tried our system on some capital letters. The strokes are 10 mm (the upper line) and 15 mm (the lower line) wide, respectively, as labeled in Figure 16a. The testing distance is about 1.15 m. In Figures 14 and 15, the sticks with widths of 10 mm and 15 mm are the third and fifth ones, respectively, and it can be concluded that these two widths can be resolved by all of these depth sensing devices at 1.15 m. The test results in Figure 16 are consistent with this conclusion: all devices can detect that there are some objects at this position. However, it is difficult to accurately recognize which letters they are. For example, for the letter "N," the test results in the upper line are poor in Figure 16b,c,e. In the lower line, the results in Figure 16b,c slightly improve, but they look more like the letter "H." Figure 16d,f show an improvement over the other three, but there are also problems in the test results; the holes in the letter "A" are not detected by our system or the Kinect 2.


Figure 16. The test results on some capital letters. (a) Color image; (b) Xtion Pro; (c) Kinect; (d) Kinect 2 (see Appendix A for more details); (e) Realsense R200; (f) our method.

At last, we performed an experiment in a complex scene with furniture, and the results are presented in Figure 17. Table 4 shows the working mode and performance features of every device. The Xtion and Kinect employ the same chip (PrimeSense PS1080), and their performance features are almost the same, so we do not list the Xtion separately in this table. Firstly, from Figure 17, we can see that the depth capture distance of our system and the Kinect 2 is obviously an improvement over the other devices; only they can measure Object 8 (Figure 17d,f). Secondly, the Kinect 2 adopts ToF technology, while the other devices are all based on structured light. Object 6 can be tested in Figure 17b,c,f, but the Kinect 2 cannot detect it within its range limit; therefore, we consider that the structured-light technology is sometimes better than the ToF technology. Finally, optical phenomena will affect the measurement results. In Figure 17d, there is some noise on Object 2, which is a mirror-reflective object, and the surface of Object 3 absorbs the light, which significantly affects the performance of depth measurement based on active vision.
Figure 17. The test results in the complex scene with furniture. (a) Color image; (b) Xtion Pro; (c) Kinect; (d) Kinect 2; (e) Realsense R200; (f) our method.

Table 4. The performance comparison.

Item                  Kinect             Kinect 2      Realsense R200     Our Method
Working mode          Structured light   ToF           Structured light   Structured light
Range limit           0.8~3.5 m          0.5~4.5 m     0.4~2.8 m          0.8~4.5 m
Framerate             30 fps             30 fps        Up to 60 fps       Up to 60 fps
Image resolution      640 × 480          512 × 424     Up to 640 × 480    Up to 1280 × 960
Bits of depth image   10 bits            unpublished   12 bits            12 bits
Vertical Field        43°                60°           46° ± 5°           43°
Horizontal Field      57°                70°           59° ± 5°           58°
Output interface      USB 2.0            USB 3.0       USB 3.0            USB 3.0
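The range limits in Table 4 already explain part of Figure 17 on their own. The small check below (illustrative Python; the 4.0 m test distance is a hypothetical example, not a measured value from the paper) lists which devices can, in principle, return depth for an object at a given distance.

```python
# Illustrative check based on the range limits of Table 4 (the 4.0 m distance
# is a hypothetical example, not a measurement reported in the paper).
RANGE_LIMITS_M = {
    "Kinect": (0.8, 3.5),
    "Kinect 2": (0.5, 4.5),
    "Realsense R200": (0.4, 2.8),
    "Our method": (0.8, 4.5),
}

def devices_in_range(distance_m):
    return [name for name, (near, far) in RANGE_LIMITS_M.items() if near <= distance_m <= far]

print(devices_in_range(4.0))   # -> ['Kinect 2', 'Our method']
```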

6. Conclusions

In this paper, an FPGA hardware design method for a depth sensing system based on active structured light is presented to improve the spatial resolution of the depth image with two infrared cameras. Firstly, the infrared cameras, based on narrow band-pass filtering, cooperate with the infrared laser to capture a clearer speckle pattern, which is projected by the laser and used to mark the space. Secondly, we improve the spatial resolution in the X-Y direction by reducing the size of the matching block; since the smaller block reduces the matching precision, the two matching modes are combined to obtain more precise data. Moreover, the method solves the occlusion problem existing in traditional binocular stereo systems. Thirdly, we introduce the full pipeline architecture of the depth sensing method and verify it on an FPGA platform. However, if our system is to be widely adopted in the future, some optimization is needed to limit hardware resources and power consumption, and more and various scenes are also needed to verify the system; this will be the focus of our future work.

Acknowledgments: This work was supported by the National Natural Science Foundation of China under Grant No. 61571358 and No. 61627811, and by the Shaanxi Industrial Science and Technology Project under Grant No. 2015GY058.

Author Contributions: Chenyang Ge and Nanning Zheng provided the initial idea of this research. Huimin Yao and Chenyang Ge performed the experiments and wrote the paper. Jianru Xue and Nanning Zheng provided valuable advice on the experiments and manuscript writing.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

In the depth images from the Kinect 2 in Figure 15d, the uncertain areas are marked by dark pixels. This makes the depth of the measured sticks difficult to distinguish from the uncertain areas. Therefore, we made a small change to Figure 15d, as shown in Figure A1b: the uncertain areas are marked by white pixels instead.
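The adjustment itself amounts to a one-line remapping of the invalid pixels. A minimal sketch is given below; it assumes the uncertain pixels are stored as 0 in an 8-bit grayscale rendering, which is an assumption about the visualisation rather than a property of the Kinect 2 output.

```python
import numpy as np

# Minimal sketch of the Appendix A adjustment (assumes invalid/uncertain pixels
# are stored as 0 in an 8-bit grayscale depth visualisation).
def mark_uncertain_white(depth_vis, invalid_value=0):
    out = depth_vis.copy()
    out[depth_vis == invalid_value] = 255   # render uncertain areas as white
    return out
```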

Figure A1. Depth images at a distance of 1.5 m from the Kinect 2: (a) original depth image; (b) revised depth image.

