Passive depth estimation using chromatic aberration and a depth from defocus approach

Pauline Trouvé,1,* Frédéric Champagnat,1 Guy Le Besnerais,1 Jacques Sabater,2 Thierry Avignon,2 and Jérôme Idier3

1ONERA-The French Aerospace Lab, F-91761 Palaiseau, France
2Institut d’Optique Graduate School, 2 Avenue Augustin Fresnel, RD128 91127 Palaiseau, France
3LUNAM Université, IRCCyN (UMR CNRS 6597) BP 92101, 1 rue de la Noë, 44321 Nantes Cedex 3, France

*Corresponding author: [email protected]

Received 19 June 2013; revised 30 August 2013; accepted 3 September 2013; posted 10 September 2013 (Doc. ID 192491); published 9 October 2013

In this paper, we propose a new method for passive depth estimation based on the combination of a camera with longitudinal chromatic aberration and an original depth from defocus (DFD) algorithm. Indeed a chromatic lens, combined with an RGB sensor, produces three images with spectrally variable in-focus planes, which eases the task of depth extraction with DFD. We first propose an original DFD algorithm dedicated to color images having spectrally varying defocus blurs. Then we describe the design of a prototype chromatic camera so as to evaluate experimentally the effectiveness of the proposed approach for depth estimation. We provide comparisons with results of an active ranging sensor and real indoor/outdoor scene reconstructions. © 2013 Optical Society of America

OCIS codes: (110.0110) Imaging systems; (110.1758) Computational imaging; (100.0100) Image processing; (100.3190) Inverse problems.

http://dx.doi.org/10.1364/AO.52.007152

1. Introduction

Imaging devices with depth estimation ability, also referred to as RGB-D cameras, have a large field of applications, including robot guidance for civilian and military applications, man–machine interface for game consoles or smartphones, and 3D recording. These systems are usually based either on stereoscopy, using two cameras with different viewpoints, or on infrared (IR) active ranging systems, using laser pulses as in LIDAR scanners and time-of-flight (TOF) cameras, or projected light patterns such as the Kinect, developed by PrimeSense. Here we focus on a passive depth estimation method using a single “chromatic camera,” i.e., a camera with a lens having an accentuated longitudinal chromatic aberration.


We first describe an original chromatic depth from defocus (DFD) algorithm and show its efficiency on simulated data. Then we present the design and the realization of a real prototype of chromatic camera and combine it with our algorithm so as to produce estimated depth maps. We evaluate the depth estimation performance on real textured scenes and demonstrate an accuracy better than 10 cm in the range of 1–3 m for a camera with focal length of 25 mm and an f-number of 4.

A. Depth from Defocus

DFD is a passive depth estimation method based on the relation between defocus blur and depth [1]. Indeed, as illustrated in Fig. 1, if a point source is placed out of the in-focus plane of an imaging system, its image, which corresponds to the point spread function (PSF), has a size given by the geometrical relation

Fig. 1. Illustration of the DFD principle.

\[
\varepsilon = D s \left| \frac{1}{f} - \frac{1}{d} - \frac{1}{s} \right|, \qquad (1)
\]

where f is the focal length, D is the lens diameter, and d and s are, respectively, the distance of the point source and the sensor with respect to the lens. Knowing f and s, the depth d can be inferred from a local estimation of the PSF size, or in other words, of the local amount of blur. Compared to the traditional stereoscopic ranging system, a DFD approach requires only a single camera, and thus leads to a more compact and simple experimental setting. Moreover, compared to a recent single lens stereoscopic approach such as in [2], where the aperture is divided among three RGB color filters to create a parallax effect between the RGB channels, it does not reduce the signal-to-noise ratio (SNR) with an aperture division. Finally, compared to a light-field camera [3], DFD requires a simpler optical design and has no issue of angular resolution versus spatial resolution. Besides, one advantage of a 3D camera with DFD compared to active ranging systems is that it can be used in outdoor as well as in indoor situations. Indeed, an active ranging system, such as the Kinect or TOF camera, projects an IR signal that can be disturbed by the IR illumination of the sun. On the other hand, DFD has two fundamental drawbacks illustrated in Fig. 2. First, as shown in Fig. 2(a), there is a depth ambiguity on both sides of the in-focus plane because two different depths lead to the same PSF size. In addition, Fig. 2(b) illustrates that near the in-focus plane region, no blur variation can be observed, due to a PSF size below one pixel. That region corresponds to the depth of field (DoF) and can be considered as a dead zone for depth estimation by DFD. Like all passive depth estimation techniques, DFD requires the scene to be sufficiently textured. However, this scene is generally unknown. A first family of DFD techniques then uses several images of the same scene to estimate the relative local blur. In [1,4–6] the images are obtained with different lens settings, an approach that requires the scene to be static between the successive acquisitions. To avoid this constraint, a first solution is to separate the input beam. For example, a mirror beam splitter is used in [7]. In [8] a programmable aperture involving an LCoS is used to acquire two images with different aperture shapes at a high rate.

Fig. 2. Theoretical blur variation ϵ given by Eq. (1) with respect to depth, for a conventional imaging system with a focal length of 25 mm, an f-number of 3, and a sensor pixel size of 5 μm, and with the in-focus plane put at 2 m. (a) Illustration of depth estimation ambiguity. (b) Illustration of the dead zone in the DoF region.
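The behavior summarized in Fig. 2 can be reproduced directly from Eq. (1). The following sketch (Python/NumPy; the focal length, f-number, pixel size, and 2 m in-focus plane are the values quoted in the Fig. 2 caption, everything else is illustrative) computes the blur diameter in pixels, locates the DoF dead zone, and exhibits the two-sided depth ambiguity.

```python
import numpy as np

def blur_diameter_px(d, f=25e-3, fnum=3.0, d_focus=2.0, pixel=5e-6):
    """Geometric defocus blur diameter (in pixels) from Eq. (1) for an object at d meters."""
    D = f / fnum                              # lens diameter
    s = 1.0 / (1.0 / f - 1.0 / d_focus)       # sensor position focusing the lens at d_focus
    eps = D * s * np.abs(1.0 / f - 1.0 / d - 1.0 / s)
    return eps / pixel

depths = np.linspace(1.0, 5.0, 401)
eps_px = blur_diameter_px(depths)

# Dead zone (Fig. 2(b)): blur below one pixel, i.e., the depth of field around 2 m.
dead = depths[eps_px < 1.0]
print("dead zone: %.2f m to %.2f m" % (dead.min(), dead.max()))

# Ambiguity (Fig. 2(a)): the same blur level is reached on both sides of the in-focus plane.
target = 2.0                                   # pixels
near, far = depths[depths < 2.0], depths[depths > 2.0]
d_near = near[np.argmin(np.abs(blur_diameter_px(near) - target))]
d_far = far[np.argmin(np.abs(blur_diameter_px(far) - target))]
print("a %.0f px blur occurs both at %.2f m and %.2f m" % (target, d_near, d_far))
```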

In [9] a system is proposed that uses a spatial light modulator to produce two images successively, the first one with a constant blur from which the scene is estimated, and the second one with a depth-dependent defocus blur from which depth is estimated. However, all the latter solutions imply sophisticated optical systems, which increases the size of the camera, or the design and manufacturing difficulty. Another family of DFD techniques uses a single image. Acquisition is then easier, but the processing is more difficult due to the ambiguity between scene and blur. Various scene models have been used [10–15], and some of them are based on a supervised learning step [12,13]. Note that an important topic in single image DFD is the use of a coded aperture whose shape has an influence on depth estimation accuracy and can thus be optimized [11–13]. However, in most coded aperture approaches developed in the literature [11–13], depth ambiguity remains, because there is still a unique in-focus plane. Note that a nonsymmetric aperture could avoid this ambiguity, but such apertures are not considered the most accurate [12]. Besides, due to the DoF of the lens, there is still a dead zone near the in-focus plane. Another interesting approach proposed in [16] consists of using a color circular spectral filter inside the aperture in order to capture simultaneously three RGB images with different aperture radii: one for the green channel and another for the two red and blue channels. As shown in Fig. 3(a), each depth is then characterized by two blur radii, which increases the accuracy of depth estimation [16]. However, once again, this approach is subject to ambiguity and a dead zone around the single in-focus plane. To summarize, there is still a need for a single acquisition DFD system that could provide depth estimation with good accuracy over a large range


Fig. 3. Example of theoretical blur variation ε given by Eq. (1), with respect to depth for RGB channels: (a) in the case of the chromatic aperture of [16], and (b) in the case of a chromatic lens. In (a) the focal length is 25 mm, and the f-number of the green channel is 4.5, while it is 3 for the red and blue channels. For the chromatic lens (b), the green channel focal length is 25 mm, and the RGB channels are respectively focused at 1.5, 2, and 3 m, with an f-number of 3. In both cases the sensor pixel size is 5 μm.

without dead zone or ambiguity. In this paper we describe such a system based on the use of a lens with longitudinal chromatic aberration, and an original chromatic DFD algorithm.

B. Depth from Defocus with a Lens with Chromatic Aberration

Longitudinal chromatic aberration leads to spectrally varying in-focus planes of the lens. Thus a chromatic lens combined with a color sensor produces, in a single snapshot, three images with varying defocus blur. The benefit that can be gained from this kind of system can be seen in Fig. 3(b), which shows the variation of the PSF size for the three RGB channels of a chromatic imaging system. First, for each depth there is a unique triplet of PSFs. Therefore we avoid the depth ambiguity that exists in single in-focus plane DFD. In addition, each channel has a different in-focus plane, so the dead zone of a channel can be compensated by using blur variation in the other two channels. Hence with such a system depth estimation can potentially have good and homogeneous accuracy over a large range. Finally, as chromatic aberration is often corrected at the expense of additional lenses such as diffractive lenses, removing the achromatic constraint makes the optical design simpler, as there are fewer elements to optimize, and can lead to a more compact system.
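A quick numerical check of this claim can be made by applying Eq. (1) per channel. The sketch below (Python/NumPy) uses the illustrative parameters quoted for Fig. 3(b) — 25 mm focal length, f/3, 5 μm pixels, and RGB in-focus planes at 1.5, 2, and 3 m — and shows that each depth yields a distinct blur triplet, while listing which channels fall inside their own DoF at each depth.

```python
import numpy as np

f, fnum, pixel = 25e-3, 3.0, 5e-6
D = f / fnum
focus = {"R": 1.5, "G": 2.0, "B": 3.0}                        # per-channel in-focus planes (Fig. 3(b))
s = {c: 1.0 / (1.0 / f - 1.0 / d0) for c, d0 in focus.items()}

def blur_px(d, c):
    """Per-channel defocus blur diameter in pixels, Eq. (1)."""
    return D * s[c] * abs(1.0 / f - 1.0 / d - 1.0 / s[c]) / pixel

for d in (1.3, 1.8, 2.5, 3.5):
    triplet = tuple(round(blur_px(d, c), 1) for c in "RGB")
    in_dof = [c for c in "RGB" if blur_px(d, c) < 1.0]
    print(f"d = {d} m: (eps_R, eps_G, eps_B) = {triplet}, channels inside their DoF: {in_dof}")
```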

C. Related Works

Depth estimation from chromatic aberration has already been proposed by Garcia et al. in [17], but their algorithm for local blur identification is restricted to step edges; hence they do not obtain depth images and provide no experimental evaluation of the depth estimation performance.


On the other hand, chromatic aberration has been proposed for DoF extension, for grayscale scenes in [18,19] and for color scenes in [20–22]. In [18,21,22] a relative sharpness measure is used to locally identify the sharpest channel and to add the higher frequencies of the sharpest channel onto the blurred channels, a technique that is referred to as “high frequency transfer.” The relative sharpness measure can lead to a classification of the scene into three regions: near, medium range, and far. However, none of these references propose a solution for depth imaging nor study the performance of depth estimation. Note that chromatic aberration is also exploited in microscopy to estimate a 3D map of the surface of the observed object. For example, in [23], a chromatic lens and an RGB sensor are used for phase estimation with the transport of intensity equation. Another example is the use of chromatic aberration in spectral confocal microscopy [24]. Regarding algorithm issues, let us first note that classical multi-image DFD techniques [1,4–6] cannot be used for DFD with a chromatic lens because they do not account for the partial correlation between the color channels: this point is detailed in Section 2. In [16] the proposed DFD algorithm is dedicated to the processing of color images having spectrally varying defocus blur. However, the processing in [16] relies on the assumption that the same channel always has the lowest blur level, whatever the depth. Such an approach cannot be applied to the case of DFD with a chromatic lens, because the channel having the lowest blur level varies with depth, as shown in Fig. 3(b).

D. Contributions and Paper Organization

Our first contribution is an original depth estimation algorithm dedicated to chromatic systems presented in Section 2. Section 3 presents validations of the proposed algorithm carried out on simulated images from a set of natural scenes. Our second contribution is the design and the realization of a chromatic lens, from which a prototype of RGB-D camera has been built. Using this prototype and the proposed chromatic DFD algorithm, we present an experimental evaluation of depth estimation accuracy in Section 4. (Some of the depth estimation results were presented in [25].) Depth maps obtained on real 3D scenes are shown and compared to the results of an active ranging system. Discussion and concluding remarks are given in Sections 5 and 6.

2. DFD Algorithm Dedicated to Chromatic Camera

A. Algorithm Overview

DFD can be related to blind deconvolution as both the scene and the PSF are unknown. Moreover, as depth varies in the image, one has to deal with spatially varying blur, which means that identification has to be done locally with a very limited number of data. Such a severely underdetermined problem requires additional assumptions on the scene and

on the PSF. In [14] we have presented an unsupervised single image DFD method based on a very simple one-parameter Gaussian prior distribution of the scene and that assumes that the PSF belongs to some finite set of known candidate PSFs. Each candidate PSF accounts for the defocus associated to a particular depth d_k, according to a preliminary step of calibration or simulation of the imaging system at hand. The flowchart of this algorithm is presented in Fig. 4. The image is decomposed into patches where depth is assumed constant. For each patch the problem reduces to the optimization of a cost function over two parameters:

\[
(\hat{d}_k, \hat{\alpha}) = \arg\min_{k,\alpha} \mathrm{GL}(d_k, \alpha). \qquad (2)
\]

The first one is a regularization parameter α that accounts for the local SNR, and the second is the depth dk of the patch. The cost function GL is a generalized likelihood (GL) obtained by marginalizing out the (unknown) scene. Note that, as shown in the flowchart, homogeneous patches are rejected because they are insensitive to depth. We adopt the same general approach and flowchart for the case of the chromatic camera, but derive a new GL cost function that requires two nontrivial modifications with respect to [14]. First, a depth value is now related to a triplet of PSFs (one per RGB channel) instead of a unique one in the single image DFD case of [14]. Second, a scene prior distribution has to be specified for the three channels, accounting for interchannel RGB correlations. In this section we introduce these modifications incrementally. We first consider, in Section 2.C, the case of color images affected by a spectrally varying blur, but assuming a total correlation between the RGB recorded scenes. The resulting algorithm is referred to as grayscale chromatic DFD (GC-DFD) since it amounts to considering grayscale scenes. The scope for such an algorithm is, for instance,

barcode and QR-code reading [19], but wider applicability requires the development of a truly RGB scene prior that models the partial correlation between the RGB channels. The related criterion, which leads to the color chromatic DFD (CC-DFD) algorithm, is derived in Section 2.D.

B. Observation Model

The relation between the scene and the recorded image is usually modeled as a convolution with the PSF. In the general case, defocus blur varies spatially in the image and this model is only valid on image patches, where the PSF is supposed to be constant. In the case of a lens having spectrally varying defocus blur combined with a color sensor, each RGB channel has a different PSF. Using the matrix formalism on image and scene patches, this case can be modeled as

\[
Y = \begin{bmatrix} y_R \\ y_G \\ y_B \end{bmatrix}
  = \begin{bmatrix} H_R(d) & 0 & 0 \\ 0 & H_G(d) & 0 \\ 0 & 0 & H_B(d) \end{bmatrix}
    \begin{bmatrix} x_R \\ x_G \\ x_B \end{bmatrix} + N
  = H_C(d)\, X + N, \qquad (3)
\]

where (y_R, y_G, y_B) and (x_R, x_G, x_B) represent the concatenation of the pixels of the three RGB image patches and scene patches, respectively. N stands for the noise that affects the three channels. Each H_c(d) is a convolution matrix that depends on the PSF of the channel c and on the depth d. As we consider small patches, care has to be taken concerning boundary hypotheses: the usual periodic boundary assumption is not suited here. In the following, we use “valid” convolutions in which the support of x_c is enlarged with respect to the one of y_c according to the PSF support [26, Section 4.3.2]. Thus, if N is the length of the vector y_c, and M is the length of x_c, with M > N, each H_c(d) is an N × M convolution matrix. Assuming that the noise is a zero-mean white Gaussian process with variance σ_n², the data likelihood is

\[
p(Y \mid X, \sigma_n^2) = (2\pi)^{-N/2} (\sigma_n^2)^{-N/2} \exp\left(-\frac{\|Y - H_C X\|^2}{2\sigma_n^2}\right). \qquad (4)
\]

C. Grayscale Scene

In the case of a grayscale scene acquired with a chromatic camera, each RGB image actually originates from the same scene so x_R = x_G = x_B = x. Thus the observation model reduces to

\[
Y = \begin{bmatrix} y_R \\ y_G \\ y_B \end{bmatrix}
  = \begin{bmatrix} H_R(d) \\ H_G(d) \\ H_B(d) \end{bmatrix} x + N
  = H_C^g(d)\, x + N. \qquad (5)
\]

Fig. 4. Generic flowchart of the DFD algorithm for one image patch.
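To make the observation model of Eqs. (3) and (5) concrete, here is a small one-dimensional sketch (Python/NumPy) of the “valid” convolution matrices of Section 2.B and of the stacked grayscale model of Eq. (5). The Gaussian PSFs and blur widths are placeholders standing in for the calibrated PSF triplet of a given depth.

```python
import numpy as np

def gaussian_psf_1d(sigma, radius=4):
    """Discrete 1D Gaussian PSF of standard deviation sigma (pixels)."""
    t = np.arange(-radius, radius + 1)
    h = np.exp(-t**2 / (2.0 * sigma**2))
    return h / h.sum()

def valid_conv_matrix(h, n_out):
    """N x M 'valid' convolution matrix: the scene support M exceeds the
    observed patch support N by the PSF support, as in Section 2.B."""
    m = n_out + len(h) - 1
    H = np.zeros((n_out, m))
    for i in range(n_out):
        H[i, i:i + len(h)] = h[::-1]
    return H

N = 16                                                    # patch length per channel
sigmas = (2.5, 0.8, 1.6)                                  # placeholder blur widths for R, G, B at some depth
H_g = np.vstack([valid_conv_matrix(gaussian_psf_1d(s), N) for s in sigmas])   # stacked H_C^g(d)
M = H_g.shape[1]

rng = np.random.default_rng(0)
x = rng.standard_normal(M)                                # unknown grayscale scene patch
Y = H_g @ x + 0.01 * rng.standard_normal(3 * N)           # noisy three-channel observation, Eq. (5)
print(H_g.shape, Y.shape)                                 # (3N, M) = (48, 24) and (48,)
```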

1. Scene Model

We propose to use, as in [14], a Gaussian prior written as

\[
p(x; \sigma_x^2) \propto \exp\left(-\frac{\|D x\|^2}{2\sigma_x^2}\right), \qquad (6)
\]

with D a matrix composed of the concatenation of the convolution matrices corresponding to the vertical and horizontal first-order derivatives, i.e., the convolution matrices relative to filters [−1 1] and [−1 1]^T. This model, which can be physically interpreted as a 1/f² decrease of the scene spectrum, has previously shown good results in single image blur identification [14,27]. Note that matrix D is singular, as D1 = 0. In such a case, the scene prior is said to be improper. Improper models are not uncommon in statistics, the most famous example being the Brownian motion; moreover, as discussed in the next section, depth inference can still be derived from such a prior [28–30].

2. Generalized Likelihood Derivation

In the general case of proper prior distributions, the probability of observed data Y is obtained by integrating out or marginalizing the prior p(x; σ_x²):

\[
p(Y \mid H_C^g(d), \sigma_n^2, \sigma_x^2) = \int p(Y \mid x, H_C^g(d), \sigma_n^2)\, p(x; \sigma_x^2)\, dx. \qquad (7)
\]

Because p(x; σ_x²) is improper, the distribution p(Y | H_C^g(d), σ_n², σ_x²) defined by Eq. (7) is also improper. It is shown in [30] that an extended definition of the marginal probability can be derived, which yields the following marginal likelihood:

\[
p(Y \mid H_C^g(d), \sigma_n^2, \alpha) \propto |Q(\alpha, d, \sigma_n^2)|^{\frac{1}{2}} \exp\left(-\frac{Y^T Q(\alpha, d, \sigma_n^2)\, Y}{2}\right), \qquad (8)
\]

with |A| corresponding to the product of the nonzero eigenvalues of a matrix A and

\[
Q(\alpha, d, \sigma_n^2) = \frac{1}{\sigma_n^2}\left[I_{N,N} - H_C^g(d)\left(H_C^g(d)^T H_C^g(d) + \alpha D^T D\right)^{-1} H_C^g(d)^T\right], \qquad (9)
\]

where α = σ_n²/σ_x² is a regularization parameter that accounts for various SNR in different image patches, and I is the N × N identity matrix. In statistics, Q is referred to as the precision matrix and when Q is regular, it corresponds to the inverse of the covariance matrix. To reduce the number of parameters, the likelihood is maximized with respect to σ_n² in order to deal with a generalized likelihood (GL) that depends only on H_C^g(d) and α [26, Section 3.8.2]. This maximization leads to σ̂_n² = Y^T P(α, d) Y / (3N − n), where n is the number of zero eigenvalues of the matrix P(α, d) defined as

\[
P(\alpha, d) = I_{N,N} - H_C^g(d)\left(H_C^g(d)^T H_C^g(d) + \alpha D^T D\right)^{-1} H_C^g(d)^T. \qquad (10)
\]

As shown in [30], the number of zero eigenvalues in P is equal to that of D. As discussed in the previous section, D has a single zero eigenvalue associated with the eigenvector 1. Thus n = 1, and reporting the expression of σ̂_n² in Eq. (8) we obtain

\[
p(Y \mid H_C^g(d), \alpha) \propto |P(\alpha, d)|^{\frac{1}{2}} \left(Y^T P(\alpha, d)\, Y\right)^{-\frac{3N-1}{2}}. \qquad (11)
\]

Maximizing Eq. (11) is equivalent to minimizing the GL function:

\[
\mathrm{GL}_G(d, \alpha) = \frac{Y^T P(\alpha, d)\, Y}{|P(\alpha, d)|^{1/(3N-1)}}. \qquad (12)
\]

Note that this expression is formally identical to the one proposed by Wahba in the context of spline smoothing [28]. We borrow the term GL from this reference.
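The criterion of Eqs. (10) and (12) can be evaluated directly in a few lines. The sketch below (Python/NumPy, one-dimensional patches and Gaussian PSFs for brevity; the candidate depth-to-blur mapping is purely hypothetical) selects the candidate depth minimizing GL_G over a small grid of α values, as the GC-DFD algorithm does for each patch. A production implementation would rather use the GSVD-based computation mentioned in Section 2.E.

```python
import numpy as np

def conv_matrix(h, m):
    """'Valid' convolution matrix of kernel h acting on a length-m scene."""
    n = m - len(h) + 1
    H = np.zeros((n, m))
    for i in range(n):
        H[i, i:i + len(h)] = h[::-1]
    return H

def gauss(sigma, radius=4):
    t = np.arange(-radius, radius + 1)
    h = np.exp(-t**2 / (2.0 * sigma**2))
    return h / h.sum()

def gl_grayscale(Y, sigmas, M, alpha):
    """GL_G(d, alpha) of Eq. (12) for one candidate depth, i.e., one triplet of blur widths."""
    H = np.vstack([conv_matrix(gauss(s), M) for s in sigmas])                       # H_C^g(d)
    Dm = conv_matrix(np.array([1.0, -1.0]), M)                                      # first-order derivative matrix D
    P = np.eye(H.shape[0]) - H @ np.linalg.inv(H.T @ H + alpha * Dm.T @ Dm) @ H.T   # Eq. (10)
    eig = np.linalg.eigvalsh(P)
    pdet = np.prod(eig[eig > 1e-10])                                                # |P|: product of nonzero eigenvalues
    n_data = H.shape[0]                                                             # 3N samples
    return (Y @ P @ Y) / pdet ** (1.0 / (n_data - 1))

# Hypothetical candidate set: depth -> (sigma_R, sigma_G, sigma_B) in pixels.
candidates = {1.5: (0.5, 1.6, 2.8), 2.5: (1.6, 0.5, 1.4), 3.5: (2.8, 1.4, 0.5)}
M, true_d = 40, 2.5
rng = np.random.default_rng(1)
H_true = np.vstack([conv_matrix(gauss(s), M) for s in candidates[true_d]])
Y = H_true @ rng.standard_normal(M) + 0.01 * rng.standard_normal(H_true.shape[0])

scores = {d: min(gl_grayscale(Y, sig, M, a) for a in (1e-3, 1e-2, 1e-1))
          for d, sig in candidates.items()}
print(scores)            # the true depth (2.5 here) should typically obtain the smallest GL
```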

D. Color Scene

Actually, the RGB image originates from three distinct scenes: x_R, x_G, and x_B, which are partially correlated. We propose to use the luminance and the red–green and blue–yellow chrominance decomposition (x_l, x_c1, x_c2) defined as

\[
\begin{bmatrix} x_R \\ x_G \\ x_B \end{bmatrix}
  = (T \otimes I_{M,M})\, X_{LC}
  = (T \otimes I_{M,M}) \begin{bmatrix} x_l \\ x_{c1} \\ x_{c2} \end{bmatrix}, \qquad (13)
\]

where

\[
T = \begin{bmatrix}
\frac{1}{\sqrt{3}} & \frac{-1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} \\
\frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}}
\end{bmatrix}, \qquad (14)
\]

and ⊗ stands for the Kronecker product. The observation model then writes

\[
Y = H_C^c(d)\, X_{LC} + N, \qquad (15)
\]

with

\[
H_C^c(d) = H_C(d)\,(T \otimes I_{M,M}). \qquad (16)
\]
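The transform T of Eq. (14) is orthonormal, which is what makes the change of variables in Eqs. (13)–(16) straightforward. A short check (Python/NumPy) of this property and of the Kronecker form used in Eq. (16):

```python
import numpy as np

# Luminance/chrominance transform of Eq. (14).
T = np.array([[1/np.sqrt(3), -1/np.sqrt(2), -1/np.sqrt(6)],
              [1/np.sqrt(3),  1/np.sqrt(2), -1/np.sqrt(6)],
              [1/np.sqrt(3),  0.0,           2/np.sqrt(6)]])
print(np.allclose(T.T @ T, np.eye(3)))        # True: T is orthonormal

# Decompose a toy RGB patch (M pixels per channel) into luminance/chrominance, Eq. (13).
M = 8
rng = np.random.default_rng(0)
x_rgb = rng.random((3, M))                     # rows: x_R, x_G, x_B
x_lc = T.T @ x_rgb                             # rows: x_l, x_c1, x_c2
print(np.allclose(T @ x_lc, x_rgb))            # True: exact reconstruction of the RGB patch

# The Kronecker form of Eqs. (13)/(16) acting on the stacked vector X_LC = [x_l; x_c1; x_c2].
X_LC = x_lc.reshape(-1)
print(np.allclose(np.kron(T, np.eye(M)) @ X_LC, x_rgb.reshape(-1)))   # True
```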

1. Scene Model

According to [31], the three components of the luminance/chrominance decomposition can be assumed independent. We then use the Gaussian prior (6) on each luminance and chrominance component, which leads to

\[
p(X_{LC}; \sigma_x^2, \mu) \propto \exp\left(-\frac{\|D_C X_{LC}\|^2}{2\sigma_x^2}\right)
\quad \text{with} \quad
D_C = \begin{bmatrix} \sqrt{\mu}\, D & 0 & 0 \\ 0 & D & 0 \\ 0 & 0 & D \end{bmatrix}. \qquad (17)
\]

Akin to [31], our prior incorporates a parameter μ that models the ratio between the gradient variances of the luminance and the chrominance components.

2. Generalized Likelihood Derivation

Following the derivation of Section 2.C, maximum likelihood estimation of (d, α) is equivalent to minimizing the GL function:

\[
\mathrm{GL}_C(d, \alpha) = \frac{Y^T P(\alpha, d)\, Y}{|P(\alpha, d)|^{1/(3N-3)}}, \qquad (18)
\]

with

\[
P(\alpha, d) = I_{N,N} - H_C^c(d)\left(H_C^c(d)^T H_C^c(d) + \alpha D_C^T D_C\right)^{-1} H_C^c(d)^T. \qquad (19)
\]

In this case the number of zero eigenvalues of P(α, d) is n = 3, one for each luminance and chrominance component.

E. Algorithm Implementation

The two GC-DFD and CC-DFD algorithms have the same flowchart as the one illustrated in Fig. 4. For each RGB image patch, the proposed algorithm selects a depth among a discrete set of potential depths {d_1, …, d_K}. Each depth d_k is associated with a triplet of PSFs, i.e., {H_R(d_k), H_G(d_k), H_B(d_k)}, obtained by calibration, or by simulation, for example using an optical design software. The proposed selection criterion for both algorithms is

\[
(\hat{d}_k, \hat{\alpha}) = \arg\min_{k,\alpha} \mathrm{GL}_X(d_k, \alpha), \qquad (20)
\]

where X refers to either G or C for the GC-DFD and CC-DFD algorithms. To limit the computational burden, we use an efficient computation of the criterion using a generalized singular value decomposition (GSVD), as in [14]. We use square patches; however, other shapes or even adaptation of the patch shape to the spatial features of the scene could be considered but would lead to a higher computational cost. Note that to reject homogeneous patches, we use a Canny edge detector with an adaptive threshold. A patch is processed only if the Canny filter detects an edge in it; otherwise the patch is rejected. Finally, for both GC-DFD and CC-DFD the only parameters that have to be set by the user are the patch size and, if needed, the patch overlapping. All other parameters are either fixed (such as the parameter μ of CC-DFD, which is discussed in Section 3.B) or automatically set by the algorithm.
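The per-patch procedure of Fig. 4 and Eq. (20) then amounts to a small loop. The following structural sketch (Python/NumPy) is hypothetical: `gl_fn` stands for either GL_G of Eq. (12) or GL_C of Eq. (18), and a simple gradient-magnitude test replaces the Canny-based rejection of homogeneous patches used in the paper.

```python
import numpy as np

def select_depth(patch_rgb, gl_fn, depth_grid, alpha_grid, grad_thresh=0.02):
    """Depth selection of Eq. (20) for one (3, h, w) RGB patch, or None if the patch is homogeneous."""
    gy, gx = np.gradient(patch_rgb.mean(axis=0))
    if np.hypot(gx, gy).max() < grad_thresh:          # stand-in for the Canny-based rejection
        return None
    costs = {(d, a): gl_fn(patch_rgb, d, a) for d in depth_grid for a in alpha_grid}
    d_hat, _alpha_hat = min(costs, key=costs.get)
    return d_hat

def depth_map(img_rgb, gl_fn, depth_grid, alpha_grid, patch=21):
    """Raw depth map over non-overlapping patch x patch blocks (overlap is omitted for brevity)."""
    h, w = img_rgb.shape[1:]
    out = np.full((h // patch, w // patch), np.nan)    # NaN marks rejected (homogeneous) patches
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = img_rgb[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            d = select_depth(block, gl_fn, depth_grid, alpha_grid)
            out[i, j] = np.nan if d is None else d
    return out
```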

F. Variation of Color Sensor Type

The proposed formalism allows us to deal with the case of a 3CCD sensor, a layered photodiode sensor, or a sensor with a color filter array (CFA) [32]. For a 3CCD sensor, the input beam is separated, according to wavelength, on three different sensors of identical resolution. The layered photodiode sensor uses the wavelength-dependent absorption length of a semiconductor. Both systems provide three full-resolution RGB images that can be directly superposed. However, most color cameras use a unique sensor with a CFA. For instance, the classical Bayer CFA is made up of 2 × 2 basic cells with one red, one blue, and two green filters. In this case, the recorded image can be decomposed into three undersampled red, green, and blue images. Such undersampling is easily taken into account in our model by removing the lines corresponding to missing pixels from convolution matrices H_R, H_G, and H_B in Eq. (3).
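For instance, with an RGGB Bayer layout (an assumption here; the actual CFA depends on the sensor), the row selection described above can be written as follows:

```python
import numpy as np

def bayer_masks(h, w):
    """Boolean CFA sampling masks per channel, assuming an RGGB Bayer layout."""
    r = np.zeros((h, w), bool); g = np.zeros((h, w), bool); b = np.zeros((h, w), bool)
    r[0::2, 0::2] = True
    g[0::2, 1::2] = True
    g[1::2, 0::2] = True
    b[1::2, 1::2] = True
    return {"R": r, "G": g, "B": b}

# Keep only the rows of H_c whose output pixel is actually sampled by the CFA.
h = w = 6
masks = bayer_masks(h, w)
H_R_full = np.random.default_rng(0).random((h * w, 100))      # stand-in full-resolution H_R(d)
H_R_cfa = H_R_full[masks["R"].ravel(), :]
print(H_R_full.shape, "->", H_R_cfa.shape)                    # (36, 100) -> (9, 100)
```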

3. Simulation Study

To simulate a chromatic lens, a set of triplets of PSFs is built, each triplet being related to a depth. The PSFs are supposed to be Gaussian with a standard deviation defined by σ = ρε, where ε is given by Eq. (1) and ρ = 0.65. The focal lengths of the RGB channels are, respectively, 25.06, 25.00, and 24.81 mm, the lens diameter D is 6.3 mm, the sensor position s is 25.22 mm, and the pixel size is 7.4 μm. The resulting in-focus planes of the RGB channels are respectively at 4, 2.9, and 1.5 m. The set of potential PSF triplets is built for depth varying from 1.2 to 3.8 m with a step of 5 cm. In the following, we simulate images acquired by this ideal chromatic lens, using the computed PSF triplet set applied to a set of natural color scenes. For each scene, composed of three RGB components, we build three noisy blurred images using a triplet from the PSF triplet set associated with the true depth. We assume a fronto-parallel scene, i.e., a constant depth over the whole image support, and choose successively several depths ranging from 1.3 to 3.5 m. The acquisition noise is modeled by a zero-mean white Gaussian pseudo-random noise of standard deviation 0.05, given that the scenes have a normalized intensity. Depth is then estimated on image patches of size 20 × 20 pixels extracted from the three RGB images. In this simulation, without loss of generality, a 3CCD sensor is considered.
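Under these settings, the candidate PSF triplet set can be generated as in the sketch below (Python/NumPy). The per-channel blur is obtained from Eq. (1) with that channel's focal length, and σ = ρε with ρ = 0.65; the Gaussian kernel discretization is an illustrative choice.

```python
import numpy as np

f_ch = {"R": 25.06e-3, "G": 25.00e-3, "B": 24.81e-3}     # per-channel focal lengths
D, s, pixel, rho = 6.3e-3, 25.22e-3, 7.4e-6, 0.65

def psf_sigma(d, f):
    """Gaussian PSF standard deviation in pixels: sigma = rho * eps with eps from Eq. (1)."""
    eps = D * s * abs(1.0 / f - 1.0 / d - 1.0 / s)
    return rho * eps / pixel

def gaussian_psf(sigma):
    radius = max(1, int(np.ceil(3 * sigma)))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    h = np.exp(-(x**2 + y**2) / (2.0 * max(sigma, 1e-3)**2))
    return h / h.sum()

depths = np.round(np.arange(1.2, 3.80001, 0.05), 2)       # candidate depths, 5 cm step
psf_set = {d: {c: gaussian_psf(psf_sigma(d, f)) for c, f in f_ch.items()} for d in depths}
print(len(psf_set), "candidate depths;",
      "sigma (px) at 2.9 m:", {c: round(psf_sigma(2.9, f), 2) for c, f in f_ch.items()})
```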

A. Comparison of GC-DFD and CC-DFD on Simulated Images

Both GC-DFD and CC-DFD algorithms are tested on two sets of scenes presented in Fig. 5. The first one is made of grayscale scenes, and the second one is made of color scenes. To simulate images from the chromatic lens and the grayscale scenes, we convert the color scene into a grayscale scene and use it to


Fig. 5. Comparison of CC-DFD and GC-DFD algorithms using error metric and standard deviation (std) computed over a collection of (a) grayscale scenes and (b) color scenes. (Error metric and std are in cm, depth in m.)

produce each channel image. For the CC-DFD algorithm the value of μ is fixed at 0.04, a value empirically determined in [31]. The depth estimation results for both algorithms and both sets are shown in Fig. 5. Figure 5(a) shows that for the set of grayscale scenes either the GC-DFD or the CC-DFD algorithm leads to an error metric and a standard deviation under 10 cm, a value that is close to the depth sampling step of 5 cm used in the PSF set construction. Despite the fact that it is a grayscale scene, the CC-DFD algorithm gives results very close to the GC-DFD. Moreover, Fig. 5(b) shows that the GC-DFD algorithm on color scenes leads to unsatisfactory results with a significant error metric on small depths and a large standard deviation. In contrast the CC-DFD algorithm shows far better results for these color scenes. These simulation results illustrate that classical DFD techniques, which, like the GC-DFD algorithm, rely on the assumption that the blurred images originate from the same scene, are prone to bad estimation results when used with images of a color scene from a chromatic lens. This justifies the need for an algorithm based on modeling the correlation between the RGB channels, such as the proposed CC-DFD.

B. Setting of the Parameter μ

The depth estimation tests conducted in Section 3.A on the CC-DFD algorithm with color scenes are reproduced for different values of μ. Table 1 shows the mean error metric and mean standard deviation for depths varying from 1.3 to 3.5 m for five values of μ.

Table 1. Mean Error Metric (Err) and Standard Deviation (Std) of Depth Estimation Results over a Set of True Depths Varying from 1.3 to 3.5 m with a Step of 20 cm, for Various Values of μ

μ                  1      0.4    0.04   0.004   0.0004
Mean err in cm     9.1    7.0    5.5    7.2     9.1
Mean std in cm     14     11     8.3    12      16

The value of μ that gives the lowest error metric and standard deviation is 0.04. This result is consistent with the value empirically chosen in [31]. This value is fixed for all the other experiments of the paper.

C. Chromatic Versus Achromatic Lens Depth Estimation Performance Study


In this section, we use the CC-DFD algorithm on simulated images for two imaging systems having either a chromatic or an achromatic lens. The chromatic lens is identical to the one simulated in Section 3.A. The achromatic lens has the same focal length and f-number, i.e., respectively, 25 mm and 4, but has a single in-focus plane that is put at 1.5 m in order to have no depth estimation ambiguity nor dead zone in the range of 2 to 3 m. The set of potential PSF triplets for each lens is built for depth varying from 2 to 4 m with a step of 5 cm. For each imaging system, we generate 120 image patches of size 20 × 20 pixels using scene patches extracted from the natural color scenes presented in Fig. 5(b). For each depth and each imaging system, the RGB images are obtained by convolution of scene patches with the corresponding PSF triplet. White Gaussian noise is added to the result with a standard deviation of 0.05, given that the scenes have a normalized intensity. Figure 6 shows the error metric and the standard deviation of the depth estimation obtained with the CC-DFD algorithm. The error metrics of both imaging systems are low; however, the standard deviation with the chromatic lens is much lower than with an achromatic one. This study illustrates the performance gain in terms of accuracy of using chromatic aberration for depth estimation.

Fig. 6. Comparison of CC-DFD algorithm using either an achromatic lens (AL) or a chromatic lens (CL). Lens parameters for both systems are given in Section 3.C. (Error metric and std are in cm, depth in m.)

4. Experimental Validation on a Real Chromatic Camera

To experimentally demonstrate the efficiency of the proposed approach, we have built a prototype camera with a chromatic lens. The design of the lens and the chosen calibration procedure are first presented. Then we demonstrate the depth estimation capability of our chromatic approach using this prototype with the CC-DFD algorithm.

Fig. 7. Theoretical blur variation (blur radius in pixels) with respect to depth for the designed chromatic camera, for the R, G, and B channels.

A. Lens Design

1. Settings

We have designed a customized optical system using the optical design software Zemax. We use a commercially available 5 megapixel CFA Stingray color sensor with a pixel size of 3.45 μm. The lens focal length is fixed at 25 mm, and the depth estimation range is chosen from 1 to 5 m, to conduct both indoor and outdoor tests. These settings lead to an optic with a diagonal field of view of around 25°.

2. Design Process

Given the lens settings, our aim was to build a lens with sufficient longitudinal chromatic aberration to have separated RGB DoF in the required depth range, and thus avoid dead zones, as explained in Section 1.A and Fig. 3(b). On the other hand, all other aberrations should be reduced as much as possible, in order to maintain good image quality, and in particular lateral chromatic aberration, because it causes misalignment of the RGB images. The design of a lens with reduced field aberrations for such a field of view requires us to use at least a lens triplet. We choose to start with a classical 25 mm f/4 triplet made of two convergent lenses separated by a divergent lens, which is the triplet aperture stop, because this configuration naturally helps to reduce lateral chromatic aberration. In the classical optical design, the lens glasses would usually be chosen so that their constringence difference reduces longitudinal chromatic aberration, for instance, respectively, crown, flint, and crown glasses. Hence, the amount of longitudinal chromatic aberration can be tuned by changing the triplet glasses. We have compared several choices of glass for the triplet, and we found that a focal shift of 200 μm, obtained with the glasses N-BK7/LLF1/N-BK7, was an amount of chromatic aberration that correctly separates the RGB DoF in the depth range of 1 to 5 m. Indeed, as illustrated in Fig. 7, with this triplet, when the green channel is focused around 2.8 m, the in-focus planes of the blue and red channels are approximately at 1.9 and 4.5 m and the camera always has at least one channel whose blur size is above one pixel in the range 1 to 5 m. Note that the pixel unit in this figure is twice that of the detector pitch in order

to account for Bayer matrix undersampling. Figure 8 shows the focal shift between the red and the blue wavelengths and the lateral chromatic aberration of the obtained triplet. As expected, the maximal lateral chromatic aberration is 1 μm, which is less than the sensor pixel size.

3. Lens Realization

Figure 9 shows the obtained triplet, and the color rays corresponding to different point sources at various field angles. The specifications of the triplet were sent to an optical lens manufacturer who built the lenses. We have mounted the obtained lens on a customized mechanical structure, and the optical system was fixed in front of the Stingray color sensor. A picture of the obtained camera is presented in Fig. 10.

B. Lens Calibration

The CC-DFD algorithm requires a set of potential defocus blur triplets. To build such a set, one could use several techniques, such as modeling the PSF with a pill-box function, with Fourier optics formula [33], or using simulated PSF generated by an optical

Fig. 8. Focal shift and lateral chromatic shift between the red and the blue wavelengths of the proposed chromatic lens estimated by the optical design software Zemax.



Fig. 9. Chromatic lens scheme given by Zemax, according to the optical design of Section 4.A. Color rays correspond to point sources at different field angles.

design software with the theoretical lens settings. However, these methods do not take into account aberrations, misalignment, or mechanical construction errors. A solution is to calibrate the PSFs of the actual camera prototype using either a point source or a known high frequency pattern [12,34,35]. We use the method proposed in [35], with the black-and-white pattern shown in Fig. 11(a), for depths varying from 1 to 5 m with a step of 5 cm. As the triplet still suffers from field aberrations that imply a PSF variation with field angle, we build an on-axis and off-axis potential PSFs set. Figure 11 shows on-axis calibrated PSFs at three different depths for the three RGB channels. Note that the comparison of depth estimation results using different calibration techniques remains an interesting subject for further studies. Indeed, recent works propose to use only a few calibrated PSFs to estimate any PSF at any range [36,37]. Implementation of these methods could also be interesting here in order to limit the number of snapshots required for PSF calibration.

C. Depth Estimation Accuracy Evaluation

Acquisitions are made of color textured plane frontoparallel scenes put at different distances from the camera. For each scene and at each distance, depth is estimated with the CC-DFD algorithm on image patches of size 21 × 21 inside a centered region of size 120 × 120, where the PSF is supposed to be constant, with a patch overlapping of 50%. Figure 12 shows depth estimation results for the four scenes presented in Fig. 12. We plot the mean and the standard deviation of the estimated depth results for each scene position with respect to the ground truth given


Fig. 11. (a) Random pattern of [35] used for the RGB PSFs calibration. (b)–(d) Examples of calibrated PSFs at (b) 4.7 m, (c) 2.7 m, and (d) 2 m.

Fig. 10. Customized chromatic lens (left) has been mounted on a Stingray color sensor to obtain a chromatic camera (right) of dimensions 4.5 × 4.5 × 8 cm.

Fig. 12. Evaluation of depth estimation accuracy on real frontoparallel color scenes. Axes are in m.

Fig. 13. Evaluation of depth estimation accuracy on real frontoparallel grayscale scenes. Axes are in m.

by the Kinect. Indeed, the Kinect accuracy ranges from a few millimeters at 1 m to 3 cm at 3 m [38], which is lower than the sampling of depths in our potential set. For each scene, the bias is comparable to the PSF calibration step (5 cm) and the standard deviation varies from 3 to 10 cm between 1 and 3.5 m. The same experiment was repeated with the same targets printed in grayscale, leading to very similar results, presented in Fig. 13. These results demonstrate that the CC-DFD algorithm combined with a chromatic camera can provide accurate depth estimation and is robust to various scene textures. Note, however, that both bias and standard deviation degrade beyond 3.5 m. One can explain this degradation by looking at the theoretical PSF size variations of the prototype lens presented in Fig. 7. First, the variation of the PSF size reduces with depth, so it is not surprising that the accuracy of depth estimation reduces as well. Besides, the PSF of the red channel is below one pixel beyond 3.5 m, so this channel no longer gives depth information, which reduces the performance.

D. Depth Maps

1. Comparison with an Active Ranging System for Indoor Scenes

Figure 14 shows three examples of depth maps obtained from images acquired with our prototype of chromatic camera using the CC-DFD algorithm with 21 × 21 square patches and 50% overlapping. As they are indoor scenes, the depth maps obtained

Fig. 14. Results of the prototype chromatic camera on indoor scenes. (a) Raw image acquired with the chromatic lens. (b) Kinect’s depth map. (c) Raw depth map with CC-DFD with a patch size of 21 × 21 and 50% overlapping. The depth labels are in m. Black label corresponds to homogeneous regions rejected by the algorithm.

with the CC-DFD algorithm can be compared to the depth maps given by the Kinect. Homogeneous regions, where depth cannot be estimated, appear in black in the depth map. In the first two examples, the estimated depth maps are noisier but consistent with the Kinect depth map. The third example is a case in which the Kinect depth map locally shows outliers. This is due to occlusions and thin surfaces where the Kinect pattern does not reflect properly. In contrast, the proposed chromatic DFD camera correctly estimates a depth map for such a complex scene.

2. Outdoor Situations

In contrast with the Kinect, which is sensitive to the sun IR illumination, the proposed chromatic camera can also be used for outdoor depth estimation. Figure 15 shows examples of raw depth maps obtained in various outdoor situations with the CC-DFD algorithm with 9 × 9 square patches and 50% overlapping. In the estimated depth maps, various objects can be distinguished, including thin ones like the grid in the second example. Note that this kind of repetitive object misleads correlation techniques, so it would be difficult to identify with a stereoscopic device.

5. Discussion

A. Robustness of Depth Estimation with Respect to Variation of the Scene Spectrum

In the proposed DFD approach, we take advantage of the interchannel chromatic aberration in order to estimate depth from a set of potential calibrated PSFs triplets. However, the object spectrum can have an influence on the actual PSF size due to intrachannel chromatic aberration. One option to reduce the PSF variability within a color channel would be to use a sensor with narrowband color filters. However, this approach would increase the complexity of the



Fig. 15. Results of the prototype chromatic camera on outdoor scenes. (Left) Raw image acquired with the chromatic lens. (Right) Raw depth map with CC-DFD with a patch size of 9 × 9 and 50% overlapping. The depth labels are in m. Black label corresponds to homogeneous regions rejected by the algorithm.

device realization and reduce the SNR. Here we opt for an off-the-shelf RGB sensor having broadband color filters. Then we assume that the observed scenes have a sufficiently large spectrum to be consistent with PSFs calibrated with a black-and-white pattern as described in Section 4.B. The experimental results obtained with various color or grayscale scenes show that this assumption is valid enough to give an accurate depth estimation. Finally, in the limit case of an object emitting only in the spectrum of one channel, two channels out of three will receive no signal, which will probably reduce the CC-DFD efficiency. However, in our opinion this case is unlikely to happen in natural scenes, and even so it could be detected with a preprocessing step so as to use a single image DFD approach [14] on the valid channel.

B. Image Restoration

Chromatic aberration induces an inhomogeneous resolution among the RGB channels. Thus, as illustrated in Fig. 16(a), the raw RGB image requires restoration. The issue with a chromatic lens is that blur is both spatially and spectrally variant. As the CC-DFD algorithm provides a depth value, hence a triplet of PSFs, for each local RGB patch one



Fig. 16. Example of restoration of a real image acquired with the prototype chromatic camera using the high frequency transfer approach of Eq. (21). (a) Raw image. (b) Image after restoration. (c) and (e) Zoomed detail patches from the raw red channel. (d) and (f) Restored details of the red channel.

could consider restoration by local multichannel deconvolution, although such approaches would probably be computationally demanding. On the other hand, the prototype camera has been designed so that the blur is mainly caused by defocus and that, for a large depth range, there is at least one sharp channel out of three (see Fig. 7). Thus we propose a simple and fast restoration technique by transferring the high frequencies of the sharpest channel to the others. Such an approach has been proposed in [21] and is also related to pansharpening in satellite imaging. Formally the restored channel y_c is the sum of the original image with a weighted sum of the high frequencies of all channels:

\[
y_{c,\mathrm{out}}(p) = y_{c,\mathrm{in}}(p) + a_{d(p),R}\, \mathrm{HP}_R(p) + a_{d(p),G}\, \mathrm{HP}_G(p) + a_{d(p),B}\, \mathrm{HP}_B(p), \qquad (21)
\]

where HP_c(p) is the output of a high-pass filter (difference of Gaussians) at pixel p applied on the image of the channel c. The weight of a channel c depends on the distance of the object with respect to the in-focus plane of that channel. If the object is at a depth d close to the depth d_{0,c} of the in-focus plane of a channel, its image in this channel is likely to have high frequency content. Weights are then decreasing functions of |d_{0,c} − d|, as presented in Fig. 17. In practice, for each pixel, the weights are computed using the estimated depth provided by the CC-DFD algorithm. In homogeneous regions, the weights are set to zero. Note that the restoration equation (21) is formally similar to the one used in [21] but, in this reference, the weights are computed on the basis of a local contrast of each channel through a learned function. Figure 16 presents an example of restoration of a real image acquired with the prototype chromatic camera.
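A compact version of this restoration step is sketched below (Python with NumPy and SciPy). The difference-of-Gaussians filter parameters and the exponential fall-off of the weights are illustrative assumptions: the paper only specifies that HP_c is a difference-of-Gaussians high-pass filter and that the weights decrease with |d_{0,c} − d| (Fig. 17); the in-focus distances are those of the prototype (about 4.5, 2.8, and 1.9 m for R, G, and B).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(channel, s1=1.0, s2=3.0):
    """Difference-of-Gaussians high-pass filter applied to one channel."""
    return gaussian_filter(channel, s1) - gaussian_filter(channel, s2)

def weight(depth, d0, scale=0.5):
    """Weight a_{d,c}: an assumed exponential decrease with |d0 - d| (only the decreasing
    behavior is specified in the paper, Fig. 17)."""
    return np.exp(-np.abs(depth - d0) / scale)

def restore(raw_rgb, depth_map, d0=(4.5, 2.8, 1.9)):
    """High frequency transfer of Eq. (21). raw_rgb: (3, h, w); depth_map: (h, w) in meters,
    NaN where the CC-DFD algorithm rejected the patch; d0: per-channel in-focus distances."""
    hp = np.stack([high_pass(raw_rgb[c]) for c in range(3)])         # HP_R, HP_G, HP_B
    a = np.stack([weight(depth_map, d0[c]) for c in range(3)])       # a_{d,R}, a_{d,G}, a_{d,B}
    a[:, np.isnan(depth_map)] = 0.0                                   # zero weights on homogeneous regions
    boost = np.nan_to_num((a * hp).sum(axis=0))
    return raw_rgb + boost[None]                                      # the same correction is added to every channel

rng = np.random.default_rng(0)
img = rng.random((3, 64, 64))
dmap = np.full((64, 64), 2.5)
print(restore(img, dmap).shape)                                       # (3, 64, 64)
```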

The resolution gain appears clearly on the details of the red channel presented in Figs. 16(c)–16(f). A quantitative evaluation of the restoration gain could be the subject of further studies.

Fig. 17. Weights a_{d,R}, a_{d,G}, and a_{d,B} for the high frequency transfer equation (21) as a function of depth.

C. Depth Map Regularization

In Fig. 14, we have presented raw depth maps obtained with the CC-DFD algorithm. These raw results present a few outliers, and show no depth information on homogeneous regions that are excluded from the processing. A regularization approach can be used to overcome these two issues. Here we present preliminary results of two regularization approaches. The first one, presented in Fig. 18(a), relies on a simple and fast median filter. The median filter has a size of three times the patch size. This filter helps to eliminate the outliers in the background but does not propagate depth information on homogeneous regions. The second regularization approach consists in optimizing a criterion based on a Markov random field model of the depth map:

\[
E(d) = \sum_p \mathrm{GL}_C(d_p) + \lambda \sum_{(p,q) \in \mathcal{N}_p} \exp\left(-\frac{\|y_g(p) - y_g(q)\|^2}{2\sigma^2}\right)\left(1 - \delta_{d_p, d_q}\right), \qquad (22)
\]

where N_p is a first-order neighborhood of the pixel p, d_p is the estimated depth value at pixel p, and y_g is the result of the color image conversion to grayscale. This energy favors depth jumps located on image edges. We minimize this criterion using a graph-cut algorithm [12,15,39]. Figure 18(b) presents the results obtained with λ = 1.1 and σ = 4·10⁻⁴. The parameters have been chosen to propagate information over homogeneous regions and lead to a result that is close to a depth segmentation.
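For completeness, the energy of Eq. (22) can be evaluated for a given labeling as in the sketch below (Python/NumPy); the graph-cut minimization itself is not reproduced here. The data costs are placeholders standing in for the per-pixel GL_C values.

```python
import numpy as np

def mrf_energy(labels, data_cost, gray, lam=1.1, sigma=4e-4):
    """Energy of Eq. (22): data_cost is (h, w, K) with the GL_C cost of each candidate depth,
    labels is an (h, w) integer label map, gray is the grayscale image y_g."""
    h, w = labels.shape
    e = data_cost[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    for axis in (0, 1):                                   # first-order neighborhood: right and down pairs
        jump = labels != np.roll(labels, -1, axis=axis)
        contrast = np.exp(-(gray - np.roll(gray, -1, axis=axis)) ** 2 / (2.0 * sigma ** 2))
        pair = lam * contrast * jump
        pair = pair[:-1, :] if axis == 0 else pair[:, :-1]   # drop the wrap-around pairs of np.roll
        e += pair.sum()
    return e

# Toy check: a depth jump aligned with a strong image edge costs almost nothing,
# whereas the same jump inside a flat region pays the full penalty lam per pair.
h = w = 8
gray = np.zeros((h, w)); gray[:, 4:] = 1.0
data_cost = np.zeros((h, w, 2))                           # placeholder GL_C values
on_edge = np.zeros((h, w), int); on_edge[:, 4:] = 1
off_edge = np.zeros((h, w), int); off_edge[:, 2:] = 1
print(mrf_energy(on_edge, data_cost, gray), "<", mrf_energy(off_edge, data_cost, gray))
```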

6. Conclusion

In this paper we have proposed a new passive method for depth estimation based on the use of chromatic aberration and a DFD approach. We have presented an algorithm that estimates depth locally using a PSF triplet selection criterion derived from a maximum likelihood calculation. This algorithm is based on a model of the interchannel scene correlation, which allows estimation of depth on color scene patches. Simulated and experimental tests have illustrated the effectiveness of the proposed algorithm and provide a demonstration of the concept of chromatic DFD. There are several perspectives for this work. The calibration process could be improved so as to reduce the number of calibration images required. Regarding the processing, we are currently working on a parallel implementation of the CC-DFD algorithm. A more detailed study of image restoration and depth map regularization should also be conducted. Finally, the design of the chromatic lens could be optimized to improve the overall performance of the system using the codesign approach presented in [40].

The authors would like to thank F. Bernard and L. Jacubowiez for fruitful discussions.


Fig. 18. Result of depth map regularization. From top to bottom: acquired image, raw depth map, regularized depth map: (a) with a median filter of size 3 × 3, (b) after the minimization of Eq. (22). The depth labels are in m. Black label corresponds to homogeneous regions rejected by the algorithm.

References

1. A. Pentland, “A new sense for depth of field,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 523–531 (1987). 2. Y. Bando, B. Y. Chen, and T. Nishita, “Extracting depth and matte using a color-filtered aperture,” ACM Trans. Graph. 27, 1–9 (2008). 3. A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in Proceedings of IEEE International Conference on Computational Photography (IEEE, 2009), pp. 1–8.


4. M. Subbarao, “Parallel depth recovery by changing camera parameters,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 1988), pp. 149–155. 5. C. Zhou and S. Nayar, “Coded aperture pairs for depth from defocus,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2009), pp. 325–332. 6. P. Favaro and S. Soatto, 3D Shape Estimation and Image Restoration (Springer, 2007). 7. P. Green, W. Sun, W. Matusik, and F. Durand, “Multi-aperture photography,” ACM Trans. Graph. 26, 1–7 (2007). 8. H. Nagahara, C. Zhou, C. T. Watanabe, H. Ishiguro, and S. Nayar, “Programmable aperture camera using LCoS,” in Proceedings of the IEEE European Conference on Computer Vision (IEEE, 2010), pp. 337–350. 9. S. Quirin and R. Piestun, “Depth estimation and image recovery using broadband, incoherent illumination with engineered point spread functions,” Appl. Opt. 52, A367–A376 (2013). 10. S. Zhuo and T. Sim, “On the recovery of depth from a single defocused image,” in International Conference on Computer Analysis of Images and Patterns (Springer 2009), pp. 889–897. 11. A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” ACM Trans. Graph. 26, 1–12 (2007). 12. A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Trans. Graph. 26, 1–9 (2007). 13. M. Martinello and P. Favaro, “Single image blind deconvolution with higher-order texture statistics,” Lect. Notes Comput. Sci. 7082, 124–151 (2011). 14. P. Trouvé, F. Champagnat, G. Le Besnerais, and J. Idier, “Single image local blur identification,” in Proceedings of IEEE Conference on Image Processing (IEEE, 2011), pp. 613–616. 15. M. Martinello, T. Bishop, and P. Favaro, “A Bayesian approach to shape from coded aperture,” in Proceedings of IEEE Conference on Image Processing (IEEE, 2010), pp. 3521–3524. 16. A. Chakrabarti and T. Zickler, “Depth and deblurring from a spectrally varying depth of field,” in Proceedings of IEEE European Conference on Computer Vision (IEEE, 2012), pp. 648–661. 17. J. Garcia, J. Sánchez, X. Orriols, and X. Binefa, “Chromatic aberration and depth extraction,” in Proceedings of IEEE International Conference on Pattern Recognition (IEEE, 2000), pp. 762–765. 18. M. Robinson and D. Stork, “Joint digital-optical design of imaging systems for grayscale objects,” Proc. SPIE 7100, 710011 (2008). 19. B. Milgrom, N. Konforti, M. Golub, and E. Marom, “Novel approach for extending the depth of field of Barcode decoders by using RGB channels of information,” Opt. Express 18, 17027–17039 (2010). 20. O. Cossairt and S. Nayar, “Spectral focal sweep: extended depth of field from chromatic aberrations,” in Proceedings of IEEE Conference on Computational Photography (IEEE, 2010), p. 1–8. 21. F. Guichard, H. P. Nguyen, R. Tessières, M. Pyanet, I. Tarchouna, and F. C. Cao, “Extended depth-of-field using sharpness transport across color channels,” Proc. SPIE 7250, 72500N (2009).


22. J. Lim, J. Kang, and H. Ok, “Robust local restoration of space-variant blur image,” Proc. SPIE 6817, 68170S (2008). 23. L. Waller, S. S. Kou, C. J. R. Sheppard, and G. Barbastathis, “Phase from chromatic aberrations,” Opt. Express 18, 22817–22825 (2010). 24. S. Kebin, L. Peng, Y. Shizhuo, and L. Zhiwen, “Chromatic confocal microscopy using supercontinuum light,” Opt. Express 12, 2096–2101 (2004). 25. P. Trouvé, F. Champagnat, G. Le Besnerais, and J. Idier, “Chromatic depth from defocus, an theoretical and experimental study,” in Computational Optical Sensing and Imaging Conference, Imaging and Applied Optics Technical Papers (2012), paper CM3B.3. 26. J. Idier, Bayesian Approach to Inverse Problems (Wiley, 2008). 27. A. Levin, Y. Weiss, F. Durand, and W. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), pp. 88–101. 28. G. Wahba, “A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem,” Ann. Stat. 13, 1378–1402 (1985). 29. A. Neumaier, “Solving ill-conditioned and singular linear systems: a tutorial on regularization,” SIAM Rev. 40, 636–666 (1998). 30. F. Champagnat, “Inference with gaussian improper distributions,” Internal Onera Report No. RT 5/14983 DTIM (2012). 31. L. Condat, “Color filter array design using random patterns with blue noise chromatic spectra,” Image Vis. Comput. 28, 1196–1202 (2010). 32. D. L. Gilblom, K. Sang, and P. Ventura, “Operation and performance of a color image sensor with layered photodiodes,” Proc. SPIE 5074, 318–331 (2003). 33. J. W. Goodman, Introduction to Fourier Optics (McGraw-Hill, 1996). 34. N. Joshi, R. Szeliski, and D. J. Kriegman, “PSF estimation using sharp edge prediction,” in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (IEEE 2008), pp. 1–8. 35. M. Delbracio, P. Musé, A. Almansa, and J. Morel, “The nonparametric sub-pixel local point spread function estimation is a well posed problem,” Int. J. Comput. Vis. 96, 175–194 (2012). 36. Y. Shih, B. Guenter, and N. Joshi, “Image enhancement using calibrated lens simulations,” in Proceedings of IEEE European Conference on Computer Vision (IEEE, 2012), p. 42. 37. H. Tang and K. N. Kutulakos, “What does an aberrated photo tell us about the lens and the scene?” in Proceedings of IEEE International Conference on Computational Photography (IEEE, 2013), p. 86. 38. J. Chow, K. Ang, D. Lichti, and W. Teskey, “Performance analysis of low cost triangulation-based 3D camera: microsoft kinect system,” Int. Arc. Photogramm. Remote Sens. Spatial Inf. Sci. 39, 175–180 (2012). 39. J. Z. Wang, J. Li, R. M. Gray, and G. Wiederhold, “Unsupervised multiresolution segmentation for images with low depth of field,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 85–90 (2001). 40. P. Trouvé, F. Champagnat, G. Le Besnerais, G. Druart, and J. Idier, “Design of a chromatic 3D camera with an end-to-end performance model approach,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition workshops (IEEE, 2013), pp. 953–960.
