Linear segmentation algorithm for detecting layer boundary with lidar Feiyue Mao,1 Wei Gong,1,* and Timothy Logan2 1

2

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China University of North Dakota, Dept of Atmospheric Sciences, Clifford Hall Room 400, 4149 University Avenue Stop 9006, Grand Forks, ND 58202-9006, USA * [email protected]

Abstract: The automatic detection of aerosol- and cloud-layer boundary (base and top) is important in atmospheric lidar data processing, because the boundary information is not only useful for environment and climate studies, but can also be used as input for further data processing. Previous methods have demonstrated limitations in defining the base and top, window-size setting, and have neglected the in-layer attenuation. To overcome these limitations, we present a new layer detection scheme for uplooking lidars based on linear segmentation with a reasonable threshold setting, boundary selecting, and false positive removing strategies. Preliminary results from both real and simulated data show that this algorithm cannot only detect the layer-base as accurate as the simple multiscale method, but can also detect the layer-top more accurately than that of the simple multi-scale method. Our algorithm can be directly applied to uncalibrated data without requiring any additional measurements or window size selections. ©2013 Optical Society of America OCIS codes: (280.1100) Aerosol detection; (280.3640) Lidar.

References and links 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.

K. N. Liou, An Introduction to Atmospheric Radiation (Academic Press, 2002). V. A. Kovalev and W. E. Eichinger, Elastic Lidar: Theory, Practice, and Analysis Methods (Wiley-Interscience, 2004). M. Feiyue, G. Wei, and M. Yingying, “Retrieving the aerosol lidar ratio profile by combining ground- and spacebased elastic lidars,” Opt. Lett. 37(4), 617–619 (2012). M. Vaughan, D. M. Winker, and K. Powell, “CALIOP algorithm theoretical basis document, part 2: Feature detection and layer properties algorithms,” (NASA Langley Research Center, Hampton, Virginia, USA, 2005). J. B. Senberg, A. Ansmann, J. M. Baldasano, D. Balis, C. B. Ckmann, B. Calpini, A. Chaikovsky, P. Flamant, A. Hgrd, and V. Mitev, “EARLINET: a European aerosol research lidar network,” in Advances in laser remote sensing, (Selected Papers of the 20th International Laser Radar Conference, 2001), pp. 155–158. F. Mao, W. Gong, S. Song, and Z. Zhu, “Determination of the boundary layer top from lidar backscatter profiles using a Haar wavelet method over Wuhan, China,” Opt. Laser Technol. 49, 343–349 (2013). F. Rocadenbosch, M. Sicard, M. N. M. Reba, and S. Tomas, “Morphological tools for range-interval segmentation of elastic lidar signals,” in IEEE International Geoscience and Remote Sensing Symposium(IGARSS), 2007), 4372~4375. S. R. Pal, W. Steinbrecht, and A. I. Carswell, “Automated method for lidar determination of cloud-base height and vertical extent,” Appl. Opt. 31(10), 1488–1494 (1992). D. M. Winker and M. A. Vaughan, “Vertical distribution of clouds over Hampton, Virginia observed by lidar under the ECLIPS and FIRE ETO programs,” Atmos. Res. 34(1-4), 117–133 (1994). Z. Wang and K. Sassen, “Cloud type and macrophysical property retrieval using multiple remote sensors,” J. Appl. Meteorol. 40(10), 1665–1682 (2001). F. Mao, W. Gong, and C. Li, “Anti-noise algorithm of lidar data retrieval by combining the ensemble Kalman filter and the Fernald method,” Opt. Express 21(7), 8286–8297 (2013). Y. Morille, M. Haeffelin, P. Drobinski, and J. Pelon, “STRAT: An automated algorithm to retrieve the vertical structure of the atmosphere from single-channel lidar data,” J. Atmos. Ocean. Technol. 24(5), 761–775 (2007). J. Gaumet, J. Heinrich, M. Cluzeau, P. Pierrard, and J. Prieur, “Cloud-base height measurements with a singlepulse erbium-glass laser ceilometer,” J. Atmos. Ocean. Technol. 15(1), 37–45 (1998).

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26876

14. S. A. Young, “Analysis of lidar backscatter profiles in optically thin clouds,” Appl. Opt. 34(30), 7019–7031 (1995). 15. F. Mao, W. Gong, J. Li, and J. Zhang, “Cloud detection and coefficient retrieve based on improved differential zero-crossing method for Mie lidar,” Acta Opt. Sin. 30(11), 3097–3102 (2010). 16. F. Mao, W. Gong, and Z. Zhu, “Simple multiscale algorithm for layer detection with lidar,” Appl. Opt. 50(36), 6591–6598 (2011). 17. M. A. Vaughan, K. A. Powell, D. M. Winker, C. A. Hostetler, R. E. Kuehn, W. H. Hunt, B. J. Getzewich, S. A. Young, Z. Liu, and M. J. McGill, “Fully automated detection of cloud and aerosol layers in the CALIPSO lidar measurements,” J. Atmos. Ocean. Technol. 26(10), 2034–2050 (2009). 18. J. R. Campbell, K. Sassen, and E. J. Welton, “Elevated cloud and aerosol layer retrievals from micropulse lidar signal profiles,” J. Atmos. Ocean. Technol. 25(5), 685–700 (2008). 19. W. Gong, F. Mao, and S. Song, “Signal simplification and cloud detection with an improved Douglas-Peucker algorithm for single-channel lidar,” Meteorol. Atmos. Phys. 113(1-2), 89 (2011). 20. W. Gong, F. Mao, and J. Li, “OFLID: Simple method of overlap factor calculation with laser intensity distribution for biaxial lidar,” Opt. Commun. 284(12), 2966–2971 (2011). 21. D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Int. J. Geo. Inf. and Geo. 10, 112–122 (1973). 22. E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in (Proceedings 2001 IEEE International Conference on Data Mining, 2001), 289–296. 23. S. Burton, R. Ferrare, M. Vaughan, A. Omar, R. Rogers, C. Hostetler, and J. Hair, “Aerosol classification from airborne HSRL and comparisons with the CALIPSO vertical feature mask,” Atmos. Meas. Tech. 6(5), 1397– 1412 (2013).

1. Introduction Detecting and quantifying the lifetime and spatial variations of aerosol particles and clouds can be very helpful in characterizing the physical properties as well as their impact on climate change [1]. Many probes have been used for determining the dynamic range and variability of aerosol particles and clouds, such as the sun-photometer and nephelometer, but lidar is a particular device that can obtain the vertical profile and scan in three dimensions thereby illustrating the complete distribution of information in the atmosphere [2]. Currently, lidar networks (e.g. the European Aerosol Research Lidar Network, and the Micro Pulse Lidar Network) and satellite-based lidars (e.g. the Cloud-Aerosol Lidar with Orthogonal Polarization, and the Geoscience Laser Altimeter System) have been widely used and studied [3–5]. Since lidar data processing is generally considered unconstrained- rather than a simple routine procedure [2], the development of practical algorithms remains a difficult task. Layer detection is essentially based on differences in the physical and optical properties of aerosol- and cloud-layers that are reflected in a raw signal, P(r), or its derivative signals such as the range-corrected signal, X(r) [i.e., P(r)r2], the atmospheric extinction coefficient or attenuated backscatter coefficients and so on [6]. The reflectance of the aerosol- and cloudlayers is generally represented as a significantly sharp increase relative to the value of clear air in the lidar signal. Generally, in the layer detection, the issues are detecting the layer base and top height, which are considered to be the beginning and ending of the sharp increase caused by a layer in the raw or derived backscatter signal, respectively [7–10]. At certain dynamic signal ranges, however, noise can cause a similar deviations which gives the false impression of a signal increase [11]. Thus, the key task of the layer detection is to distinguish genuine layer from noise in order to accurately determine the boundaries. Although many algorithms have been proposed, automatic layer detection is challenging, as discussed in detail in the following paragraphs. Image processing techniques such as the wavelet method have been employed to determine the layer base, peak, and top, respectively [12]. CALIOP algorithm documents have analyzed in detail the disadvantages of those techniques and concluded that image processing methods should be rejected [4]. The wavelet method can correct for signal noise because it is essentially a low-pass filter. However, the local modulus maxims correspond to the “fastest-changing” bins, which are not exactly the base and top of the layer. This is due to changes near the layer base and top that are usually inconspicuous and “slower” than that of the other part of the layer.

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26877

An extinction based detection method has been proposed to deduce a layer boundary from retrieved extinction profiles [13]. However, an accurate retrieval should be done after obtaining information about the layer boundary because the lidar ratio for clear air, cloud, and aerosol is usually different. Furthermore, a clear region with high SNR (Signal-to-Noise Ratio) is needed for setting the initial boundary value in the retrieval, that may not be possible when there is an optically thick layer at lower altitude for the ground-based lidar. Many methods based on a multi-point window have been proposed. For example, Pal et al. (1992) selected a layer base based upon the slope of the raw signal, which was obtained by a least-squares fitting with a multi-point window [8]. A layer base is selected when a certain number of consecutive slopes are larger than zero. However, the slope is very sensitive to noise and strongly oscillates about zero. Many other similar algorithms have been proposed [9, 14, 15], but were found to need an initial window size setting that in turn decreased the vertical resolution and caused errors. The simple multi-scale method proposed by Mao et al. (2011) can avoid the selections of the window size. This method can successfully find both strong and weak layer signals and accurately position the layer base [16]. However, like most previous algorithms, this Mao et al. (2011) method neglects in-layer attenuation, and selects a layer top as the first range bin above the layer base with X(r) ≤ X(rb) [8, 15]. This selection approach tends to under-estimate the layer top. Threshold algorithms have been proposed based on attenuated scattering ratios, such as the CALIPSO profile scanner and similar algorithms for ground-based lidar data [4, 9, 17, 18]. These algorithms scan the vertical profile starting with “clear air returns”, and layer boundaries are located where the signal intensity exceeds the threshold array. However, a ground-based lidar signal may be fully attenuated to noise level before reaching a region of clear air when there is an optically thick, low altitude layer. Otherwise, the threshold algorithms would refine the layer top based on the slope, which is sensitive to noise. Segmenting a lidar signal into piecewise sub-signals can be a key effective solution for lidar data processing. Gong et al. (2011) was the first to use an automatic linear segment technique to represent the X(r), which representation are subsequently used in cloud detection [19]. However, the method is still problematic (e.g., the threshold setting) and yields dissatisfactory results when the SNR is low. In order to overcome the limitations in defining the base and top, subjectively setting window-size, or coarsely neglecting the in-layer attenuation in the previous methods, we present a new linear segment scheme for layer detection. The scheme is based on a more reasonable threshold array and can directly be used for X(r). False positive rejection criteria are used based on X(r) with varying thresholds. We first search for layer base and top based on a linear fitting of the segmentations. In order to enhance the robustness, we employ an “extrapolate into the layer” strategy. This will refine our estimate of layer base and top as both simulated and real data with various noise levels are designed as a stress test for the detection scheme. We find good consistency between layer-base heights retrieved by both the linear segment algorithm and the simple multi-scale method. However, the tops detected by the linear segment algorithm are much accurate than that detected by the simple multi-scale method. Our algorithm can be directly applied to uncalibrated data as no additional measurements or models are required and no window size needs to be selected. 2. Principles and methods The backscatter power P(r) received by the detector can be written as the following [2]: P (r) =

{

}

r C ⋅ G ( r ) ⋅  β m (r)+β p (r)  ⋅ exp −2  α m (r) + α p (r) dr +e ( r ) , 0 r2

(1)

where r is the range, C is lidar constant, G(r) represents the overlap factor [20], βm(r) and βp(r) are the molecule and particle backscatter coefficient respectively, αm(r) and αp(r) are the #196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26878

molecule and particle extinction coefficient respectively, and the noise e(r) is considered as Gaussian. Equation (1) illustrates that the lidar signal will increase considerably when a lidar laser pulse is incident into an optically thick layer. A diagram of the layer detection scheme based on X(r) (Fig. 1) shows that the scheme comprises four key steps that can be described as: (1) Signal segmentation: The segmentation is based on X(r) with a threshold array of 6σ·r2 where σ is the standard deviation of the background noise, and can be estimated using the signal at a distant range where noise is prevalent thus no lidar return. The break-bins outputted by the segmentation can be candidates of base, peak and top of layers. (2) Primary layer base and top selection: We search for the primary layer base when the fitted slope is positive, and select the primary top when fitted slope is negative and larger than or comparable to that below the primary base. (3) False positive rejection: When considering the envelope of the noise e(r)r2 is ± 3σ·r2, all detections whose signal difference is less than 3σ·rp2 + 3σ·rb2 are considered as a false positive and subsequently removed, rp and rb are the range of the peak and base, respectively. (4) Refine the layer base and top: To enhance the robustness, we employ an “extrapolate into the layer” strategy to refine the primary layer base and top based on the linear fitting of the range-corrected signal of the two nearest “clear air” segmentations.

Fig. 1. Flowchart of the detection algorithm based on the linear segmentation.

2.1 Signal segmentation The segmentation idea has been widely used in various research fields, such as cartography [21] and data mining [22]. This is because segmenting a signal (curve) into a piecewise subsignal (sub-curve) can be a key to further processing and application. Furthermore, a good segment algorithm can give the closest approximation to a result of visual analysis. To our knowledge, the initial form of the automatic segment approach for lidar data processing was first introduced and used by Gong et al. [19]. The goal of the lidar signal segmentation algorithm is to segment the signal into subsignals, in which any backscatter contributions are considered as originating from the same

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26879

object. We start with a range-corrected signal with n range bins X(r1), X(r2), ... and X(rn). One of the major challenges of the segmentation is being able to determine the threshold array, which shall be discussed later. If the threshold array, dthr(r), is given, we can subsequently select a bin as a break-bin according to the intensity difference between the X(r) and Xseg(r). The Xseg(r) is derived by linear interpolation with the X(r) of the beginning and ending bins. The absolute intensity difference can be written as: d ( r ) = X ( r ) – X seg ( r ) .

(2)

After finding the maximum difference at range rm, the corresponding range bin will be selected as a break-bin if d(rm) exceeds the threshold value dthr(rm). Then, recursive segmenting calculations will be carried out based on the two sub-signals segmented by this break-bin until the threshold is no longer exceeded. Determining dthr(r) is one of the biggest challenges in the segmentation method. Because the SNR of X(r) decreases with range rapidly, it is not advisable to segment based on a constant threshold array. We previously defined the error array, e(r) as the difference between the X(r) and the denoised signal Xde(r) in [19]. Then, we defined the threshold array of a segmentation as the sum of the mean plus three times the standard deviation of e(r). One disadvantage of this strategy is that the dthr(rm) is very sensitive to noise, and the signal will be over-segmented as shown in Fig. 2(a). Furthermore, the segment has to be performed on a denoised signal rather than an original one, but a signal can be distorted by denoising technology. x 10

5

noise break_bins X(r)

15

±3 σ r

10

2

5 0 -5 0

(a) 2

4 6 8 Altitude (km)

10

12

20 Range-corrected signal (a.u)

Range-corrected signal (a.u)

20

x 10

5

break_bins X(r)

15

±3 σ r

2

10 5 0 -5

(b)

0

2

4 6 8 Altitude (km)

10

12

Fig. 2. Illustration of the selection of the threshold array. (a) Segmentations based on denoised signal Xde(r) and the threshold array of the sum of the mean and three times the standard deviation of e(r), and (b) segmentations based on X(r) and the threshold array of 6σ·r2. Where the signal for segmenting is in blue, break-bins are in orange, the noise derived by denoising technology is in gray, and the envelope of the noise ( ± 3σ·r2) is in red.

Fortunately, a more sophisticated threshold array can be theoretically determined. In essence, the X(r) can be described as: X (r ) = S ( r ) r2 + e (r ) r2 ,

(3)

where S(r) is the pure raw signal. It shows that the noise level of X(r) is equivalent to e(r)r2. Theoretically, the corresponding envelope of the noise e(r)r2 is ± 3σ·r2. Furthermore, it can be shown that the curves of + 3σ·r2 and −3σ·r2 envelop the detected noise in Fig. 2(a) and the signal (primarily noise) above 4 km in Fig. 2(b), respectively. Thus, a threshold array dthr(r) can be computed as:

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26880

d thr ( r ) = 6σ ·r 2 .

(4) 2

The segmentations based on the threshold array of 6σ·r with X(r) can be seen in Fig. 2(b). The segmentation not only retains “marked-change bins” as break-bins, but also neglects most of the bins in the “no-change range” such as all the range bins above 4 km. The segmentation can avoid over-segmentation, which is beneficial for refining the layer boundary and removing any over-detections. Based on the above discussion we revise “r1, r2, ..., rn” to “1:n” for convenience and outline the pseudo-code of the recursive algorithm as follows: Function [break_bins] = linear_segment [X(1:n), σ] // obtain Xseg(1:n) by interpolating Xseg(1) and Xseg(n) linearly: Xseg(1:n) = linear_interpolate[X(1), X(n)]; // obtain the absolute difference between X and Xseg. d(1:n) = |X(1:n) -Xseg(1:n) |; // find dm and its corresponding range bin. [dm, dm _bin] = max[d(1:n)]; // obtain threshold array dthr(1:n) = 6·σ·r2(1:n); If dm > dthr (dm _bin) // segment the signal X(1:n) with given threshold. // recursively segment the left part if necessary. [break_bins1] = linear_segment [X(1: dm_bin), σ]; // recursively segment the right part if necessary. [break_bins2] = linear_segment [X(dm_bin + 1:end), σ]; // combine the break bins of the left and right parts. break_bins = [break_bins1(1:end-1), break_bins2(1:end)]; Else // the break bins are the first and end bins of X(1:n) if dm < dthr(dm _bin) break_bins = [X(1), X(n)]; End if End Function

Furthermore, we derive a fitted signal Xfit,k(r) = afit,k·r + bfit,k, where afit,k and bfit,k are derived from all the X(r) in the k-th sub-signal (i.e., the signal between k and k + 1 break-bins) by the least-squares method. The fitted slope afit(r) will be used for the detection of the layer base and top and is further discussed in the next section. 2.2 Selection primary layer base and top Generally, the layer base height is considered to be the beginning of the sharp increase in the direct or derived backscatter signal [7–10]. Detection of the cloud base can be difficult at times due to water uptake effects, haze, or even precipitation below the cloud base [10, 19]. However, our definition of the layer base is consistent with the previous studies. In essence, we select a layer base whenever afit(r) becomes larger than zero. A second issue of layer detection is selecting the layer top. We initially estimate the layer top as the height of the first bin where X(r) ≤ X(rb) above the base in the same manner as other studies [8]. To remedy the effects of in-layer attenuation, we further estimate the layer top as where the fitting slope is negative as well as greater than or comparable to that of the nearest clear-sky signal at a lower altitude. For an un-penetrated layer, the signal reduces to the noise level rapidly, and we

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26881

further estimate the layer top height where we consider no signal but only noise. In this situation, the layer top is underestimated, and it is called the “effective top”. 2.3 False positive rejection Because P(r) has an “inverse square law” (1/r2) effect, X(r) is used for rejecting false positive. As mentioned above, the noise level of X(r) is equal to e(r)r2, and is therefore not recommended to use a constant threshold in order to reject the false positive. Thus, the rejection criterion is based upon varying threshold in X(r). By defining Xs(rp) and Xs(rb) as pure range-corrected signals of the peak and base, respectively, the signal difference between layer peak and base can be written as: Δ = [ X s ( rp ) + e ( rp ) rp 2 ] – [ X s ( rb ) + e ( rb ) rb 2 ]. 2

(5) 2

The corresponding envelope of ∆ is Xs(rp)−Xs(rb) ± (3σ·rp + 3σ·rb ). Furthermore, X(r) decrease along with range for clear air. Thus, for a false positive which is due to noise effect, the ∆ of noise variations should be no more than Δ thr = 3σ ·rp 2 + 3σ ·rb 2 .

(6)

We apply ∆thr as a threshold to separate the genuine layer from false positive which are due to noise effect. Based on the threshold, all detections where ∆ is less than ∆thr are considered as false positive and subsequently removed.

10

Xs(r) 2

Xs(r) ± 3σr

Xs(r) ± 3σr2

Xup-extrap(r)

Xup-extrap(r)

4

X(r)

X(r)

Xs(r)

(a) 3 10 3.5 3.75

4

4.25 4.5 4.75 Altitude (km)

5

5.25 5.5

10

4

(b) 3 10 3.5 3.75

4

4.25 4.5 4.75 Altitude (km)

5

5.25 5.5

Fig. 3. Illustration for investigating the detection efficiency of the linear segment algorithm. (a) the blue solid and dashed lines denote a pure range-corrected signal Xs(r) and its envelopes Xs(r)s ± 3σ·r2, respectively, orange denotes the extrapolated line of the upper envelope, (b) the same as (a) but with a optical thinner layer.

Figure 3 shows two cases that investigate the detection efficiency of the linear segment algorithm which is mainly caused by the false positive rejection strategy. Figure 3(a) shows a pure range-corrected signal, Xs(r), and we use Xup(r) and Xlow(r) to denote the upper envelope [i.e. Xs(r) + 3σ·r2] and lower envelope [i.e. Xs(r)−3σ·r2] of a noised range-corrected signal, respectively. Theoretically, according the defining of ∆thr, the algorithm will be triggered to discriminate a layer when X(r) is larger than Xup(r). The algorithm is able to successfully detect the layer as in the case in Fig. 3(a), because the Xlow(rp) is larger than the Xup-extrap(rp), which is the extrapolation of the Xup(r) of the clear air lower than 4 km. However, the algorithm may fail to detect the layer for the case in Fig. 3(b), because the Xlow(rp) is less than Xup-extrap(rp). In this situation, the possibility of the layer to be detected is equal to the sum of the possibilities Xlayer(r) of every range bin larger than Xup-extrap(r). Thus, the larger of the geometric thickness of the layer, the larger of the possibility of the layer to be detected. Because the pattern of a layer is variable in the real world, it is difficult to suggest a constant

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26882

possibility to be a reference for any given detection. Under the condition of Fig. 3(a), once a layer signal with random noise is given, a layer base will be detected in the range of 4-4.25 km. However, a layer base can likely be detected anywhere in the range of 4-5 km for a signal under the condition of Fig. 3(a). Considering, an penetrable optically thick layer or geometrically thick layer will be easier to detect than a thin layer. We will only discuss the requirement for detecting optical and geometric thin layers as in Fig. 3(b) in the following. Considering G(r) as a unit as well as assuming C·βm(rb)·T2(rb) is equal to C·βm(rp)·T2(rp), the difference of X(rp) and X(rb) can be written as: Δ = C ⋅ β p (rp ) ⋅ T 2 (rp )+e ( rp ) ⋅ rp 2 − e ( rb ) ⋅ rb 2 ,

(7)

where T2(r) is the two-way transmittance at range r. The range of ∆ is C·βp(rp)·T2(rp) ± (3σ·rp2 + 3σ·rb2). Because we will judge a detection to be a layer when ∆ is larger than ∆thr, theoretically, we have no difficulty in detecting a layer where βp(rp) meet the following condition:

β p (rp ) >

6σ ( rp2 + rb2 ) C ⋅ T 2 (rp )

.

(8)

In the same manner, the accuracy of the detection of the layer base (top) depend on how fast is the βp(r) to be increase (decrease) from zero (from the value of the left part of Eq. (8)) to the value of the left part of Eq. (8) (to zero). 2.4 Refining layer base and top The aforementioned selected bases and tops are generally break-bins, which are extreme values and therefore sensitive to noise. Sometimes the layer base (top) can be underestimated (overestimated) respectively, such as the primary selected base illustrate in Fig. 4 with a simulated layer in 4-5 km. To overcome the effects of noise, we employ an “extrapolate into the layer” strategy to refine the base and top of a retained layer after false positive rejection. We extrapolate the fitted signals into the layer up to the layer peak based on the linear fitting of the range-corrected signal of the two nearest “clear air” segmentations just below and above the layer as shown in Fig. 4. We refine our estimate of the layer base (top) by examining from the peak to the lower (higher) region, until the range-corrected signal is less than the extrapolated signal. The base/top heights are then updated after several iterations to further enhance the robustness of this method. We add the signal between the former selected base/top and the new one into the new fitting, and refine the base/top until they match the former ones. The impact of the refining is illustrated by Fig. 4, which shows that if we only select a break-bin as the layer base that will contain a larger error. However, after refining, the accuracy of the determined layer base can be improved significantly.

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26883

5

10

break bins X(r) X (r) layer

Refined Base

X

X(r)

(r)

refine

4

10

Primary Base 3

10

3

3.5

4

4.5 5 Altitude (km)

5.5

6

Fig. 4. Refining the base and top using an extrapolating strategy. Blue denotes the original signal, orange indicates the fitted and extrapolated line, and red signifies the accepted layer detections.

3. Results and discussion 3.1 Testing with simulated signals 5

10

break_bins X(r) Xfit(r)

(a)

Range-corrected signal (a.u)

Range-corrected signal (a.u)

10

Xlayer(r) 10

4

10

3

Multi-scale linear Seg

3

4

5

6

5

break_bins X(r) Xfit(r)

(b)

Xlayer(r) 10

4

10

3

Multi-scale linear Seg

3

4

Altitude (km) 5

10

break_bins X(r) Xfit(r)

(c)

Range-corrected signal (a.u)

Range-corrected signal (a.u)

10

Xlayer(r) 10

4

10

3

3

Multi-scale linear Seg

4

5 Altitude (km)

5

6

Altitude (km)

6

5

break_bins X(r) Xfit(r)

(d)

Xlayer(r) 10

4

10

3

3

Multi-scale linear Seg

4

5

6

Altitude (km)

Fig. 5. Detection of simulated layers at 4-5 km with different noise levels. The standard deviation of the errors of the signal in Fig. 5(b)-5(d) are 2, 3, and 4 times as that of Fig. 5(a), respectively. Blue denotes the original signal, orange indicates the fitted and extrapolated line, and red signifies the accepted layer detections. Black and green error-bars of the detected base and top are calculated from 100 repeated simulations of the simple multi-scale method and linear segment scheme, respectively.

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26884

An extensive simulation study was conducted to verify that algorithm performance. Four kinds of signals were simulated, which were combined a pure signal with four different zeromean Gaussian noise arrays. The pure signal is generated based upon a standard atmosphere with a layer (i.e. Gaussian peak) with base of 4 km and top of 5 km. The optical depth and lidar ratio of the layer are 0.05 and 20 sr, respectively. The error standard deviations of the noise arrays in Figs. 5(b)-5(d) are 2, 3, and 4 times as that of Fig. 5(a), respectively. Figure 5 shows a case detection of the linear segment algorithm and the standard error-bar of both methods calculated based on 100 repeated simulations and detections. The statistics show decent accuracy of the layer detection of both methods, but the base (top) is overestimated (underestimated). This is because the enhancement of the signal intensity near the layer base and top are very weak which is easily hidden by noise. Figures 5(a)-5(d) shows that if the SNR is lower, then both the random error and bias will be larger, however the linear segment algorithm will still work well with less bias in a low SNR regime. The linear segment scheme can accurately position both the base and top owing to the selecting and refining strategies. The top underestimation of the simple multi-scale method is more than that of the linear segment algorithm, because in-layer attenuation is neglected while the top is simply selected as the first range bin where X(r) ≤ X(rb) above the layer base [8, 15]. 3.2 Testing with real signals 5

x 10

10

7

10

slope (r)

2

Break bins X(r) X (r)

Slope (a.u)

Range-corrected signal (a.u) Range-corrected signal (a.u)

8

10

fit

6

10

fit

1 0 -1

5

10

(a) 2

4 6 Altitude (km)

8

10

8

X(r) X (r)

7

10

refine

6

10

5

10

(c) 0

2

4 6 Altitude (km)

8

10

0

Range-corrected signal (a.u)

0

-2 (b) 2

4 6 Altitude (km)

8

10

8

10

8

10

X(r) X (r)

7

layer

10

6

10

5

10

(d) 0

2

4 6 Altitude (km)

Fig. 6. Detection of real signal at 21:26 on the 25 December 2008 in Wuhan. (a) The original signal is in blue, break-bins are in gray, the fitted lines are in red, (b) red indicates the slope of the fitted lines, (c) Blue denotes the original signal, red indicates the fitted and extrapolated line, (d) Blue denotes the original signal, and red signifies the accepted layer detections.

We present an actual case that will demonstrate the performance of the layer detection algorithm. The data was designed as a stress test for the boundary detection scheme, which was acquired by a 532-nm Mie lidar on the 25-26 December 2008 in Wuhan, China. First, we

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26885

present one signal example of continuous observation for illustrating the performance of the linear segment algorithm. Note that signals in the range affected by the overlap are not processed in this study. The segments in Fig. 6(a) show that the lower the SNR, the more range bins will be contained in a given segment. Thus, we can consider that the segment is self-adapting, and tends to combine as many range bins as possible in one segment to provide a robust linear fitting, which is beneficial for refining the layer boundary and removing false positives. Figure 6(b) shows that the slope of a clear air region is close to zero. As illustrated in Fig. 6(d), there are four layers that have been detected by comparing the original signal and the extrapolated line as in Fig. 6(c). The boundaries of the layers coincide with the locations that would have been identified by a visual inspection. The algorithm failed to refine the top of the layer at 1 km. In this situation, we consider the primary selected top as the final one. Furthermore, both of the small and larger layers have been successfully retained, but all of the false positives were effectively rejected. Figure 7 shows a continuous measurement identifying high layer above 8 km, low layer below 4 km, and overlaying optically thin layers below 2 km. This is a good test to exercise all aspects of the linear segment algorithm. In this study, we consider a continuous layer as cloud layers when the mean ratio of X(rp) to X(rb) is larger than four, otherwise it is considered as an aerosol layer based on previous studies [12, 16, 19]. Note that this classification criterion should be used with caution before further investigation has been done [23]. Figure 7(a) shows the boundaries of the aerosol (gray curve) and cloud (black curve) layers as reported by the simple multi-scale method while Fig. 7(b) pertains to the linear segment algorithm. Figure 7 demonstrates that both algorithms detect the layer satisfactorily, and their results are nearly consistent with one another. Note that the result of the simple multi-scale method is slightly different from our previous result [16] because we have updated the background-noise removal and calibration methods in the new version of our lidar package. 12

12

(a)

14

8

12

6

10 8

4

6 4

2

(b)

10

16

Altitude (km)

Altitude (km)

10

18

16 14

8

12

6

10 8

4

6 4

2

2

22:00

00:00 02:00 Time (h)

04:00

06:00

18

2

22:00

00:00 02:00 Time (h)

04:00

06:00

Fig. 7. Time-altitude diagrams of detected layer boundaries with ln[X(r)] as measured by a 532-nm Mie lidar on 25-26 December 2008 in Wuhan, China. (a) Boundaries of the aerosol (gray curve) and cloud (black curve) layers detected by the simple multi-scale method, respectively, and (b) is the same as (a) but is detected by the linear segment algorithm.

Figure 8 shows a scatter-plot of the layer bases (with “o” markers in blue) and tops (with “+” markers in red) retrieved by the linear segment algorithm and the simple multi-scale method, respectively. We consider the two estimates of the two methods belong to the same layer when they share similar peaks within the same signal. A base (top) regression-line was solved from the two kind of bases (tops) as detected by the two methods, respectively. We find that the base regression-line matches the “1:1” line very well. However, the top regression-line is below the “1:1” line which is mainly due to the fact that the linear segment method considers the in-layer attenuation of an optically thick cloud at 3~4 km while the simple multi-scale method does not. As shown in Fig. 8, the tops below 2 km are in good

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26886

4

(a)

3

B A

2

1:1 line base top base reg top reg

Base and Top of MultiScale Alg (km)

Base and Top of MultiScale Alg (km)

agreement mainly because the aerosol layer is spatially and optically thin such that any inlayer attenuation is negligible. However, the tops above 7 km are in good agreement primarily because the noise level is very low and therefore hides the effect of the in-layer attenuation. The simple multi-scale method will separate a multi-peak layer into two layers when there are bins with X(r) ≤ X(rb) between the peaks of the layer. This occurs even when the bins are at the very least affected by pure Rayleigh scattering as opposed to other stronger atmospheric effects (i.e., aerosols, hydrometeors, etc). This incorrect identification is the source of the outliers as marked in Regions A and B in Fig. 8.

1 1 2 3 4 Base and Top of Linear Segment Alg (km)

12

(b)

11 10 9 8

1:1 line base top

7 7 8 9 10 11 12 Base and Top of Linear Segment Alg (km)

Fig. 8. Correlation analysis of the two methods. (a) Scatter-plot of the base (with “o” markers in blue) and top (with “+” markers in red) retrieved by the linear segment algorithm and the simple multi-scale method in 1-4.5 km, and (b) is the same as (a) but for 7-12 km.

4. Summary A linear segmentation algorithm has been proposed to analyze lidar backscatter data sets for detecting the base and top of significant particle layers in the atmosphere. This algorithm is based on four key steps, which are signal segmentation, primary layer base and top selection, false positive rejection, and layer base and top refining. We find no significant bias in the detected of the linear segment algorithm. Small backscatter spikes due to noise are fully ignored by both methods, but the algorithms still work well when the SNR is low. Identification of layer boundary is done based on linear segmentation, and accurately refines the estimate of layer boundary by extrapolating the linear fitting from the nearest “clear air” segmentations into the layer. We find good agreement between cloud-base heights retrieved by both methods. Furthermore, the top detection is underestimated by employing the simple multi-scale method because the definition of which neglects in-layer attenuation. However, the linear segment algorithm also can position the top satisfactorily owing to the accurate segmentation and the refining strategies. In this study, only a few cases have been tested with encouraging results, therefore the potential of using the linear segment in various tests and applications of lidar data processing should be explored. Improving or even proposing new segment algorithms are needed in future studies. Furthermore, cases of cloud embedded within aerosol layers should be considered in future studies. Acknowledgments This work was supported by 973 Programs (2011CB707106), and the NSFC (41127901, 10978003).

#196329 - $15.00 USD Received 23 Aug 2013; revised 18 Oct 2013; accepted 20 Oct 2013; published 30 Oct 2013 (C) 2013 OSA 4 November 2013 | Vol. 21, No. 22 | DOI:10.1364/OE.21.026876 | OPTICS EXPRESS 26887

Linear segmentation algorithm for detecting layer boundary with lidar.

The automatic detection of aerosol- and cloud-layer boundary (base and top) is important in atmospheric lidar data processing, because the boundary in...
1MB Sizes 0 Downloads 0 Views