Background modeling for moving object detection in long-distance imaging through turbulent medium

Adiel Elkabetz and Yitzhak Yitzhaky*

Ben-Gurion University of the Negev, Department of Electro-Optics Engineering, P.O. Box 653, Beer Sheva 84105, Israel
*Corresponding author: [email protected]

Received 8 July 2013; revised 31 December 2013; accepted 15 January 2014; posted 16 January 2014 (Doc. ID 193411); published 17 February 2014

A basic step in automatic moving object detection is often modeling the background (i.e., the scene excluding the moving objects). The background model describes the temporal intensity distribution expected at different image locations. Long-distance imaging through a turbulent atmospheric medium is affected mainly by blur and by spatiotemporal movements in the image, which have contradicting effects on the temporal intensity distribution, mainly at edge locations. This paper addresses this modeling problem theoretically and experimentally for various long-distance imaging conditions. Results show that a unimodal distribution is usually the more appropriate model. However, if image deblurring is performed, a multimodal modeling might be more appropriate. © 2014 Optical Society of America

OCIS codes: (110.0115) Imaging through turbulent media; (010.1330) Atmospheric turbulence; (110.4100) Modulation transfer function; (110.2960) Image analysis.
http://dx.doi.org/10.1364/AO.53.001132

1. Introduction

Detection of moving objects (usually targets such as people and vehicles) in long-distance imaging through the atmosphere is often very challenging, because of the image-distorting atmospheric effects, which include mainly blur and spatiotemporal local movements of the scene as a result of the air turbulence along the imaging path [1–5]. The scene consists of two elements: the background, which is the scene without the moving targets, and the foreground, which includes only the moving targets of interest. Automatic detection of moving objects is typically performed using the "background subtraction" approach, in which a model of the background is subtracted from the recorded video images in order to separate the moving targets. Therefore, an important step in this detection process is to model the background image. Unlike the typical background in short-distance imaging (mainly in indoor situations), which is static in time, the background in imaging through a long-distance atmospheric path is temporally very dynamic. The spatial extent of the spatiotemporal image movements depends mainly on the length of the imaging path and on the strength of the turbulence. The background model should reflect the movements of the background but not the movements of the moving targets. The background model is usually calculated according to the temporal statistics of the pixels in the video. The statistics can be described by a probability density function at the various locations in the video images, which can be estimated by the temporal histogram. Backgrounds in dynamic scenes are usually considered to have single-mode (unimodal) [6] or multimode (usually bimodal) [7,8] histograms. The unimodal case is when the value of a dynamic background pixel changes in time around a certain value, according to some decreasing distribution that can be parametric or nonparametric [9]. A specific case of a unimodal distribution is the single-level case, which assumes that, other than the real moving objects, the rest of the image is static.

The bimodal case assumes that the dynamic background pixel is distributed around two distinct levels. Choosing the correct modeling method is important for the success of the target detection (with fewer false alarms). In [10] it can be seen that implementing a multimodal modeling inappropriately can decrease the ability to detect moving objects. The temporal behavior of an image pixel depends on the movements caused by the imaging optical path (such as turbulence effects), on movements of the scene itself (such as movements of objects caused by winds), and on the local scene structure. Under the same imaging conditions, one pixel's histogram can be bimodal at a step-edge area, while the histogram of another pixel at a relatively flat area would be unimodal. This means that the temporal probability density function of the pixels in dynamic scenes is space-variant.

In long-distance observations, the images are affected by blur, caused mainly by turbulence and aerosols in the atmospheric path, and by spatiotemporal random image movements caused by the turbulence. These movements cause local geometric distortions in each frame, commonly termed image warping. These effects have contradicting influences on the number of lobes in the pixel's histogram at nonuniform locations in the image. The spatiotemporal movements push toward a bimodal histogram: due to the changes of the refractive index along the path (caused by the turbulence), an image pixel at an edge location may arbitrarily receive photons from either the dark or the bright side of the edge in the scene, causing a bimodal histogram of its intensity. The blur, on the other hand, will reduce the brightness differences between the two sides of the edge in the image, thus increasing the likelihood of a single-lobe histogram.

In this paper, we analyze for the first time the spatiotemporal behavior resulting from these contradicting effects, at different imaging distances, turbulence strengths, and electromagnetic wavelengths appropriate for imaging through the atmospheric path. Based on this analysis, we find the appropriate background modeling in the process of moving target detection. Furthermore, we examine this analysis for the case where the images are deblurred (using blind deconvolution), which changes the relative weights of the two atmospheric effects. A theoretical analysis is performed according to conventional formulations that define these atmospheric phenomena. Experimental examination is shown with both simulations and real degraded videos recorded in long-range horizontal imaging. Finally, we show experimentally the effect of video restoration (deblurring), which may have a significant influence on the background model. The effect of the deblurring on the statistics of the temporal behavior of an edge pixel was also evaluated quantitatively.

The rest of the paper is organized as follows: Section 2 briefly discusses a few methods for background modeling and demonstrates the use of a multimodal modeling in outdoor windy conditions. Section 3 presents a quantitative analysis of the turbulent background distortion.

The analysis was made for both day and night observation channels, at wavelengths in the visual (VIS), near-infrared (NIR), and mid-wavelength infrared (MWIR) spectra. The specific central wavelength used in each channel is presented in Table 1. In Section 4, we examine the turbulent background modeling under different conditions by simulating turbulence effects on a video and by examining real turbulence-degraded and restored videos. Conclusions are given in Section 5.

2. Dynamic Background Modeling

Several survey papers exist that review background subtraction and background modeling [11,12]. This section briefly discusses some common background modeling approaches, with the aim of categorizing them into two different model types for dynamic scenes: unimodal and multimodal. The background model in dynamic scenes should also be dynamic, i.e., adaptively changed in time.

A. Median Filter

Modeling the background by the temporal median is one of the most commonly used techniques. In this method, a buffer of the last N frames is constructed, and the background estimate at each pixel location is the median over all the frames in the buffer. The foreground (the moving regions) is then calculated through subtraction of the background model:

$$ FG_t(x,y) = \begin{cases} 1, & \left| I_t(x,y) - \operatorname{median}_{n}\{I_n(x,y)\} \right| > T \\ 0, & \text{else,} \end{cases} \qquad n = t-1,\ t-2, \ldots,\ t-N, \qquad (1) $$

where (x, y) are the image coordinates, FG_t(x, y) is the foreground image (the moving regions), I_t(x, y) is the current pixel intensity value, and T is a threshold value. The assumption is that each image location is background for more than half of the frames in the buffer. Since each background location has a single continuous range of allowable values, this is a unimodal background model. The method is simple to implement, but may require a large memory to store the last N frames. A few improvements can be made, such as an adaptive threshold, in which the threshold changes in different areas of the frame [13], and in a low-contrast or noisy area the threshold value becomes higher.
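As an illustration, a minimal sketch of this median-based background subtraction is given below (Python/NumPy); the buffer layout, the threshold value, and the function name are our own illustrative choices:

```python
import numpy as np

def median_foreground(frames, current, threshold=25):
    """Foreground mask by temporal-median background subtraction, Eq. (1).

    frames:    buffer of the last N grayscale frames, shape (N, H, W)
    current:   current frame I_t, shape (H, W)
    threshold: gray-level threshold T (illustrative value)
    """
    background = np.median(frames, axis=0)                         # per-pixel temporal median
    foreground = np.abs(current.astype(float) - background) > threshold
    return foreground.astype(np.uint8)                             # 1 = moving region, 0 = background
```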

B. Nonparametric Model [9]

In this background model, a probability distribution is associated with each pixel location, according to the N previous frames, using a kernel density function (which might be Gaussian). The method is nonrecursive, and a buffer is needed, similar to the median method:

$$ FG_t(x,y) = \begin{cases} 1, & \dfrac{1}{N}\sum_{i=t-N}^{t-1} \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\dfrac{\left(I_t(x,y)-I_i(x,y)\right)^2}{2\sigma^2}\right] < T \\ 0, & \text{else,} \end{cases} \qquad (2) $$

where σ is the standard deviation (kernel bandwidth) of the distribution.


This technique can be implemented either as unimodal, by taking a single-mode distribution [Eq. (2)], or as multimodal, by taking a multimodal distribution.
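A minimal sketch of the single-mode (unimodal) form of Eq. (2), assuming the frame buffer is held as a NumPy array; the kernel bandwidth and the probability threshold are illustrative values:

```python
import numpy as np

def kde_foreground(frames, current, sigma=10.0, threshold=0.01):
    """Foreground mask by the nonparametric kernel-density model, Eq. (2).

    frames:    buffer of the N previous grayscale frames, shape (N, H, W)
    current:   current frame I_t, shape (H, W)
    sigma:     Gaussian kernel bandwidth (illustrative)
    threshold: probability threshold T (illustrative)
    """
    diff = current.astype(float) - frames.astype(float)            # I_t - I_i for every buffered frame
    kernel = np.exp(-diff**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    prob = kernel.mean(axis=0)                                     # average of the N kernel values
    return (prob < threshold).astype(np.uint8)                     # low probability -> foreground
```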

C. Gaussian Mixture Model [6–8]

In recent years, common multimodal modeling techniques have been published; one of them is the Gaussian mixture model (GMM), which has enjoyed high popularity since it was first proposed for background modeling. The GMM maintains for each pixel a multimode density function, constructed from several Gaussians. Thus, it is capable of handling background distributions that contain more than a single main intensity level. Unlike the above methods, the GMM is "parametric," which means that only several model parameters are kept and updated for each pixel, without keeping a large buffer of video frames. The probability of observing the current pixel value is

$$ P\big(I_t(x,y)\big) = \sum_{i=1}^{k} \omega_{i,t} \cdot \eta\big(I_t(x,y),\, \mu_{i,t},\, \sigma_{i,t}\big), \qquad (3) $$

where k is the number of modes, ω_{i,t} is an estimate of the weight of the ith Gaussian in the mixture at time t, μ_{i,t} and σ_{i,t} are the mean and variance values of the ith Gaussian in the mixture at time t, and η is a Gaussian probability density function. The weights of the Gaussians are updated recursively, and through this recursive process a dynamic background is constructed. Multimodal background modeling was found effective in various scenarios, such as the observation of cars at road intersections. The advantage of using a multimodal method is a more accurate modeling of the background; its major disadvantages are high complexity and a long learning time. An example of a frame taken from a video (Vid1 in [14]) that includes a multimodal background location is shown in Fig. 1(a). In this analyzed example there is a tree swaying in the wind. A pixel value in the edge areas will be distributed most of the time around two distant values, corresponding to the gray values at the tree and at the "background" behind it. In this case, a bimodal background model would be appropriate. In the case of a moving object at that location, another temporary central value may occur.

The histogram in Fig. 1(b) describes the gray-level distribution of a pixel located at an edge area. As can be seen, the histogram has two distant modes (a bimodal behavior), for which the use of multimodal modeling is appropriate. On the other hand, Fig. 1(c) shows a location with a unimodal background behavior.
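In practice, a per-pixel Gaussian mixture of the form of Eq. (3) is available, for example, in OpenCV; the sketch below uses that implementation with illustrative parameter values and a hypothetical video file name (the paper does not prescribe a specific implementation):

```python
import cv2

# MOG2 maintains a per-pixel Gaussian mixture as in Eq. (3) and updates the
# mode weights recursively; the parameter values here are illustrative only.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)

cap = cv2.VideoCapture("moving_tree.avi")    # hypothetical file name (e.g., Vid1 in [14])
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)        # 255 where the pixel does not fit the background mixture
cap.release()
```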

3. Quantitative Analysis of the Contradicting Turbulence Effects on the Background Behavior

In the previous section, we demonstrated the case of a windy "moving tree" as a dynamic background that is appropriate for multimodal modeling. Similarly, a turbulence-degraded image is also dynamic, because the refractive index changes randomly and image shakes and movements are apparent. On one hand, it appears that the use of a multimodal background modeling such as the GMM is appropriate for modeling a turbulent background. On the other hand, we must consider the influence of the blurring, which results from the propagation through the atmosphere. Such a blur can reduce the effect of the movements, so modeling them by a multimodal technique might be inaccurate. Our main question is whether multimodal modeling is suitable and required for modeling a turbulent background. In order to examine the suitable modeling method, in the following subsections we examine theoretically the effects of the turbulent movements and of the blur at the image plane. The analysis is done using three typical observation systems at different wavelengths and different turbulence strengths.

A. Turbulence-Induced Movements

The variance of the light arrival angles at an optical system due to the turbulence can be calculated as follows [15]:

$$ \alpha^2 = 2.914 \cdot D^{-1/3} \cdot C_n^2 \cdot L, \qquad (4) $$

where α indicates the standard deviation (STD) of the incident angle at the entrance pupil of the optical system, D is the optical system aperture diameter, L is the object distance, and C_n^2 is the refractive-index structure constant, which expresses the strength of the turbulence effect on the optical propagation.

Fig. 1. Bimodal background case. (a) A frame from a "moving tree" video (Vid1 in [14]) taken on a windy day. The white arrows point to the checked pixel locations, one at an edge of a tree leaf and the other at the sky. (b) A bimodal histogram, describing the edge pixel gray level over time. (c) A single-mode histogram, describing the fixed background (sky) pixel gray level over time.


Generally observed values of C_n^2 are in the range of 10^{-17} to 10^{-12} m^{-2/3}. High values of C_n^2, 10^{-13} m^{-2/3} or greater, usually indicate a highly turbulent atmosphere, causing strong image movements and blurring. Lower values of C_n^2 indicate better observation conditions, and 10^{-17} m^{-2/3} indicates very light turbulence. Therefore, we chose the following values to define three levels of turbulence strength in this analysis:

C_n^2 = 10^{-13} m^{-2/3}: heavy turbulence;
C_n^2 = 10^{-15} m^{-2/3}: moderate turbulence;
C_n^2 = 10^{-17} m^{-2/3}: light turbulence.

The STD of the linear displacements at the image plane can be calculated as follows:

$$ h_0 \approx F \cdot \alpha, \qquad (5) $$

where F is the effective focal length. The above formula is correct for small observation angles, which is the case in our long-distance imaging. The field-of-view values used here, which are typical for a horizontal long-distance imaging system, can be seen in Table 1. For convenience, we divided the displacement results by the pixel size, thus obtaining the amplitude of the movements in pixel units.

Fig. 2. Point movement (STD) at the NIR channel as a function of the object distance at the different turbulence strengths.

According to the above formulation, for a given optical system we can calculate the movements' STD as a function of the observation distance. The analysis was done with the systems described in Table 1. According to Eqs. (4) and (5), we calculated the STD of the displacements of a pixel at the image plane for the three turbulence levels, with respect to the imaging distance. Results for the NIR channel are presented in Fig. 2. It can be seen from Fig. 2 that, as expected, higher turbulence strengths and longer imaging paths cause larger spatiotemporal movements of object points at the image plane.
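As an illustration, the calculation of Eqs. (4) and (5) can be sketched as follows; the parameter values follow the NIR column of Table 1, the turbulence levels are those defined above, and the distance grid and variable names are our own choices:

```python
import numpy as np

# NIR channel parameters from Table 1
D = 0.060       # aperture diameter [m]
F = 1.0         # effective focal length [m]
pitch = 8e-6    # pixel size (pitch) [m]

L = np.linspace(1e3, 15e3, 50)   # object distances [m] (illustrative grid)
for cn2, label in [(1e-13, "heavy"), (1e-15, "moderate"), (1e-17, "light")]:
    alpha = np.sqrt(2.914 * D**(-1.0 / 3.0) * cn2 * L)   # arrival-angle STD [rad], Eq. (4)
    h0 = F * alpha                                        # image-plane displacement STD [m], Eq. (5)
    print(label, "turbulence, movement STD at 15 km:", h0[-1] / pitch, "pixels")
```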

B. Atmospheric and Other Blurring Effects

In order to model the blurring effect we use the modulation transfer function (MTF), which is the absolute value of the Fourier transform of the blurring point spread function (PSF).

Table 1. Optical Parameters of the Imaging Systems Used in this Study

Parameter             Thermal Observation (MWIR)   Day Observation (NIR)   Day Observation (VIS)
Wavelength            3500 nm                      950 nm                  650 nm
Focal length          850 mm                       1000 mm                 1000 mm
Pixel size (pitch)    30 μm                        8 μm                    8 μm
Aperture diameter     100 mm                       60 mm                   60 mm
Field of view         0.51°                        0.45°                   0.45°

Assuming a path-integrated value of C_n^2, the long-exposure turbulence MTF can be described as follows [15]:

$$ \mathrm{MTF}_{\mathrm{turbulence}}(\xi) = \exp\!\left[-\frac{3}{8}\cdot 57.53\cdot \xi^{5/3}\cdot \lambda^{-1/3}\cdot C_n^2\cdot L\right], \qquad (6) $$

where ξ is the angular spatial frequency and λ is the electromagnetic wavelength. Short exposure is considered the case in which the turbulence MTF is not sensitive to tilt, while long exposure is when the tilt effect is included in the MTF [16]. This may imply that short exposures should be significantly shorter than 10 ms [15]. In our case, where the exposure is around 30 ms (a standard video rate), tilt affects the image blur, hence our use of a long-exposure average MTF. However, the exposure time is not long enough to integrate a very high number of tilts. Therefore, while having a wider average blur as a result of the tilt integration, we also have a significant effect of image movements. An interesting point is that if the imaging were done with very short exposure times, the much smaller short-exposure turbulence blur would act like an image deblurring operation applied to the long-exposure blurred signal (as considered in Section 4.C).

Besides the turbulence effect, the total MTF is influenced by the optics and the detector, which also cause blur. For the optics, the diffraction MTF can be described as follows [17]:

$$ \mathrm{MTF}_{\mathrm{diff}}(\xi) = \frac{2}{\pi}\left[\cos^{-1}\!\left(\frac{\xi}{\xi_{\mathrm{cutoff}}}\right) - \frac{\xi}{\xi_{\mathrm{cutoff}}}\sqrt{1-\left(\frac{\xi}{\xi_{\mathrm{cutoff}}}\right)^{2}}\,\right], \qquad (7) $$

where ξ_cutoff is the cutoff frequency, which equals the ratio between the aperture diameter and the product of the focal length and the wavelength. In order to evaluate the blurring effect of the detector, its blurring PSF can be considered to be the (square) pixel sensor area. The corresponding MTF is then a sinc function [17]:


$$ \mathrm{MTF}_{\mathrm{detector}}(\xi) = \left|\mathrm{sinc}(\xi w)\right| = \left|\frac{\sin(\pi\xi w)}{\pi\xi w}\right|, \qquad (8) $$

where w is the pixel size. For calculating the total frequency response, we assume a cascade connection between all the blur causes, which means that the total MTF is the product of the various MTFs. Two representative examples of the MTFs' behaviors are shown in Fig. 3. In both examples the observation distance is 5 km, where in Fig. 3(a) C_n^2 = 10^{-15} m^{-2/3} (moderate turbulence) and in Fig. 3(b) C_n^2 = 10^{-14} m^{-2/3} (heavier turbulence). From these examples it can be seen that the turbulence has the most significant influence on the total MTF as it increases above moderate levels. Once the system's total frequency response is obtained, we can estimate the system's PSF. We assume that the PSF is real, so it can be estimated from the MTF using the inverse Fourier transform. In order to determine the point blur size, we defined the blur size as the width of the PSF up to the point where it decreases by 90% from its maximum. According to the above description, we calculated the blur spot size for the various analyzed systems as a function of the observation distance. Similarly to the movements calculation, we converted the blur size units to pixels. Figure 4 shows the blur spot size as a function of distance for the three levels of turbulence strength (values of C_n^2) at the VIS channel. Similarly to the analysis of the movements' extents, it can be seen from Fig. 4 that, as expected, the blur size also becomes larger as the turbulence strength is higher and as the viewing distance increases.
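A sketch of this MTF cascade and blur-size estimate is given below, assuming the NIR parameters of Table 1 and the moderate-turbulence condition of Fig. 3(a); the frequency grid, the conversion between image-plane and angular spatial frequencies (ξ = νF), and the variable names are our own implementation choices rather than details given in the paper:

```python
import numpy as np

# NIR channel parameters (Table 1) and the moderate-turbulence case of Fig. 3(a)
lam, D, F, w = 950e-9, 0.060, 1.0, 8e-6      # wavelength, aperture, focal length, pixel size [m]
cn2, L = 1e-15, 5e3                          # C_n^2 [m^(-2/3)] and path length [m]

nu_cut = D / (lam * F)                       # diffraction cutoff at the image plane [cycles/m]
nu = np.linspace(0.0, nu_cut, 2048)          # image-plane spatial frequencies
xi = nu * F                                  # angular spatial frequencies [cycles/rad] (assumed conversion)

mtf_turb = np.exp(-(3.0 / 8.0) * 57.53 * xi**(5.0 / 3.0)
                  * lam**(-1.0 / 3.0) * cn2 * L)                     # Eq. (6)
r = nu / nu_cut
mtf_diff = (2.0 / np.pi) * (np.arccos(r) - r * np.sqrt(1.0 - r**2))  # Eq. (7)
mtf_det = np.abs(np.sinc(nu * w))                                    # Eq. (8); np.sinc(x) = sin(pi x)/(pi x)
mtf_total = mtf_turb * mtf_diff * mtf_det                            # cascade: product of the MTFs

# PSF estimated as the inverse Fourier transform of the (real, nonnegative) total MTF
psf = np.abs(np.fft.fftshift(np.fft.irfft(mtf_total)))
dx = 1.0 / (len(psf) * (nu[1] - nu[0]))            # spatial sampling interval [m]
width = np.count_nonzero(psf > 0.1 * psf.max())    # samples above 10% of the peak (90% drop)
print("blur spot size ~", width * dx / w, "pixels")
```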

C. Comparison between Movements and Blurring Extents

As stated earlier, the statistical behavior of an edge image point in long-distance imaging depends on the contradicting effects of the blur and the movements. Large movements with small blur will cause a bimodal probability density of that point, while with large blur and small movements the probability density will tend to be unimodal.

Figure 5 presents a comparison between the extents of the blur PSFs and the STDs of the movements, as functions of the imaging distance, at the different observation channels and for different turbulence strengths. The purpose of this comparison is to show, in general, the relative influence of these contradicting effects under the different imaging conditions. From the graphs of Fig. 5 we can conclude that:

• The movements tend to be of the same order of magnitude as the blurring. This means that we cannot assume a bimodal distribution of a background edge pixel. If only the movements were dominant, two distinct modes would be obtained; however, since the blur is of the same order of magnitude as the movement, it will likely merge the two modes.
• At light turbulence (almost without turbulence), the major blurring factors are the optics and the detector, which stay constant for all distances.
• At heavy turbulence, the least degraded channel is the MWIR channel, which has the longest wavelength of the three channels.
• It can be seen, as expected, that a longer-wavelength imaging system (the MWIR here) has a lower atmospheric blur. Therefore, this channel will have a higher tendency for a bimodal distribution of a background edge pixel.

4. Examination of the Turbulent Background at Different Conditions

In this section, the temporal behavior of the background at an edge location will be examined through both simulated turbulence (Section 4.A) and real turbulence (Section 4.B).

A. Simulated Turbulence-Degraded Video

The purpose of the simulation is to produce controlled levels of blur and movements on a real video in order to analyze their effect on background modeling. The turbulence simulation is carried out by adding blur and movement effects as presented in the previous section.

Fig. 3. MTF analysis results at the NIR channel, including diffraction, detector, and turbulence effects, for a 5 km observation distance. (a) Moderate turbulence, C_n^2 = 10^{-15} m^{-2/3}. (b) Heavy-moderate turbulence, C_n^2 = 10^{-14} m^{-2/3}.


Fig. 4. Blur spot size at the VIS channel as a function of the object distance and the turbulence strength.

The basic frame to which we applied the turbulence effects was recorded with a thermal camera from a distance of about 2 km and was digitally deblurred [18]. This frame was considered to be a nondistorted frame.

1. Turbulent Blurred Image Simulation
Assuming that the main atmospheric blur effect is linear and space-invariant, the recorded blurred image is a convolution between the image of a clear scene and the overall PSF. We assume that the turbulence blur is dominant, ignoring blur effects that may be caused by aerosols in the atmosphere under certain conditions. We simulated the turbulence-blurred image by a convolution between the overall blurring PSF (as described in Section 3.B) and the nondistorted image. Figures 6(a)–6(c) show, respectively, the basic (nondistorted) frame, a simulated frame according to the calculated PSF for moderate turbulence at a 5 km distance, and a simulated frame for heavy turbulence at a 15 km distance. Both PSF calculations were for the NIR channel. It can be seen from Fig. 6(b) that the blur effect is observed mainly at sharp areas (such as edges), but its effect is moderate because the turbulence is not heavy and the distance is not very far. Figure 6(c) shows a significant blur effect, due to the more stringent conditions.
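A minimal sketch of this blur simulation, assuming the overall 2-D blurring PSF has already been computed as in Section 3.B (function and variable names are ours):

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_turbulence_blur(frame, psf_2d):
    """Simulate the turbulence-blurred frame as a convolution of the clear
    (nondistorted) frame with the overall blurring PSF."""
    psf_2d = psf_2d / psf_2d.sum()     # normalize so the mean intensity is preserved
    return fftconvolve(frame.astype(float), psf_2d, mode="same")
```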

2. Turbulent Movements Image Simulation
The turbulence-induced image movements are random, with STD according to Eqs. (4) and (5). These movements are not per-pixel, but have a spatial correlation that depends on the imaging conditions [19]. In order to simulate reliable turbulence-caused background movements, we used a real turbulent video (Vid2 in [14]) from which we calculated the turbulent movement behavior. Each frame was partitioned into blocks (sized 32 × 32 pixels, to approximate a reasonable isoplanatic patch), and a motion vector was assigned to each block, where the motion vectors were calculated from the real video using block-matching estimation [20]. Integrating the movements with the blurring properly describes the main turbulence effect on a video.
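A simple exhaustive block-matching sketch is shown below; the paper uses the implementation of [20], so the SAD matching criterion, the search range, and the interfaces here are illustrative assumptions:

```python
import numpy as np

def block_motion(ref, cur, block=32, search=5):
    """Estimate one integer motion vector per block (32 x 32 pixels) by
    exhaustive block matching with a sum-of-absolute-differences criterion."""
    H, W = ref.shape
    vectors = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            tpl = cur[y0:y0 + block, x0:x0 + block].astype(float)
            best, best_vec = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue
                    sad = np.abs(ref[y:y + block, x:x + block] - tpl).sum()
                    if sad < best:
                        best, best_vec = sad, (dy, dx)
            vectors[by, bx] = best_vec
    return vectors

def apply_block_motion(frame, vectors, block=32):
    """Warp a frame by translating each block according to its motion vector
    (a crude per-block shift; it only approximates real turbulence warping)."""
    out = frame.copy()
    for by in range(vectors.shape[0]):
        for bx in range(vectors.shape[1]):
            dy, dx = vectors[by, bx]
            y0, x0 = by * block, bx * block
            shifted = np.roll(frame, (dy, dx), axis=(0, 1))
            out[y0:y0 + block, x0:x0 + block] = shifted[y0:y0 + block, x0:x0 + block]
    return out
```

The vectors estimated from consecutive frames of the real turbulent video (Vid2 in [14]) would then be applied, frame by frame, to the blurred nondistorted frame to build the simulated movie.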

Fig. 5. Comparison between the movements and the blurring at the different observation channels for different turbulence strengths.

Fig. 6. (a) Undistorted frame used for simulation. (b) The simulated blurred frame for 5 km distance at a moderate turbulence. (c) The simulated blurred frame for 15 km distance at a heavy turbulence.


Fig. 7. (a) One frame from a simulated blurred and movement-distorted movie (moderate turbulence) (Vid3 in [14]). (b) The histogram of an edge pixel located at coordinates (459, 394) and marked by a red circle around it.

In Fig. 7(a), one frame from a simulated blurred and movement-distorted movie is presented (the movie itself can be seen in Vid3 in [14]); the location of a pixel at the edge of a step in the image is marked by a surrounding red circle. The histogram of the edge pixel is shown in Fig. 7(b). It can be seen clearly from Fig. 7(b) that the histogram can be considered as single mode, as concluded in Section 3.C. In this case, a bimodal modeling will not be required. Figure 8(a) presents one frame from the simulated movie in which only the spatiotemporal local turbulence-recorded movements were applied (no blurring). The movie itself can be seen in Vid4 in [14]. Figure 8(b) presents the histogram of the same edge pixel examined in Fig. 7. It can be seen that the histogram distribution after adding only the movements' effect is clearly bimodal. The peaks of the modes in the histogram are the median values of the two gray levels at the two sides of the analyzed step-edge pixel location, and the clustering of values around them results from the step edge not being perfect and from the presence of noise. Despite a small bimodality that may still exist in the blurred-image case (Fig. 7), the histogram is considered unimodal in the application of moving object detection, because the two modes in the histogram are too close, such that no gray level between the peaks can be associated with a moving object rather than with the background.

In order to evaluate the bimodality property quantitatively, we used the kurtosis, defined as the ratio between the fourth central moment and the square of the second central moment [18,21]:

$$ \mathrm{kurtosis} = \frac{E\{(gl - \mu)^4\}}{\left(E\{(gl - \mu)^2\}\right)^2}, \qquad (9) $$

where gl is the gray level in the histogram, μ is the mean of the gl values, and E{·} represents the expected-value operation. The lower the kurtosis, the more pronounced the bimodality is [22]. Another important parameter is the distance between the histogram modes. Since the kurtosis is not sensitive to the distance between the modes, we also took into account the variance of the histogram as a measure of this distance. Therefore, we define a bimodality criterion as the histogram variance divided by the kurtosis. The higher the bimodality criterion, the higher the bimodality of the histogram. The bimodality criterion in Fig. 7(b) is 9.01, while in Fig. 8(b), without the blurring effect, it is 12.68. This means that if the blur in a turbulence-affected video becomes nonsignificant, multimodal modeling (such as the GMM) can be appropriate.
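A minimal sketch of this bimodality criterion, computed from a pixel's temporal gray-level series (the function name is ours):

```python
import numpy as np

def bimodality_criterion(gray_levels):
    """Histogram variance divided by the kurtosis of Eq. (9);
    higher values indicate a more pronounced bimodality."""
    g = np.asarray(gray_levels, dtype=float)
    mu = g.mean()
    m2 = ((g - mu) ** 2).mean()      # second central moment (variance)
    m4 = ((g - mu) ** 4).mean()      # fourth central moment
    kurtosis = m4 / m2 ** 2          # Eq. (9)
    return m2 / kurtosis
```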


Fig. 8. (a) One frame from a simulated “shaken” not blurred (restored) movie with detected motion vectors obtained from a real turbulent video using 32 × 32 pixels block size (Vid4 in [14]). (b) The histogram of the same edge pixel as in Fig. 7 (marked by a red circle around it).

Fig. 9. Real-degraded video frames and their edge pixel histograms. (a) A video frame at the NIR channel at a distance of about 4.5 km at light-moderate turbulence (Vid5 in [14]). The edge area is shown within a red circle. (b) The corresponding histogram of a pixel located at the edge of the car. (c) A video frame at the NIR channel at a distance of about 15 km at moderate turbulence (Vid6 in [14]). (d) The corresponding histogram of an edge pixel.

Fig. 10. Real-degraded and deblurred video frames and their edge pixel histograms. (a) A frame from a real distorted video (Vid7 in [14]). The checked edge area is shown within a red circle. (b) Histogram of an edge pixel from the degraded video. (c) A frame from the deblurred video (Vid8 in [14]). The edge area is shown within a red circle. (d) The corresponding histogram of an edge pixel.


B. Examination of Real Turbulence-Degraded Video

For the examination of the background behavior with real turbulence-degraded signals, we used video sequences recorded over distances of 1, 3, 4.5, 10, and 15 km. Figures 9(a) and 9(c) present frames from real degraded videos at the NIR channel (Vid5 and Vid6 in [14]). In Fig. 9(a), the imaging distance to the middle of the scene is about 4.5 km, at light-moderate turbulence. The video contains moving objects (people) and a static object (a car) over a vegetation background. Figure 9(b) shows the histogram of an edge pixel (an edge of the car). In the example of Fig. 9(c), the distance is about 15 km to the middle of the scene, at moderate turbulence. An edge pixel histogram analysis of this video can be seen in Fig. 9(d).

The results in Figs. 9(b) and 9(d) agree with the analysis and the simulation presented in the previous sections: the overall turbulence effects do not produce a bimodal edge histogram distribution. This behavior was confirmed in all the tested videos. It can be seen that the histogram in Fig. 9(b) has a slight tendency toward bimodality; however, for moving object detection purposes (using the GMM method, for instance), such a distribution should be considered unimodal because no clear separation exists between the modes.

C. Examination of the Effect of Video Restoration (Deblurring)

One of the conclusions from the above analysis is that the turbulent movements push the edge histogram toward a bimodal distribution, while the blurring effect does the opposite, toward a unimodal histogram. According to this principle, if a real turbulence-degraded video is deblurred, so that the blurring effect is strongly decreased, the edge histogram may become bimodal (depending on the success of the deblurring process and on the movement extents). Bimodality at step-edge locations depends also on the sharpness of the edge and on whether the two sides of the edge are uniform. An example of frames from real degraded and deblurred videos (Vid7 and Vid8 in [14]) is shown in Figs. 10(a) and 10(c), respectively. Figures 10(b) and 10(d) present the histograms of an edge pixel from 10(a) and 10(c), respectively. Comparing Figs. 10(b) and 10(d) shows the effect of the deblurring, which separates the modes of the histogram away from each other, toward a clearer bimodal distribution. Comparing the bimodality criterion for both histograms reveals that without deblurring [Fig. 10(b)] the bimodality criterion is 14.20, while after deblurring it becomes 24.45. The increase in the bimodality criterion as a result of image deblurring was found in all the examined videos. It can be deduced that a video taken at very short exposures will likely result in a bimodal histogram when significant turbulent movements occur, because the short-exposure turbulence-caused blur is much smaller than the blur in long exposures, which is affected by the integration of radiation tilts.
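The restoration in this section was performed with the blind deconvolution method of [18]. As a simple non-blind stand-in, the following sketch applies frequency-domain Wiener deconvolution using the modeled overall MTF of Section 3.B; the noise-to-signal ratio and the assumption that the MTF is known are illustrative, not part of the paper's method:

```python
import numpy as np

def wiener_deblur(frame, mtf_2d, nsr=0.01):
    """Non-blind Wiener deconvolution with a known 2-D MTF sampled on the
    same frequency grid as np.fft.fft2(frame); nsr is an assumed
    noise-to-signal ratio."""
    G = np.fft.fft2(frame.astype(float))
    H = mtf_2d                        # real, nonnegative transfer function
    W = H / (H ** 2 + nsr)            # Wiener filter
    return np.real(np.fft.ifft2(G * W))
```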

5. Conclusions

This paper considered the problem of background modeling (i.e., the temporal statistical scene behavior) in long-range imaging through a turbulent medium, mainly for the purpose of separating the background from moving targets. The question regarding the statistical scene behavior is not trivial because of the contradicting influences of the main turbulence effects on an image, which are blur and spatiotemporal movements. While the movements may likely cause a bimodal temporal intensity distribution at edge locations, the blur has an opposite effect, i.e., changing the distribution toward unimodality. A theoretical analysis was performed and was examined experimentally using both turbulence simulations and real turbulence-degraded videos. The analysis was carried out for various practical conditions of long-distance imaging through a turbulent atmospheric medium.


The analysis took into consideration calculations of blurring and movement effects for different types of imaging systems at different electromagnetic wavelengths [i.e., visual, NIR, and MWIR (thermal)]. We have found that, for the purpose of moving object detection, a unimodal distribution of an edge image point is the more appropriate model for the various imaging conditions, which means that the blur effect is dominant for the modeling at conventional video rates. We additionally showed that if successful image deblurring is performed, a multimodal modeling might be appropriate. In future work, an automatic background-modeling selection method can be obtained by similarly analyzing the histograms of edge points. The edge areas can be automatically extracted according to [17].

The conclusion of a unimodal background modeling in turbulence-affected conditions is important for the purpose of background subtraction. The use of multimodal background modeling, such as the GMM, would be less effective and too computationally complex. However, in practical moving target detection under significant turbulence effects and noise, additional operations may be required, even when using the correct background model. For example, mainly under stronger turbulence effects, we tend to reduce the threshold of the background subtraction in order not to miss low-contrast targets, at the expense of false detections. (A lower threshold means narrowing the single mode in the background histogram model so that fewer pixel levels are considered background.) Then, additional false-alarm filtering can be performed based on consistency properties of the detected segments [2,3,23].

References
1. O. Haik, D. Nahmani, Y. Lior, and Y. Yitzhaky, "Effects of image restoration on acquisition of moving objects from thermal video sequences degraded by the atmosphere," Opt. Eng. 45, 117006 (2006).
2. O. Haik and Y. Yitzhaky, "Effects of image restoration on automatic acquisition of moving objects in thermal video sequences degraded by the atmosphere," Appl. Opt. 46, 8562–8572 (2007).
3. E. Chen, O. Haik, and Y. Yitzhaky, "Classification of thermal moving objects in atmospherically degraded video," Opt. Eng. 51, 1–14 (2012).
4. O. Oreifej, X. Li, and M. Shah, "Simultaneous video stabilization and moving object detection in turbulence," IEEE Trans. Pattern Anal. Mach. Intell. 35, 450–462 (2013).
5. B. Fishbain, L. Yaroslavsky, and I. Ideses, "Real-time stabilization of long range observation system turbulent video," J. Real-Time Image Proc. 2, 11–22 (2007).
6. S. Cheung and C. Kamath, "Robust techniques for background subtraction in urban traffic video," Proc. SPIE 5308, 881–892 (2004).
7. C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell. 22, 747–757 (2000).
8. N. Friedman and S. Russell, "Image segmentation in video sequences: a probabilistic approach," in 13th Conference on Uncertainty in Artificial Intelligence (UAI) (1997).
9. A. Elgammal, D. Harwood, and L. S. Davis, "Non-parametric model for background subtraction," in ECCV 2000, pp. 751–767 (2000).

10. B. Chan, V. Mahadevan, and N. Vasconcelos, "Generalized Stauffer–Grimson background subtraction for dynamic scenes," Mach. Vis. Appl. 22, 751–766 (2011).
11. Y. Elhabian, M. El-Sayed, and H. Ahmed, "Moving object detection in spatial domain using background removal techniques: state-of-art," in Recent Patents on Computer Science (Bentham Science, 2008), pp. 32–54.
12. Y. Benezeth, P. M. Jodoin, B. Emile, H. Laurent, and C. Rosenberger, "Comparative study of background subtraction algorithms," J. Electron. Imaging 19, 033003 (2010).
13. Q. Zhou and J. K. Aggarwal, "Object tracking in an outdoor environment using fusion of features and cameras," Image Vis. Comput. 24, 1244–1255 (2006).
14. Y. Yitzhaky's website, video examples (last accessed 30 December 2013), http://www.ee.bgu.ac.il/~itzik/TurbImagEffects/.
15. N. S. Kopeika, A System Engineering Approach to Imaging, 2nd ed. (SPIE, 1998), pp. 458–475.
16. D. L. Fried, "Optical resolution through a randomly inhomogeneous medium for very long and very short exposures," J. Opt. Soc. Am. 23, 52–61 (1966).

17. G. D. Boreman, Modulation Transfer Function in Optical and Electro-Optical Systems (SPIE, 2001), pp. 20–25, 31–35.
18. O. Shacham, O. Haik, and Y. Yitzhaky, "Blind restoration of atmospherically degraded images by automatic best step-edge detection," Pattern Recogn. Lett. 28, 2094–2103 (2007).
19. S. Zamek and Y. Yitzhaky, "Turbulence strength estimation from an arbitrary set of atmospherically degraded images," J. Opt. Soc. Am. A 23, 3106–3113 (2006).
20. A. Barjatya, "Block matching algorithms for motion estimation," Tech. Rep. (Utah State University, 2004).
21. E. H. Barney-Smith, "PSF estimation by gradient descent fit to the ESF," Proc. SPIE 6059, 60590E (2006).
22. A. R. Palmer and C. Strobeck, "Fluctuating asymmetry analyses revisited," in Developmental Instability: Causes and Consequences, M. Polak, ed. (Oxford University, 2003), pp. 279–319.
23. Y. Yitzhaky, E. Chen, and O. Haik, "Surveillance in long-distance turbulence-degraded videos," Proc. SPIE 8897, 889704 (2013).




933KB Sizes 0 Downloads 3 Views