Robust method for infrared small-target detection based on Boolean map visual theory

Shengxiang Qi, Delie Ming,* Jie Ma, Xiao Sun, and Jinwen Tian

National Key Laboratory of Science & Technology on Multi-spectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China

*Corresponding author: [email protected]

Received 21 February 2014; revised 21 April 2014; accepted 7 May 2014; posted 9 May 2014 (Doc. ID 206609); published 16 June 2014

In this paper, we present an infrared small target detection method based on Boolean map visual theory. The scheme is inspired by the phenomenon that small targets can often attract human attention due to two characteristics: brightness and Gaussian-like shape in the local context area. Motivated by this observation, we perform the task under a visual attention framework with Boolean map theory, which reveals that an observer’s visual awareness corresponds to one Boolean map via a selected feature at any given instant. Formally, the infrared image is separated into two feature channels, including a color channel with the original gray intensity map and an orientation channel with the orientation texture maps produced by a designed second order directional derivative filter. For each feature map, Boolean maps delineating targets are computed from hierarchical segmentations. Small targets are then extracted from the target enhanced map, which is obtained by fusing the weighted Boolean maps of the two channels. In experiments, a set of real infrared images covering typical backgrounds with sky, sea, and ground clutters are tested to verify the effectiveness of our method. The results demonstrate that it outperforms the state-of-the-art methods with good performance. © 2014 Optical Society of America

OCIS codes: (040.1880) Detection; (040.2480) FLIR, forward-looking infrared; (040.3060) Infrared; (100.2000) Digital image processing; (040.2235) Far infrared or terahertz; (100.4999) Pattern recognition, target tracking.

http://dx.doi.org/10.1364/AO.53.003929

1. Introduction

Infrared small target detection is one of the most crucial techniques in automatic target recognition (ATR) [1] and holds the key for automatic target detection (ATD) [2]. Benefiting from its valuable outputs, small target detection is widely applied in military reconnaissance, early warning, terminal guidance, etc. [3]. Because of their limited imaging sizes, small targets are often disturbed by heavy noise and nonstationary clutters [4]. Although a large amount of research in this field has been carried out for decades, it is still admittedly a challenging and difficult task due to the unpredictable complexity of

backgrounds. Existing approaches solve the problem from the perspectives of the spatial domain [5], the frequency domain [3], and mathematical morphology [6]; however, many of them only concern specific scenes like the sky [7] or sea-sky [8]. Until now, robust techniques adapting to various circumstances have still been under exploration. The traditional manner of detecting small targets contains two steps: the foregoing target enhancement and the subsequent target extraction [3]. In the former step, the target is highlighted while the background is suppressed to obtain a target enhanced map with a high signal-to-clutter ratio (SCR). In the latter step, the target is extracted from the target enhanced map by global threshold segmentation [9] or local threshold segmentation [10]. Targets with high SCR in the enhanced map intuitively offer an


aggressive assistance for the final extraction, which makes detection more feasible than acting straightforwardly on the original image. Therefore, most existing methods elaborate on how to “pop out” the target and “neglect” background regions as much as possible [9,11]. Following this two-step framework, in this paper we make our major contribution to target enhancement based on Boolean map visual theory [12]. Different from traditional methods, the proposed one incorporates the human visual attention mechanism in identifying targets, so it is more effective and robust. For this task, existing techniques can be roughly classified into three categories: spatial analysis, frequency analysis, and morphological analysis. For spatial analysis, two manners, the direct way and the indirect way, are frequently adopted to enhance the small target. In the direct way, the target is modeled as a region with a local extremum, and detection is performed to find extreme points in the image. Wang et al. [5] adopt a cubic facet-model-based [13] extremum filter for detection. Wang et al. [11] construct the extremum filter with the least squares support vector machine (LS-SVM) [14]. Kim [15] presents a spatial filter constructed by decomposing the original 2D LoG filter. Deng et al. [10] perform the task via a self-information map under the guidance of information theory. In the indirect way, the problem is formulated as computing the residual map between the original image and its predicted background image to highlight targets. One of the early attempts is the max-mean and max-median filters proposed by Deshpande et al. [16]. Bae [17] also provides a bilateral filter designed for predicting backgrounds. Gu et al. [9] predict and eliminate background clutters by using a kernel-based nonparametric regression model.
Intrinsically, with background prediction, it is hard to discriminate the target area and background region perfectly due to the unknown target positions. The frequency analysis methods conduct small target detection on the basis that the energy of backgrounds is mainly concentrated in low-frequency components while small targets possess middle or high frequencies. Based on this difference, Yang et al. [8] provide an adaptive Butterworth highpass filter to highlight targets. Qi et al. [3] argue that the phase spectrum of the Fourier transform, which has the desired property of standing out salient signals, could be helpful in small target enhancement. Mathematical morphology has also been widely used in small target detection for decades. Tom et al. [18] present a milestone morphological operator, namely the well-known top-hat, to enhance targets. In order to improve the performance, many modified and variant algorithms derived from mathematical morphology, such as the new top-hat transform [19], the toggle contrast operator [20], the multiscale center-surround top-hat transform [21], the hit-or-miss transform [22], and the adaptive morphological top-hat


transform [6], have been further put forward. Although these methods can effectively improve the quality of detection, their shortcoming is that the results are too sensitive to the given structural element, which should be consistent with the shape and size of the target [23]. Recently, with the development of studies on visual attention mechanisms [24], visual saliency, with its applications in object detection [25], object recognition [26], etc., has attracted much research interest in the image processing and computer vision fields. A visual attention model simulates the brain and vision system to identify salient regions under the guidance of the real biological visual system. Accordingly, it obtains better results than traditional methods for small target detection [1,4,27,28]. In this paper, inspired by the phenomenon that small targets can often attract human attention [4], we propose a new infrared small target detection method based on Boolean map visual theory [12]. The theory reveals that at any given instant, an observer’s visual awareness corresponds to one and only one Boolean map via a selected feature channel. Usually, the optical point spread function of the thermal imaging system and long-distance imagery render the small target with a Gaussian-like shape [29,30]. This status provides two visually salient characteristics for distinguishing small targets from backgrounds:

• The small target is brighter than its surroundings in the local context area.
• The small target is approximately isotropic due to its Gaussian-like shape.

These two characteristics properly correspond to the color feature and the orientation feature, which have been proven to be the most available features in the Boolean map visual mechanism [31]. Accordingly, we detect a small target by exploiting intensity and shape under a visual attention framework with Boolean map theory. First, we separate the infrared image into a color channel and an orientation channel.
For the color channel, the original gray intensity map is used. For the orientation channel, four second order directional derivative (SODD) maps at four directions are computed by a newly designed SODD filter based on a facet model [13]. The SODD filter can suppress noise and separate targets and backgrounds into different textures. Second, for each channel, numerous Boolean maps are computed from hierarchical segmentations by uniformly spaced thresholding. Third, the target enhanced map is calculated by fusing the weighted Boolean maps of the two feature channels. Finally, targets are extracted by an energy ratio-based segmentation. In essence, the combination of color and orientation adequately leverages the complementarity of multiple features. Because of this multi-feature consideration, the proposed method improves the detection accuracy, robustness, and adaptivity for various scenes.
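The pipeline just outlined can be sketched in code. The following is a toy, intensity-only illustration of the idea (the helper names and the flat test scene are our own illustrative choices, not part of the published method; the full method adds the orientation channel, the SODD filter of Section 3, and the fusion and extraction rules of Section 4):

```python
import numpy as np

def boolean_maps(feature, thresholds):
    """One binary map per threshold (hierarchical segmentation)."""
    f = np.interp(feature, (feature.min(), feature.max()), (0, 255))
    return [(f >= t).astype(float) for t in thresholds]

def enhance(image, thresholds=range(3, 252, 4)):
    """Toy single-channel enhancement: weight each Boolean map by the
    inverse fraction of selected pixels and accumulate, so that maps
    selecting only a few pixels (likely small targets) dominate."""
    maps = boolean_maps(image.astype(float), thresholds)
    out = np.zeros_like(image, dtype=float)
    for b in maps:
        n = b.sum()
        if n:                          # skip empty maps
            out += b * (b.size / n)    # weight = (N_label / N_map)^(-1)
    return out

# A 9x9 scene: flat background with one bright "small target"
img = np.full((9, 9), 20.0)
img[4, 4] = 250.0
e = enhance(img)
```

Even in this stripped-down form, the inverse-area weighting concentrates the response on small bright regions, which is the effect the full method relies on.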

The remainder of this paper is organized as follows: Section 2 reviews the Boolean map theory of visual attention. The construction of the designed SODD filter and the computation of the Boolean map are introduced in detail in Section 3. Section 4 describes the creation of the target enhanced map and target extraction. The effectiveness of the proposed method is verified by experiments in Section 5. Conclusions are drawn in Section 6.

2. Boolean Map Theory of Visual Attention

Boolean map theory of visual attention [12] reveals a basic mechanism by which an observer selects and accesses objects from a scene at a given moment. It depicts how one’s attention to noticed objects derives from given features. Motivated by this study, the small target can be detected by its distinctive characteristics under the guidance of Boolean map theory. In this section, we first introduce the concept of the Boolean map and then briefly describe Boolean map visual theory.

A. Boolean Map

A Boolean map [12] is a spatial representation that partitions a visual scene into two complementary regions: the region that is selected and the region that is not. In a Boolean map, if a region is selected, then other regions are missing from the selection. That is, the Boolean map divides the visual scene into only two binary levels, which as a whole may have only a single featural label per dimension, and this featural label must provide an overall featural description of the entire region. An observer can draw a Boolean map by specifying a single value of an individual feature to select all objects that contain that feature value [32].

B. Boolean Map Visual Theory

The Boolean map theory of visual attention [12] indicates that when an observer gives saccades to a scene, he/she can voluntarily select what to access by choosing one feature value in one dimension or combining the output of the former with a preexisting Boolean map. Furthermore, only one feature value per dimension can be accessed into the observer’s visual contents at one moment. In other words, an observer’s visual awareness corresponds to one and only one Boolean map at any given instant. This momentary conscious apprehension provides access to both the shape of the Boolean map and the identity of associated feature labels. Briefly, such a case can be imagined in which the Boolean map provides two significant cues for attention: the functional feature (the dimension from which the current Boolean map is acquired) and the attentive locations (the binary signs labeled by the distinction of the feature value). Actually, the Boolean map theory of visual attention reveals a real and fundamental aspect of human vision, and it also provides an important cue for visual saliency using Boolean maps.
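As a concrete illustration of the data structure involved (a toy example of our own, not taken from [12]): a Boolean map is simply a binary mask obtained by selecting one feature value, and a new map may be formed by combining it with a preexisting map through union or intersection:

```python
import numpy as np

# A toy 4x4 "scene": each cell holds a color label.
scene = np.array([["red",   "green", "red",   "blue"],
                  ["blue",  "red",   "green", "red"],
                  ["red",   "blue",  "red",   "green"],
                  ["green", "red",   "blue",  "red"]])

# Selecting one feature value per dimension yields one Boolean map.
red_map = (scene == "red")         # attend to "red" in the color dimension

# A new map can combine a feature selection with a preexisting
# Boolean map via union or intersection (here: red cells that also
# lie in a preexisting "top half" map).
top_half = np.zeros_like(red_map)
top_half[:2, :] = True
red_on_top = red_map & top_half    # intersection of two Boolean maps
```

At any instant only one such map is "active" in the observer's awareness, which is the constraint the theory formalizes.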

Fig. 1. Explanation for Boolean map visual theory in feature channels of (a) color and (b) orientation.

Figure 1 gives an explanation of Boolean map visual theory in the feature channels of color and orientation. For example, given the original color image shown on the left of Fig. 1(a), if an observer chooses to look for red objects, i.e., the feature label is selected as the “red” value, then the resulting Boolean map is created in the form of a binary-labeled map, as in the middle of Fig. 1(a). It indicates the spatial locations within one’s attention (the “white” regions) and the omitted areas outside one’s attention (the “black” regions) at this moment. Likewise, if the green objects are searched for, namely the “green” value is selected, then the Boolean map is drawn to identify regions containing only green elements, as shown on the right of Fig. 1(a). In Fig. 1(b), two Boolean maps created by horizontality and verticality in the orientation feature channel are also illustrated. Furthermore, a Boolean map can be produced by applying the set operators union or intersection on more than one existing map [32] in practice. For instance, a Boolean map characterizing both verticality and redness could be created by intersecting a vertical Boolean map with a red Boolean map. Accordingly, Boolean map visual theory is easy to understand and well suited to detecting targets with evident features. In this paper, we use it for small target detection, since the small target has the characteristics of local brightness and Gaussian-like shape (described in Section 1), which correspond to the color feature and the orientation feature, respectively.

3. Mathematical Methodology

To detect the small target, Boolean maps are computed from both the color and orientation channels. In this procedure, the gray intensity map and the SODD maps are used as the feature maps of the color channel and the orientation channel, respectively. The construction of the SODD filter and the computation of the Boolean map are described in detail in this section.


A. SODD Filter Construction

The SODD filter is able to detect ramps and isolated uplifts of a signal. Operated by this filter, clutters are transformed into fixed-orientational strip-like textures, while small targets are transformed into Gaussian-like spots with high values relative to the clutter textures [3]. Put another way, clutters are sensitive to the direction of the SODD filter while small targets are insensitive (a Gaussian-like shape maps to a Gaussian-like shape). Benefiting from this attribute, SODD maps represent the orientation feature channel in our work by their capability of distinguishing background clutters and targets. Considering that the traditional second-order difference or Laplacian operator is susceptible to noise and lacks smoothness, we adopt the facet model [13] to construct the SODD filter. The facet model fits the underlying intensity surface of image patches, so it is more robust to noise. The detailed mathematical description is as follows.

Assume that R × C represents the 5 × 5 neighboring window of a centered pixel, in which R = {−2, −1, 0, 1, 2} and C = {−2, −1, 0, 1, 2}. The gray intensity value of the pixel, expressed as a bivariate cubic function, is

f(r, c) = Σ_{i=1..10} K_i · P_i(r, c),   (1)

where {P_i(r, c)} = {1, r, c, r² − 2, rc, c² − 2, r³ − (17∕5)r, (r² − 2)c, r(c² − 2), c³ − (17∕5)c} is the set of discrete orthogonal polynomials with (r, c) ∈ R × C, and the K_i are coefficients. Given expression (1), the second-order partial derivatives along the row and the column at the center pixel (0, 0) can be obtained:

∂²f(r, c)∕∂r² |_(0,0) = 2K_4,
∂²f(r, c)∕∂r∂c |_(0,0) = K_5,
∂²f(r, c)∕∂c² |_(0,0) = 2K_6.   (2)

Since the values K_i, i = 4, 5, 6, are related to different pixels (x, y) of the image, we write these coefficients as K_i(x, y), i = 4, 5, 6, to avoid confusion. As suggested in [13], the coefficients can be determined as

K_i(x, y) = Σ_r Σ_c I(x + r, y + c) P_i(r, c) ∕ Σ_r Σ_c P_i(r, c)²

using least-squares surface fitting and the orthogonality of the polynomials, in which I(x, y) is the intensity of the original image. Hence, all the coefficients K_i(x, y) can be computed conveniently via a linear combination of the intensities in the neighboring window R × C of the centered pixel (x, y), each of which has a weight expressed as w_i(r, c) = P_i(r, c) ∕ Σ_r Σ_c P_i(r, c)², i = 4, 5, 6. Moreover, by substituting the definition of P_i(r, c) mentioned above, these three weight kernels can be acquired as follows:

W_4 = (1∕70) [  2   2   2   2   2
               −1  −1  −1  −1  −1
               −2  −2  −2  −2  −2
               −1  −1  −1  −1  −1
                2   2   2   2   2 ],

W_5 = (1∕100) [  4   2   0  −2  −4
                 2   1   0  −1  −2
                 0   0   0   0   0
                −2  −1   0   1   2
                −4  −2   0   2   4 ],

W_6 = W_4^T.   (3)

Given the result of Eq. (2), we then deduce the SODD filter along direction vector l at pixel (x_0, y_0) according to the directional derivative theorem as follows:

∂²f(x, y)∕∂l² |_(x_0, y_0) = [f_xx(x, y) cos²α + 2 f_xy(x, y) cos α cos β + f_yy(x, y) cos²β] |_(x_0, y_0)
                           = 2K_4(x_0, y_0) cos²α + 2K_5(x_0, y_0) cos α cos β + 2K_6(x_0, y_0) cos²β,   (4)

where α is the angle between l and the x axis (row of the image), and β is the angle between l and the y axis (column of the image). Note that the values of the target region in the primitive SODD map calculated by Eq. (4) are less than zero, whereas the neighboring values along the assigned direction of l are larger than zero. In order to normalize the map into a unified representation in which target values are higher than the background, which facilitates subsequent operations, the primitive SODD map is amended in two steps: (1) set values of the primitive SODD map F_S^pri larger than zero to zero to get F_S^st1, and (2) invert the whole map F_S^st1 by F_S^st2 = 1 − (F_S^st1 − min F_S^st1)∕(max F_S^st1 − min F_S^st1) to get the amended SODD map F_S^st2. After this, the shape characteristics of the target (Gaussian-like spot) and clutters (fixed-orientational strip-like textures) are visualized in the amended SODD map. Figure 2 illustrates the framework of SODD filtering, by which the original image [Fig. 2(a)] is filtered to a primitive SODD map [Fig. 2(b)] and then amended to an amended map [Fig. 2(c)]. In our proposed detection method, four SODD maps of the original image at four directions, namely (α = 0°, β = 90°), (α = 90°, β = 0°), (α = 45°, β = 45°), and (α = 135°, β = 45°), are used as feature maps for the orientation channel, as shown in Fig. 3.

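A possible implementation sketch of this SODD filtering step, assuming SciPy is available (the kernels follow Eq. (3) and the directional combination follows Eq. (4); the function names are our own):

```python
import numpy as np
from scipy.ndimage import convolve

# Discrete orthogonal polynomial kernels over the 5x5 facet window,
# normalized as w_i = P_i / sum(P_i^2); only i = 4, 5, 6 are needed.
r = np.arange(-2, 3)
P4 = np.repeat((r**2 - 2)[:, None], 5, axis=1)   # r^2 - 2 down the rows
P5 = np.outer(r, r)                              # r * c
P6 = P4.T                                        # c^2 - 2 across the columns
W4, W5, W6 = (P / (P**2).sum() for P in (P4, P5, P6))  # sums: 70, 100, 70

def sodd_map(image, alpha, beta):
    """Primitive SODD response along direction (alpha, beta), Eq. (4)."""
    K4 = convolve(image.astype(float), W4, mode="nearest")
    K5 = convolve(image.astype(float), W5, mode="nearest")
    K6 = convolve(image.astype(float), W6, mode="nearest")
    ca, cb = np.cos(np.radians(alpha)), np.cos(np.radians(beta))
    return 2*K4*ca**2 + 2*K5*ca*cb + 2*K6*cb**2

def amend(primitive):
    """Two-step amendment: clip positives to zero, then invert so that
    target regions end up with the highest values."""
    s = np.minimum(primitive, 0.0)
    return 1.0 - (s - s.min()) / (s.max() - s.min() + 1e-12)
```

For example, `sodd_map(img, 0, 90)` gives the row-direction response; the four direction pairs listed above yield the four orientation feature maps.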
Fig. 2. Illustration of the designed SODD filter: (a) the original map; (b) the primitive SODD map; and (c) the amended SODD map.

B. Boolean Map Computation

In general, for an input feature map, it is hard to automatically obtain an accurate and divisible feature value as the threshold to confirm the spatial locations of targets. Any auto-adaptive algorithm for calculating the threshold would be scene dependent. Once the threshold is not suitable for the target, the obtained Boolean map will give a misleading result. To solve this problem, we adopt the idea suggested by Zhang et al. [33]. Specifically, instead of using a precise threshold to compute a “fine” Boolean map, we adopt a series of relaxed thresholds to compute a set of “coarse” Boolean maps and then give adaptive weights to these maps for final fusion (described in Section 4). Whether in the color channel or in the orientation channel, a small target is relatively bright in the local context area or even the global area. So, the probability that the target value lies above a random threshold is higher than the probability that it lies below. Taking advantage of this principle, we compute Boolean maps by characterizing the feature map as a set of hierarchically segmented binary maps, similar to [33]. This originally derives from the theory of generating Boolean maps by randomly thresholding the feature map according to the prior distribution over the threshold. By fairly considering every possible feature value in each independent feature map, we take the threshold as subject to a uniform distribution without loss of generality, as in [33]. The expression is as follows:

B_ij(x, y) = { 1 if F_i(x, y) ≥ T_j;  0 if F_i(x, y) < T_j },  subject to T_j ∈ T, T ∼ p(T),   (5)

in which B_ij is the Boolean map of the ith feature map F_i with threshold T_j; (x, y) denotes the pixel location; and p(T) denotes a uniform distribution. One can see that the calculation of expression (5) is somewhat inconvenient due to the unknown magnitude range of each feature map and the random values produced by the distribution. In practice, for simplicity, each input feature map is linearly scaled into an integer interval [0, 255] for normalization, and uniformly spaced thresholds {T_j}, j = 1, 2, …, n, with a fixed step size δ are adopted to calculate Boolean maps, in which T_{j+1} = T_j + δ and n is the number of thresholds. At last, we also set 0 to connected areas adjacent to the image boundaries that were labeled 1 in the previous steps in each Boolean map. There are two reasons for this operation. On one hand, areas adjacent to image boundaries are mainly backgrounds in general cases. On the other hand, even if the target is adjacent to the boundaries, it could not be accurately estimated as a target because of the incomplete surrounding information. Therefore, removing these areas reduces the error rate of the Boolean map representation. In our work, the thresholds {T_j}, j = 1, 2, …, n, are set from 3 to 251 with a fixed step size δ = 4 to balance efficacy and efficiency.
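The Boolean map computation of this subsection (uniform thresholds from 3 to 251 with δ = 4, plus suppression of regions touching the image border) might be sketched as follows, assuming SciPy's connected-component labeling; the helper name is our own:

```python
import numpy as np
from scipy.ndimage import label

def boolean_maps(feature, step=4, lo=3, hi=251):
    """Boolean maps from uniformly spaced thresholds, Eq. (5), with
    connected regions touching the image border set back to 0."""
    # Linearly scale the feature map into [0, 255] for normalization.
    f = np.interp(feature, (feature.min(), feature.max()), (0, 255))
    maps = []
    for t in range(lo, hi + 1, step):
        b = f >= t
        lab, _ = label(b)  # 4-connected components
        # Component labels present on any of the four borders.
        edges = np.r_[lab[0, :], lab[-1, :], lab[:, 0], lab[:, -1]]
        border = np.unique(edges)
        b[np.isin(lab, border[border > 0])] = False
        maps.append(b.astype(np.uint8))
    return maps
```

With the paper's settings this yields 63 Boolean maps per feature map, each keeping only interior connected regions.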

Fig. 3. Framework of our proposed infrared small target detection method based on Boolean map visual theory.


An instance of Boolean maps created by uniformly spaced thresholding is shown in Fig. 3.

4. Proposed Infrared Small Target Detection

Small target detection is performed by target enhancement and extraction. Different from traditional methods, we calculate the target enhanced map by fusing the weighted Boolean maps of the two feature channels and then extract the target by an energy ratio based segmentation. This section explains how to create the target enhanced map and how to extract the target.

A. Target Enhanced Map Creation

In our work, the target enhanced map is calculated by fusing all the Boolean maps from the color and orientation channels. To improve the SCR of small targets, each Boolean map is weighted before fusion. These weights reflect the accuracy of target expression in each Boolean map. Formally, given a Boolean map B, its weight w is calculated as

w = (N_label ∕ N_map)^(−1),   (6)

where N_label and N_map denote the number of labeled pixels (with value 1) and the total number of pixels of Boolean map B, respectively. This weighting is adopted because small targets possess only very few pixels with limited sizes, while backgrounds always spread over large areas. If the number of labeled pixels is small, the Boolean map gets a high weight, so that small targets are highlighted. On the contrary, if the labeled area is large, it is very likely to be background, so the weight will be low. Although expression (6) might attach high weights to some isolated bright noises in the color channel, they are also suppressed in the orientation channel, as they are not sensitive to the SODD filter. After weighting each Boolean map, we obtain the target enhanced map by fusing the weighted Boolean maps as follows.

• First, the fused Boolean map for each feature map is calculated as

B^F_i = Σ_j w_ij B_ij,   (7)

where B^F_i is the fused Boolean map of the ith feature map and w_ij is the weight for B_ij.

• Second, the fused Boolean map for each feature channel is computed as

B^F_c = (1∕|Ω_c|) Σ_{i∈Ω_c} B^F_i,   (8)

where B^F_c is the fused Boolean map of the cth feature channel and Ω_c is the collection of feature maps in the cth feature channel. In our work, there is one feature map (the gray intensity map) in the color channel and four feature maps (the SODD maps) in the orientation channel; hence B^F_c for the color channel is in fact B^F_i for the gray intensity map, and B^F_c for the orientation channel is the average of B^F_i over the four SODD maps.

• Third, the final target enhanced map I_e is obtained by a multiplication among the fused Boolean maps of all feature channels, expressed as

I_e = Π_{c∈C} B^F_c,   (9)

where C contains the color channel and the orientation channel in our work.

Intrinsically, the combination of color and orientation in Eq. (9) is a procedure of multi-feature fusion. It leverages the complementarity of multiple features, which improves the accuracy, robustness, and adaptivity of the detection for various scenes. With such a combination, the target enhanced map highlights small targets and suppresses backgrounds to a great extent relative to the original image. In addition, we smooth the enhanced map with a Gaussian filter for the case that the target might be enhanced discontinuously into a few fractions due to the blind hierarchical segmentation in the Boolean map computation. The standard deviation of the Gaussian kernel is set to 0.8 percent of the shortest boundary of the image according to experiments.

B. Target Extraction

Traditional segmentation methods for small target extraction, such as the well-known adaptive thresholding [6,9], require that the target enhanced map obey a certain distribution, e.g., Gaussian. However, a desired target enhanced map might not obey this distribution. For instance, suppose that in an ideal target enhanced map (with gray levels from 0 to 255), backgrounds are 0 and targets are 255. In its histogram, most pixels concentrate at 0 while very few pixels gather at 255, which obviously does not meet a Gaussian distribution. In general, the Gaussian assumption is always unsatisfied in maps with high SCR, for which traditional methods may lead to high error rates. In order to overcome this deficiency, we propose an energy ratio based segmentation instead of traditional approaches to extract targets. This technique has no prior requirements on the distribution and is very suitable for targets possessing the major energy of the map. The threshold T is calculated as

T = arg min_t | Σ_{(x,y)} I^t_e(x, y) ∕ Σ_{(x,y)} I_e(x, y) − γ |,   (10)

in which

I^t_e(x, y) = { I_e(x, y) if I_e(x, y) ≥ t;  0 if I_e(x, y) < t },

where γ is a ratio given from 0 to 1. Expression (10) indicates that the segmentation threshold T is defined according to the ratio γ of the energy of the reserved regions to the total energy of the map. In the target enhanced map, since small targets are highlighted by the previous processing, the major energy is almost entirely concentrated in target regions. The merit of the proposed thresholding approach is that it does not rely on the distribution of the target enhanced map, and the higher the SCR, the more accurate the extraction will be. Figure 3 shows the whole framework of our proposed method.

5. Experiments and Results

For a desired infrared small target detection method, not only is accuracy needed, but robustness is also required. In this sense, we hope that an applied technique has adaptability for various complex scenes and at the same time a strong antinoise capability. To verify the performance mentioned above, we use a set of collected real infrared images with various and complex clutters to test our proposed method. These images are shown in Fig. 4, which covers typical backgrounds including the sky [Figs. 4(a)–4(d) and 4(i)], sea [Figs. 4(e)–4(g)], and ground [Figs. 4(h) and 4(j)] scenes. Among these images, multiple targets [Fig. 4(g)], strong noises [Figs. 4(h)–4(j)], and irregular interferences [Figs. 4(i) and 4(j)] are contained as well. The resolutions of these test images are: 146 × 146 for Figs. 4(a)–4(c), 557 × 448 for Fig. 4(d), 276 × 224 for Figs. 4(e)–4(g), 316 × 252 for Fig. 4(h), 252 × 252 for Fig. 4(i), and 338 × 217 for Fig. 4(j). Red boxes in Fig. 4 denote regions containing real small targets. In addition, we perform six state-of-the-art small target detection algorithms on these images for comparison. They are the morphological method (Top-hat) [18], the max-mean filter method (Max-mean) [16], the max-median filter method (Max-median) [16], the adaptive Butterworth highpass filter method (BHPF) [8], the facet-based method (Facet-model) [5], and the min-local-LoG filter method (MLL) [15], which are typical approaches from the perspectives of spatial analysis (Max-mean, Max-median, Facet-model, MLL), frequency analysis (BHPF), and morphological analysis (Top-hat). In order to unify the parameters for comparison, the filter window size is set to 9 × 9 for Max-mean and Max-median, and the structural element is set as a disk template with a radius of 4, which are approximately selected according to the target sizes in the test images. Hereinafter, we refer to our proposed method as BMVT for short.
Results with target enhanced maps processed by all methods are displayed in Fig. 4.

As suggested in [34], two commonly used evaluation indicators in small target detection, i.e., the signal-to-clutter ratio gain (SCRG) and the background suppression factor (BSF), are adopted for quantitatively evaluating the performance of all these methods. Their expressions are as follows:

SCRG = (S∕C)_out ∕ (S∕C)_in,    BSF = C_in ∕ C_out,   (11)

where S and C are the signal amplitude and clutter standard deviation, respectively; in and out represent the input original image and the output target enhanced map, respectively. For our experiments, S is calculated as the difference between the mean values of targets and backgrounds, and C is calculated as the standard deviation of backgrounds. Besides the SCRG and BSF, receiver operating characteristic (ROC) curves are also drawn, following the work in [9]. The ROC curve reflects the varying relationship between the detection probability Pd and the false alarm rate Pf, in which Pd is defined as the ratio of the number of detected pixels to the number of real target pixels, and Pf is the ratio of the number of false alarms to the total number of pixels in the whole image. The SCRG and BSF obtained by all compared methods for each test image in Fig. 4 are shown in Fig. 5, and their ROC curves are shown in Fig. 6. From the experimental results, it can be seen that BMVT obtains the best SCRG for all images and the highest BSF for most images, which suggests that our proposed method performs well on target enhancement for small target detection. Facet-model has comparable BSF in Figs. 5(a) and 5(d) and even better BSF in Figs. 5(f) and 5(h). This indicates that in the real images of Figs. 4(a), 4(d), 4(f), and 4(h) there are few noises with extremum-like spread, so that extremum-searching methods such as the Facet-model can suppress background clutters well. However, for images like Fig. 4(e), this technique performs poorly in BSF because large amounts of complex clutter and extremum-like noises are present. In fact, as stated in [4], SCRG reflects the amplification of the target signal relative to backgrounds before and after processing, whereas BSF only expresses the suppression level of backgrounds without any information about the target. From this perspective, SCRG is more significant and meaningful for target detection than BSF.
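For reference, the two indicators of Eq. (11) can be computed directly from a binary target mask; this is a minimal sketch with our own function name, where S and C are estimated exactly as described above:

```python
import numpy as np

def scr_metrics(original, enhanced, target_mask):
    """SCRG and BSF as in Eq. (11): S is the mean target/background
    difference, C the background standard deviation."""
    bg = ~target_mask

    def s_and_c(img):
        s = img[target_mask].mean() - img[bg].mean()
        return s, img[bg].std()

    s_in, c_in = s_and_c(original.astype(float))
    s_out, c_out = s_and_c(enhanced.astype(float))
    scrg = (s_out / c_out) / (s_in / c_in)
    bsf = c_in / c_out
    return scrg, bsf
```

Higher values of both indicators mean a better-enhanced map; note that the division assumes the background clutter of the enhanced map is not perfectly flat (c_out > 0).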
Thus, BMVT actually outperforms these state-of-the-art methods. Compared with the "macroscopic" evaluation terms, i.e., SCRG and BSF, ROC curves can be seen as "microscopic" indicators that describe the target segmentation performance obtained by thresholding the target enhanced map. In this sense, a good ROC curve should have a high detection probability when the false alarm rate is low: the better the ROC curve, the easier the target extraction will be. That is, if all target values are higher than background values in the target enhanced map, then the selection of the segmentation threshold will be manageable. From Fig. 6, one can see that the proposed method performs well and obtains respectable ROC curves for the test images. This observation suggests that the detection accuracy of BMVT is higher than that of the other methods under the same false alarm rate in general cases. Note that in a few circumstances, such as Figs. 4(a), 4(d), and 4(g) [referring to Figs. 6(a), 6(d), and 6(g)], Top-hat draws slightly better ROC curves than the proposed method. This is easily comprehended because the detection results of Top-hat depend on the design of its structural element. When the element is well suited to the test infrared images, including the target shapes and background textures, the ROC curves will be excellent. On the contrary, once the size and shape of the structural element are not suitable for the current target or background, the result becomes poor, as in the scenes of Figs. 4(b), 4(c), and 4(j) [referring to Figs. 6(b), 6(c), and 6(j)]. Top-hat is less robust since it is not auto-adaptive in essence, which restricts its usage. Additionally, compared to SCRG and BSF, the ROC curves of the Facet-model become worse than the previous two indicators. The reason is that the Facet-model not only enhances the small target but also highlights many noise points that have extremum-like spread; it blindly enhances all extremum points without distinguishing real targets from noise. Thus, the Facet-model often leaves large amounts of residual peak values in the target enhanced map, which increases the false alarm rate. Encouragingly, different from the Facet-model and Top-hat, there are almost no sensitive parameters in BMVT, and the results of the proposed method are encouraging for SCRG, BSF, and ROC curves. Accordingly, this demonstrates that the proposed method is indeed robust and effective for various complex infrared scenes, and that BMVT outperforms many state-of-the-art methods.

Fig. 4. Test infrared images and their target enhanced maps processed by our method and several state-of-the-art methods. For each row, maps from left to right are: the original image, Top-hat, Max-mean, Max-median, BHPF, Facet-model, MLL, and BMVT.

Fig. 5. SCRG and BSF obtained by all compared methods for each test image in Fig. 4.

Fig. 6. ROC curves obtained by all compared methods for each test image in Fig. 4.

Visual comparisons of the target enhanced maps are also shown in Fig. 4. All these maps are linearly scaled to [0, 255] for easy viewing. Clearly, background clutters are almost "clean" in the BMVT maps, whereas they are disorderly distributed in the other maps. In addition, all the BMVT maps show an evident contrast: large background areas are "black" while target regions are "white". However, the differences between these two regions are smaller in the other maps. For situations with irregular interferences and strong noise, such as Figs. 4(i) and 4(j), the results of the proposed method are still excellent. By contrast, apparent clutters like the tree fork in Fig. 4(i) and the roadside in Fig. 4(j) are still retained in the other maps. That is, the proposed method has a wide application range, even for some harsh circumstances.

Finally, we vary the ratio γ in expression (10) to test the target extraction performance of BMVT. The detection rate and false rate are recorded as γ varies from 0 to 1 with a step of 0.02. In this experiment, the detection rate is calculated as the ratio of the number of correctly detected targets to the total number of real targets, and the false rate is calculated as the ratio of the number of incorrectly detected targets to the total number of detected targets. The resulting curves for all the test infrared images are shown in Fig. 7. With the increase of γ, more energy of the map is reserved, so the segmentation threshold T in Eq. (10) becomes small, which drives both the detection rate and the false rate up. Ideally, we hope that the detection rate is high while the false rate is small enough. By observing the two curves, it is obvious that when γ is in the interval (0, 0.4], the detection results are acceptable. As a matter of fact, there is a trade-off between the detection rate and the false rate, and it is hard to decide which value of γ is best because the optimum depends on the specific task. For example, if detection errors are intolerable while missed detections are allowable, we prefer to set γ in the range (0, 0.2]; otherwise, γ should be in (0.2, 0.4], and vice versa. The suitable range of this parameter can be determined by numerous experiments in actual applications.

Fig. 7. Relationship between detection/false rate and the energy ratio γ.

6. Conclusions

An infrared small target detection method based on Boolean map visual theory is proposed in this paper. Different from traditional methods, we solve the problem through the visual characteristics of small targets, namely, brightness and Gaussian-like shape in the local context area. Regularly, these characteristics make small targets attract human attention. Following this observation, the task is performed under a visual attention framework with Boolean map theory. This theory reveals that an observer's visual awareness corresponds to only one Boolean map via a selected feature at any given instant. In detail, we detect small targets by Boolean maps from color and orientation features. In order to analyze the characteristics of small targets in the orientation channel, we construct a SODD filter based on the facet model to classify targets and backgrounds into different textures. This filter calculates SODD maps by fitting the underlying intensity surfaces of image patches, so it is insensitive to noise. In addition, an energy-ratio-based segmentation is recommended to extract targets. The threshold is set according to the ratio of reserved energy to total energy, which does not rely on the prior distribution of the enhanced map. The proposed method is applicable to various complex backgrounds, including sky, sea, and ground clutters. It is also very robust to harsh circumstances with irregular interferences and strong noise. Furthermore, there are no sensitive parameters in the technique. Experiments demonstrate that the proposed method has good performance in detection accuracy, robustness, and adaptivity compared to state-of-the-art methods. It could be further used for target recognition and tracking in ATR systems.

This work was supported by the National Natural Science Foundation of China under Grant 61273279 and Grant 61273241.

References

1. J. Zhao, H. Feng, Z. Xu, Q. Li, and H. Peng, "Real-time automatic small target detection using saliency extraction and morphological theory," Opt. Laser Technol. 47, 268–277 (2013).
2. J. Shaik and K. Iftekharuddin, "Detection and tracking of targets in infrared images using Bayesian techniques," Opt. Laser Technol. 41, 832–842 (2009).
3. S. Qi, J. Ma, H. Li, S. Zhang, and J. Tian, "Infrared small target enhancement via phase spectrum of quaternion Fourier transform," Infrared Phys. Technol. 62, 50–58 (2014).
4. S. Qi, J. Ma, C. Tao, C. Yang, and J. Tian, "A robust directional saliency-based method for infrared small-target detection under various complex backgrounds," IEEE Geosci. Remote Sens. Lett. 10, 495–499 (2013).
5. G. Wang, C. Chen, and X. Shen, "Facet-based infrared small target detection," Electron. Lett. 41, 1244–1246 (2005).
6. W. Meng, T. Jin, and X. Zhao, "Adaptive method of dim small object detection with heavy clutter," Appl. Opt. 52, D64–D74 (2013).
7. E. Vasquez, F. Galland, G. Delyon, and P. Réfrégier, "Mixed segmentation-detection-based technique for point target detection in nonhomogeneous sky," Appl. Opt. 49, 1518–1527 (2010).
8. L. Yang, J. Yang, and K. Yang, "Adaptive detection for infrared small target under sea-sky complex background," Electron. Lett. 40, 1083–1085 (2004).
9. Y. Gu, C. Wang, B. Liu, and Y. Zhang, "A kernel-based nonparametric regression method for clutter removal in infrared small-target detection applications," IEEE Geosci. Remote Sens. Lett. 7, 469–473 (2010).
10. H. Deng, Y. Wei, and M. Tong, "Small target detection based on weighted self-information map," Infrared Phys. Technol. 60, 197–206 (2013).
11. P. Wang, J. Tian, and C. Gao, "Infrared small target detection using directional highpass filters based on LS-SVM," Electron. Lett. 45, 156–158 (2009).
12. L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychol. Rev. 114, 599–631 (2007).
13. R. Haralick, "Digital step edges from zero crossing of second directional derivatives," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 58–68 (1984).
14. J. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett. 9, 293–300 (1999).
15. S. Kim, "Min-local-log filter for detecting small targets in cluttered background," Electron. Lett. 47, 105–106 (2011).
16. S. Deshpande, M. Er, R. Venkateswarlu, and P. Chan, "Max-mean and max-median filters for detection of small-targets," Proc. SPIE 3809, 74–83 (1999).
17. T. Bae, "Small target detection using bilateral filter and temporal cross product in infrared image," Infrared Phys. Technol. 54, 403–411 (2011).
18. V. Tom, T. Peli, M. Leung, and J. Bondaryk, "Morphology-based algorithm for point target detection in infrared backgrounds," Proc. SPIE 1954, 2–11 (1993).
19. X. Bai and F. Zhou, "Analysis of new top-hat transformation and the application for infrared dim small target detection," Pattern Recogn. 43, 2145–2156 (2010).
20. X. Bai, F. Zhou, and B. Xue, "Infrared dim small target enhancement using toggle contrast operator," Infrared Phys. Technol. 55, 177–182 (2012).
21. X. Bai, F. Zhou, and B. Xue, "Fusion of infrared and visual images through region extraction by using multi scale center-surround top-hat transform," Opt. Express 19, 8444–8457 (2011).
22. X. Bai and F. Zhou, "Hit-or-miss transform based infrared dim small target enhancement," Opt. Laser Technol. 43, 1084–1090 (2011).
23. J. Guo and G. Chen, "Analysis of selection of structural element in mathematical morphology with application to infrared point target detection," Proc. SPIE 6835, 68350P (2007).
24. A. Borji and L. Itti, "State-of-the-art in visual attention modeling," IEEE Trans. Pattern Anal. Mach. Intell. 35, 185–207 (2013).
25. B. Alexe, T. Deselaers, and V. Ferrari, "What is an object?" in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2010), pp. 73–80.
26. U. Rutishauser, D. Walther, C. Koch, and P. Perona, "Is bottom-up attention useful for object recognition?" in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2004), pp. II37–II44.
27. X. Shao, H. Fan, G. Lu, and J. Xu, "An improved infrared dim and small target detection algorithm based on the contrast mechanism of human visual system," Infrared Phys. Technol. 55, 403–408 (2012).
28. X. Wang, G. Lv, and L. Xu, "Infrared dim target detection based on visual attention," Infrared Phys. Technol. 55, 513–521 (2012).
29. D. Chan, D. Langan, and D. Staver, "Spatial processing techniques for the detection of small targets in IR clutter," Proc. SPIE 1305, 53–62 (1990).
30. T. Soni, J. Zeidler, and W. Ku, "Performance evaluation of 2-D adaptive prediction filters for detection of small objects in image data," IEEE Trans. Image Process. 2, 327–340 (1993).
31. L. Huang, A. Treisman, and H. Pashler, "Characterizing the limits of human visual awareness," Science 317, 823–825 (2007).
32. C. Healey and J. Enns, "Attention and visual memory in visualization and computer graphics," IEEE Trans. Vis. Comput. Graph. 18, 1170–1188 (2012).
33. J. Zhang and S. Sclaroff, "Saliency detection: a Boolean map approach," in IEEE International Conference on Computer Vision (IEEE, 2013).
34. C. I. Hilliard, "Selection of a clutter rejection algorithm for real-time target detection from an airborne platform," Proc. SPIE 4048, 74–84 (2000).
