Research Article | Vol. 55, No. 5 / 10 February 2016 / Applied Optics | pp. 1151–1163

Parametric temporal compression of infrared imagery sequences containing a slow-moving point target

Revital Huber-Shalem,¹,* Ofer Hadar,¹ Stanley R. Rotman,² and Merav Huber-Lerner¹

¹Dept. of Communication Systems Engineering, Ben Gurion University of the Negev, P.O. Box 653, Be'er Sheva, 84105, Israel
²Dept. of Electrical and Computer Engineering, Ben Gurion University of the Negev, P.O. Box 653, Be'er Sheva, 84105, Israel
*Corresponding author: [email protected]

Received 20 October 2015; revised 16 December 2015; accepted 29 December 2015; posted 5 January 2016 (Doc. ID 252331); published 10 February 2016

Infrared (IR) imagery sequences are commonly used for detecting moving targets in the presence of evolving cloud clutter or background noise. This research focuses on slow-moving point targets that are less than one pixel in size, such as aircraft at long range from a sensor. Since transmitting IR imagery sequences to a base unit or storing them consumes considerable time and resources, a compression method that maintains the point target detection capabilities is highly desirable. In this work, we introduce a new parametric temporal compression that incorporates a Gaussian fit and a polynomial fit. We then proceed to spatial compression, applying the lowest possible number of bits to represent each parameter extracted by the temporal compression, followed by bit encoding, to achieve an end-to-end compression process of the sequence for data storage and transmission. We evaluate the proposed compression method using the variance estimation ratio score (VERS), a signal-to-noise ratio (SNR)-based measure for point target detection that scores each pixel and yields an SNR scores image. A high pixel score indicates that a target is suspected to traverse the pixel. From this score image we calculate the movie scores, which are found to be close to those of the original sequences. Furthermore, we present a new algorithm for automatic detection of the target tracks. This algorithm extracts the target location from the SNR scores image, which is acquired during the evaluation process, using the Hough transform. It yields similar detection probabilities (PD) and false alarm probabilities (PFA) for the compressed and the original sequences. The parameters of the new parametric temporal compression successfully differentiate the targets from the background, yielding high PDs (above 83%) with low PFAs (below 0.043%) without the need to calculate pixel scores or to apply automatic detection of the target tracks. © 2016 Optical Society of America

OCIS codes: (100.2000) Digital image processing; (100.4999) Pattern recognition, target tracking.

http://dx.doi.org/10.1364/AO.55.001151

1. INTRODUCTION

Infrared imagery sequences are used for automatic detection of moving targets in the presence of evolving cloud clutter or background noise [1–3]. In particular, point target detection is performed in various infrared (IR) surveillance applications [3]. We focus here on such IR video sequences, which contain point targets. Such sequences contain an enormous amount of data whose transmission and storage are very time and resource consuming. In today's scenarios, when every piece of information is monitored, recorded, and saved for later use such as debriefing, or needs to be promptly received at headquarters for higher-level decisions, the size of the video sequences should be as small as possible, for storage and archiving or for faster delivery on a busy network, respectively. To reduce the time and the resources required, a compression method that

maintains the point target detection capabilities is desired. Most compression methods deal with images or video streams designed to be viewed by human viewers, or contain large targets that can be detected spatially before compression. Common compression algorithms that are used in image and video compression standards incorporate a spatial lossy compression stage that reduces coefficients of high spatial frequencies and may cause a point target to be lost or smeared. The second aspect of video sequence compression is temporal. Common temporal compression algorithms for video streams involve motion estimation and compensation where the best spatial match between blocks from the current frame and the previous or next frames is searched [4,5]. In such cases, individual pixels of point targets in IR sequences are ignored if there are other larger matching elements such as clouds.

Thus, we compress the sequences in a lossy manner, which alters the temporal profiles but still maintains their general shapes and, more importantly, the detection performance of targets. In our previous work [1,2], we introduced two lossy temporal compression methods, the discrete cosine transform (DCT) quantization and the parabola fit, for IR imagery sequences containing slow-moving targets of size less than one pixel in the presence of evolving cloud clutter or background noise. These compression methods can maintain the point target detection capabilities and can be followed by a lossless spatial compression stage. A detection algorithm is not implemented on the raw data at the sensor; rather, the data is compressed and then transmitted for evaluation [1]. The performance of our compression algorithm is evaluated via the compression ratio, as well as a detection score for each pixel (variance estimation ratio score, VERS), a signal-to-noise ratio (SNR)-based detection measure for the entire sequence, and probabilities of detection (PD) and false alarm (PFA). The results of our compression method are then compared with the scores of the original sequence and of our previously introduced DCT quantization [1,2] compressed sequences. Prior to compression, we extract the noise level for each temporal profile. It is used as the standard deviation (STD) of white Gaussian noise (WGN) added to the temporal profiles after compression while calculating VERS, to compensate for the smoothing caused by the compression [1,2].

Our current work deals with the following. First, to yield PD and PFA from VERS results, we developed a new algorithm for target track detection based on straight-line extraction using the Hough transform (Section 4). Second, the noise level calculation process is modified from our previous work [1,2] to suppress clutter and to improve target detection (Section 5). Finally, we introduce a new temporal compression method, namely parametric temporal compression (PTC), in which the temporal profile is fitted to a Gaussian shape, a straight line, or a parabola (Section 6). Bit allocation and bit encoding are then performed on the parameters of the Gaussians and the polynomials to yield a compact representation of the original data. The target peaks are best described by the Gaussian representation, and a scatter plot of the parameters reveals a clear distinction between target and background pixel parameters. This distinction is then utilized for a simple and efficient detection process through classification of the target pixels from the background, which yields high PD and low PFA values.

The paper is organized as follows. Section 2 presents the original and synthetic IR sequences. Section 3 deals with the score and the evaluation metrics. In Section 4 we elaborate on the automatic algorithm for detecting target tracks. The modified noise estimation is shown in Section 5. Section 6 details the proposed lossy parametric temporal compression method, followed by spatial compression and bit encoding. Section 7 contains the results of the proposed compression method. Section 8 summarizes the paper and outlines plans for our future work.

2. IR SEQUENCES

The IR imagery sequences used in this work were taken from an American Air Force site [6]. The data was acquired using two IR cameras with a spectral region of 3.4 to 5.0 microns

Fig. 1. Representative frames and target locations of (a) J2A, (b) J13C, (c) NA23A, and (d) NPA.

and square pixel sizes of 40 and 24 microns. The sequences contain 95 or 100 frames (depending on the sequence), sized 244 × 320 pixels per frame, with a 12-bit pixel representation. The frame rate is 30 frames per second (fps). Since the sensors used for this scene are fixed on the ground and point toward the sky, the temporal variation of the signal originates from clutter (clouds), the target's movement, and the background noise. The sequences include one or two slow targets moving across different types of sky elements: clouds (clutter dominated) or clear sky (noise dominated) [7–11]. The IR sequences we use are shown in Fig. 1, with their corresponding target locations marked by white squares. Their respective scene types are presented in Table 1.

A. Synthetic IR Sequences

We have a very limited source of only four IR sequences that contain targets and one that does not contain targets (Table 1). Nonetheless, each sequence with targets has different target and background characteristics: from fluffy clouds to strong clouds, from very slow to fast targets, and targets traversing either cloud or noise pixels. The limited source may not enable complete analysis. Hence, we synthetically created sequences that are based on the original ones. The first type of synthetic IR sequences (type A) consists of the original sequences J2A, J13C, NPA, and M21F with a single target, taken from the sequence NA23A, instead of the original sequences' targets.

Table 1. IR Imagery Sequences Characteristics [1,2,10,11]

| Movie | Targets | Background |
|-------|---------|------------|
| J2A | Two fast targets | Fluffy clouds |
| J13C | One very slow target | Strong clouds |
| NA23A | One medium-speed target | Small bright clouds |
| NPA | Two slow targets | Wispy clouds |
| M21F | None | Hot hazy night sky, no clouds |

The original targets are replaced by temporal noise pixels, with average intensities equal to the average value of their spatial neighbors. This type combines one target with different backgrounds, from cloudless sky to strong clouds, which alters the ratio between the target pixel scores and the background pixel scores, and tests the ability to detect the targets on those different backgrounds. The movies are named A-J2A, A-J13C, and so on. The second type (type B) includes the original IR sequences J2A, J13C, NPA, and NA23A, with the target velocity modified to half or double the original velocity, except for J13C, which has velocities that are two and four times faster. An example of a slower target pixel is shown in Fig. 2 for NA23A. Since we deal with slow-moving point targets, with peak widths in the range of 5–30 frames, we expanded the variety inside this range and even toward slower velocities. Faster velocities are considered fast targets and are outside the scope of this work. The names of the movies contain their respective new target velocities, as in "J2Afast2", "J13Cfast4", "NA23Aslow2", etc.


3. SCORE AND EVALUATION METRICS

The temporal profile of a pixel through which a point target passed has a high and narrow peak, in which the intensity of the point source is proportional to the height of the peak and the velocity of the point source is inversely proportional to the width of the peak [1,2,7–9]. Since a point target cannot be detected spatially, each pixel is scored individually using a temporal process. This temporal process compares the overall estimated variance of a temporal profile of a pixel (the average over the K minimal variance values $\tilde{\sigma}^2_i$ of the temporal profile) with its highest fluctuation $\hat{\sigma}^2_{\max}$, and is thus referred to as the variance estimation ratio score (VERS). If this fluctuation, and hence the pixel score, is high enough, the pixel is suspected to be a target [1,2,10,11]. The variances are calculated first by an overlapping linear fit (LF) for mean estimation, and then by a sliding variance. The pixel score is calculated by

$$\mathrm{score}_{\mathrm{pixel}} = \hat{\sigma}^2_{\max} \Big/ \frac{1}{K}\sum_{i=1}^{K}\tilde{\sigma}^2_i. \qquad (1)$$

VERS is a simple measure that performs well in both noise- and clutter-dominated backgrounds, and produces higher pixel scores for target pixels than for most background pixels [10,11]. The pixel score images of four sequences are presented in Fig. 3, focusing on the target location; the target tracks are vividly seen.
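To make the scoring step concrete, the following Python sketch computes the VERS of a single temporal profile as in Eq. (1). It is an illustration only: the window lengths lf_win and var_win and the value of K are placeholders, not the optimized values discussed in [1,2].

```python
import numpy as np

def vers_pixel_score(profile, lf_win=21, var_win=11, K=10):
    """Sketch of the VERS pixel score of Eq. (1).

    The long-term mean is estimated with an overlapping linear fit (LF),
    the residual is scanned with a sliding variance window, and the score
    is the maximal variance divided by the mean of the K minimal variances.
    """
    profile = np.asarray(profile, dtype=float)
    n = len(profile)
    # Overlapping linear fit for mean estimation: fit a line in a centered
    # window around each sample and keep the fitted value at the center.
    mean_est = np.empty(n)
    half = lf_win // 2
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        x = np.arange(lo, hi)
        a, b = np.polyfit(x, profile[lo:hi], 1)
        mean_est[t] = a * t + b
    resid = profile - mean_est
    # Sliding variance of the residual (the "fluctuation" of the profile).
    variances = np.array([np.var(resid[s:s + var_win])
                          for s in range(n - var_win + 1)])
    variances.sort()
    return variances[-1] / variances[:K].mean()   # Eq. (1)
```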

Fig. 2. Temporal profile of target pixel (136,130) of NA23A from (a) the original sequence and (b) type B (slower target NA23Aslow2).

Fig. 3. Pixel scores images of the targets of (a) J2A, (b) J13C, (c) NA23A, and (d) NPA.

These pixel scores are used to calculate the movie score by applying an SNR measure. The first step divides the pixel score image into spatial blocks of size 30 × 30 and scores each block ($\mathrm{score}_{\mathrm{block}}$) as the average $E(\cdot)$ of the $N_M$ maximal pixel scores $\nu_i$ in the block, where M is the maximal $N_M$ score group, minus the average of the pixel scores that do not belong to the maximal group M, divided by the standard deviation (STD) of the pixel scores that do not belong to M. The block scores are calculated by

$$\mathrm{score}_{\mathrm{block}}(i) = \frac{E[\nu_i \in M] - E[\nu_i \notin M]}{\mathrm{std}(\nu_i \notin M)}. \qquad (2)$$

The second step calculates the overall movie score ($\mathrm{score}_{\mathrm{movie}}$) in a similar manner, but using the block scores, where the known target blocks (TB) replace the maximal score group. The higher the movie score, the easier the target detection task [1,2,10–13]. The overall movie score is calculated by

$$\mathrm{score}_{\mathrm{movie}} = \frac{E[\mathrm{score}_{\mathrm{block}}(i) \in \mathrm{TB}] - E[\mathrm{score}_{\mathrm{block}}(i) \notin \mathrm{TB}]}{\mathrm{std}(\mathrm{score}_{\mathrm{block}}(i) \notin \mathrm{TB})}. \qquad (3)$$

Another evaluation metric regards the compression of the movie after temporal and spatial compression and bit encoding. It is referred to as the total compression ratio (TCR), calculated as the size of the original movie in bytes before compression divided by its size after compression:

$$\mathrm{TCR} = \frac{\text{Original movie size}}{\text{Compressed movie size}}. \qquad (4)$$

4. TARGET TRACK DETECTION

In [1] we presented an automatic detection of target tracks that groups high-scored pixels into suspected target tracks and eliminates groups that do not comply with a straight-line criterion. This criterion stems from the expected transition of a nonmaneuvering target, moving with constant velocity along a straight-line trajectory between pixels, in short sequences such as the examined movies. Here we present a quicker and more efficient grouping method that does not require examining the neighbors of each high-scored pixel separately. The pixel scores calculated by Eq. (1) form a pixel score image, to which we apply a threshold, resulting in a binary image. This binary image, in which 1's represent the high-scored pixels, is the input to our new grouping method, which is based on the Hough transform. This transform is used for straight-line detection; pixels along the detected lines are suspected to belong to target tracks.

The Hough transform [14,15] provides a parametric representation of a straight line using two variables: ρ, the distance from the origin to the line, which is the length of a segment perpendicular to the examined line, and θ, the angle of that perpendicular segment. The distance ρ is calculated by

$$\rho = x \cos\theta + y \sin\theta, \qquad (5)$$

where x and y are the pixel coordinates in the image. In Hough-transform space, pixels in an image are represented as sinusoids, and points in the transform space represent straight lines in the image domain. Thus, sinusoids that intersect at the same point in the transform space represent pixels along the same line [14]. Hence, the number of coinciding sinusoids at one point in the transform space equals the number of pixels along that line in the image space. Applying a threshold to the number of pixels leaves us with the parameters of the lines containing the most pixels.

The target track detection flow chart is presented in Fig. 4. First, we use only high-scored pixels that pass the threshold th_score and whose blocks also pass a threshold th_blk_score. The block score (a measure used in the movie-score calculation [1,2,10,11]) indicates how prominent the highest-scored pixels are in their neighborhood, since a block with many high-scored pixels may contain mostly clutter rather than a real target. All pixels that pass these two thresholds are forwarded to the Hough transform stage. The Hough transform image is then thresholded; only the highest Hough transform grades are admitted to the following stage, in which the groups of pixels suspected to belong to target tracks are extracted. The fourth step matches sufficiently large groups, which exceed the minimal group size th_size, with long line features, meaning a very low aspect ratio [AR in Eq. (6)], whose maximal value is th_AR [1], as we assume the target track is similar to a straight line. The group's major and secondary axes (the length and width of the group) are calculated using principal component analysis (PCA). The aspect ratio is

$$\mathrm{AR} = \frac{\text{width of group}}{\text{length of group}}. \qquad (6)$$

The fifth step uses hysteresis, which allows high-scored pixels that are close enough to the pixels suspected as targets (within a distance threshold th_dist_h) to participate in the track. Finally, if a pixel belongs to a group that passed the final stage, it is detected as containing a target.

Fig. 4. Flow chart of the target track detection algorithm: (1) pixel score > th_score AND block score > th_blk_score; (2) Hough transform; (3) detect lines with the highest Hough grades; (4) test whether the group has a 2-D shape; (5) hysteresis for close pixels; (6) test whether the pixel lies on a detected line.
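The following Python sketch illustrates the core of stages 2–4 of the flow chart: a Hough accumulator per Eq. (5) and the PCA-based aspect ratio of Eq. (6). It is a minimal illustration; the vote threshold, th_size, and th_AR are application-tuned and are not specified here.

```python
import numpy as np

def hough_accumulator(binary, n_theta=180):
    """Vote in (rho, theta) space per Eq. (5): rho = x*cos(theta) + y*sin(theta)."""
    ys, xs = np.nonzero(binary)
    thetas = np.deg2rad(np.arange(n_theta))          # angles 0..179 degrees
    diag = int(np.ceil(np.hypot(*binary.shape)))     # maximal |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1    # one vote per (rho, theta)
    return acc

def aspect_ratio(points):
    """Eq. (6): width/length of a pixel group from PCA of its coordinates."""
    pts = np.asarray(points, dtype=float)
    pts -= pts.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(pts.T))      # ascending: [minor, major]
    return np.sqrt(eigvals[0] / eigvals[1])          # std ratio of the two axes
```

Cells of the accumulator above a vote threshold give the candidate lines; pixel groups larger than th_size whose aspect_ratio does not exceed th_AR are retained as suspected target tracks.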

5. NOISE ESTIMATION

Since temporal compression reduces the temporal noise of each pixel, temporal profiles are smoothed and their variances are reduced. Hence, the minimum variance averages are reduced as well, which consequently increases the variance estimation ratio scores of nontarget pixels. An example is found in Fig. 5(a), where the compression smooths the original temporal profile (in black) into a straight line, except for two very small peaks (in gray). These peaks are translated into small variances in Fig. 5(b) (in gray). However, due to the VERS calculation, this pixel score, which is the ratio between the maximal variance (0.5) and the average of the K lowest variances (0), diverges to infinity [see Eq. (1)]. Therefore, white Gaussian noise (WGN) is added to the decompressed signals [2,16,17]. The WGN models the temporal noise very well for blue-sky scenes, night scenes, or temporally stationary clutter [18], though it is not dominant in several profiles of cloud pixels. Nonetheless, the noise level (including intensity fluctuations) is required for STD estimation of the WGN, which compensates for the smoothing that occurs during compression. Thus, the actual shape or intensity of the cloud is of less importance for detection. According to the theoretical noise shape [18] and based on our simulation results (Section 7), this noise contributes to the performance of point target detection after the proposed compression. The noise level is computed using the STD of the high-frequency coefficients of the temporal 1-D DCT of the current pixel, starting from the qth coefficient, as in Eqs. (7):

$$\begin{aligned} \mathrm{vec}_{\mathrm{DCT}} &= \mathrm{DCT}(\text{temporal profile}), \\ \mathrm{vec}_{\mathrm{DCT0}} &= [0, \ldots, 0, \mathrm{vec}_{\mathrm{DCT}}(q{:}\mathrm{end})], \\ \mathrm{vec}_{\mathrm{IDCT}} &= \mathrm{IDCT}(\mathrm{vec}_{\mathrm{DCT0}}), \\ \mathrm{STD}_{\mathrm{WGN}} &= \mathrm{STD}(\mathrm{vec}_{\mathrm{IDCT}}). \end{aligned} \qquad (7)$$

The final signal after transmission and decompression is

$$\text{final signal} = \text{decompressed signal} + \mathrm{WGN}. \qquad (8)$$
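A minimal Python sketch of Eqs. (7) and (8), using SciPy's DCT routines, follows; the cutoff index q is a placeholder, and the exact value used for the tested sequences is not specified here.

```python
import numpy as np
from scipy.fftpack import dct, idct

def wgn_std(profile, q=25):
    """Eq. (7): noise level from the high-frequency temporal DCT content."""
    coeffs = dct(np.asarray(profile, dtype=float), norm='ortho')
    coeffs[:q] = 0.0                         # zero the low-frequency coefficients
    return np.std(idct(coeffs, norm='ortho'))

def add_transmitted_wgn(decompressed, std_wgn):
    """Eq. (8): final signal = decompressed signal + WGN with the given STD."""
    rng = np.random.default_rng()
    return decompressed + rng.normal(0.0, std_wgn, size=np.shape(decompressed))
```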

The $\mathrm{STD}_{\mathrm{WGN}}$ values are computed on the temporally median filtered (TMF) [1,2] movie, prior to compression. They are then bit-encoded and transmitted with the compressed sequence. The compression and decompression processes of the sequence itself are not affected by this addition, and the compression ratio is only very slightly reduced compared to not adding the noise parameter, since the noise parameter is simply appended to the transmitted data. In order to keep the high noise level of the cloud pixels, and to blindly enhance the target (whose location is unknown for a new sequence), we further modified the calculation of the STD of the WGN. Our goal is to add a low noise level to target pixels, and a high noise level to cloud and noise pixels. The $\mathrm{STD}_{\mathrm{WGN}}$, as it appears in Eq. (7), shows that the target's STD usually lies between the STDs of the noise pixels and the STDs of the cloud pixels, as shown in Fig. 6. The left-hand side of the histogram in Fig. 6, with the high peak, contains the STDs of the noise pixels, and the lower right-hand side contains the STDs of the cloud pixels. By choosing a value from the area between these two groups, and subtracting it from the STD values,

Fig. 5. Noise pixel (72,3) from NA23A before and after compression: (a) temporal profile and (b) temporal variance estimation.

Fig. 6. Histogram of the STD of WGN values for J13C (number of pixels vs. STD values).


we receive a minimal noise addition for pixels close to this area, and a higher one for distant pixels. To create some symmetry between the noise and cloud areas, we stretched the left-hand side into the negative domain. The minimal STD values were calculated using k-means over the STDs. Since the minimal STD values are estimated using k-means, and not calculated using preliminary knowledge of the actual targets, we compared them to the mean of the real target pixels' STDs from the sequences we tested. The target pixels' STD obviously cannot be calculated for new sequences with unknown target locations. The number of clusters for k-means is two: noise and cloud. The k-means algorithm returns the centroids of the two clusters ($c_1$ and $c_2$, respectively), whose intracluster variances ($\sigma_1^2$ and $\sigma_2^2$, respectively) are minimal and whose intercluster variance is maximal. The boundary between the two k-means clusters of noise and cloud is used as the minimal STD value:

$$\text{minimal STD value} = \frac{(c_1 + \sigma_1) + (c_2 - \sigma_2)}{2}. \qquad (9)$$
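The cluster boundary of Eq. (9) can be sketched as follows; SciPy's kmeans2 stands in for whatever k-means implementation was actually used, and the within-cluster STDs play the roles of σ1 and σ2.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def minimal_std_value(std_values):
    """Eq. (9): boundary between the noise and cloud clusters of STD values."""
    v = np.ravel(std_values).astype(float)
    centroids, labels = kmeans2(v, 2, minit='++')
    order = np.argsort(centroids)           # low cluster = noise, high = cloud
    c1, c2 = centroids[order]
    s1 = v[labels == order[0]].std()        # intracluster STD of the noise cluster
    s2 = v[labels == order[1]].std()        # intracluster STD of the cloud cluster
    return ((c1 + s1) + (c2 - s2)) / 2.0
```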

In the case of a clear sky, where $c_1$ and $c_2$ are close and the target pixels yield higher $\mathrm{STD}_{\mathrm{WGN}}$ values than the (clear-sky) noise pixels, the minimal STD value is chosen using a high percentile. The minimal STD values obtained using k-means, and the means over the target pixels, are presented in Table 2. As seen for NA23A and J13C, there may be a large difference between the minimal STD values calculated with the k-means clusters and the means over the target STDs. Since in real time we do not know where the target is, we proceed with k-means, which yielded detection results similar to those of the original sequences.

After calculating the minimal noise value, this minimal value was subtracted from the entire range of STD values, and the absolute value was taken to ensure that pixels with this minimal value indeed have the lowest noise added, as in Eq. (10):

$$\mathrm{STD\_of\_WGN_{stretch}} = \begin{cases} \mathrm{STD\_of\_WGN} \cdot \max(\mathrm{STD\_of\_WGN}) - \max(\mathrm{STD\_of\_WGN}), & 0 \le \mathrm{STD\_of\_WGN} \le 1 \\ \mathrm{STD\_of\_WGN}, & 1 < \mathrm{STD\_of\_WGN} \end{cases}$$

$$\mathrm{STD\_of\_WGN_{min\,val}} = \left|\mathrm{STD\_of\_WGN_{stretch}} - \text{minimal value}\right|. \qquad (10)$$

The remaining values were then scaled to the range 1–10. The effect of the minimal value can be seen in Fig. 7. While Fig. 7(a) presents the STD values of the target block as in Eq. (7), where the target's values are the highest (meaning the addition of higher noise to target pixels), Fig. 7(b) presents the STD values of the target block after calculating the STD values as in Eq. (10), where the target's values are the lowest in the block, resulting in a lower noise addition to target pixels.

Fig. 7. Noise estimation of the NA23A target block: (a) STD of WGN as in Eq. (7), and (b) STD of WGN min val as in Eq. (10).

Table 2. Minimal STD Values Using k-Means and Mean Over Target Pixels

| Movie | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| Mean target STD | 8.7333 | 36.1030 | 5.0218 | 6.1957 |
| k-means | 7.3541 | 25.4772 | 16.9153 | 6.7777 |

6. PARAMETRIC TEMPORAL COMPRESSION

A. Temporal Gaussian Fit

The Gaussian fit is a parametric compression method that fits a Gaussian to a temporal profile. Since the Gaussian shape contains a high and narrow peak, as well as two moderate-incline tails, it can represent a target traversing a noise pixel very well. Thus, it overcomes two shortcomings of the parabola fit temporal compression [2]. The first is the reconstruction of the temporal profile, which is conducted by concatenating several functions. The joints between these functions are discontinuous and may yield high variances, in contrast to the otherwise smoothed decompressed temporal profile, and hence high pixel scores for nontarget pixels. The second shortcoming of the parabola fit is the large number of parameters needed for compression: an accurate description of the temporal profile required seven parameters. As mentioned above, the Gaussian shape already contains a high and narrow peak and two moderate-incline tails. Thus, it can represent


a target traversing a noise pixel using only the following four parameters: μ, the Gaussian center; σ, the Gaussian standard deviation (STD); α, the peak's height; and m, the offset level of the Gaussian from 0. The temporal profile is denoted by y [in Eqs. (11)], and the temporal axis is x. The Gaussian parameters are computed using a second-order polynomial fit over the natural logarithm (ln) of the temporal profile y, as appears in Eqs. (11):

$$\begin{aligned} y_{\ln} &= \ln(y - m), \\ \hat{y}_{\ln} &= a x^2 + b x + c, \\ \sigma &= \sqrt{-1/(2a)}, \\ \mu &= b \cdot \sigma^2, \\ \alpha &= \exp\{c + \mu^2/(2\sigma^2)\}, \\ \tilde{y} &= \alpha \cdot \exp\!\left(\frac{-(x-\mu)^2}{2\sigma^2}\right) + m. \end{aligned} \qquad (11)$$
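A sketch of Eqs. (11) in Python follows. The handling of the offset m is an assumption here (it is set just below the profile minimum so the logarithm is defined), and the fit is only meaningful when the log-profile is concave (a < 0).

```python
import numpy as np

def gaussian_fit(y, m=None):
    """Eqs. (11): Gaussian parameters from a 2nd-order polyfit of ln(y - m)."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    if m is None:
        m = y.min() - 1e-6                  # assumption: keeps ln(y - m) defined
    y_ln = np.log(y - m)                    # y_ln = ln(y - m)
    a, b, c = np.polyfit(x, y_ln, 2)        # y_ln ~ a*x^2 + b*x + c
    sigma = np.sqrt(-1.0 / (2.0 * a))       # sigma = sqrt(-1/(2a)); requires a < 0
    mu = b * sigma ** 2                     # mu = b * sigma^2
    alpha = np.exp(c + mu ** 2 / (2 * sigma ** 2))
    y_fit = alpha * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) + m
    return mu, sigma, alpha, m, y_fit
```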

The Gaussian fit of a target pixel is shown in Fig. 8, where "TMF" denotes the temporally median filtered profile and "smoothed" denotes the temporal profile after TMF and temporal smoothing using an averaging convolution [2]; the smoothed profile is the input to the compression stage. Obviously, cloud and noise pixels can also be modeled as Gaussians with wide peaks, which may be high or low, respectively. The single-Gaussian model is achieved without the need to determine where the highest peak lies along the temporal profile, contrary to the parabola fit method. The most challenging pixel type for Gaussian fitting is a target traversing a cloud-edge pixel, as it was for the parabola fit, since this model requires a combination of two functions. In this case, we use a narrow Gaussian over a wide Gaussian, which uses eight parameters, four for each Gaussian, as demonstrated in Fig. 9. The calculation of the second Gaussian remains the same, though the offset m_2nd is now the wide Gaussian $\tilde{y}$, as in Eq. (12); it does not require a parameter to be saved or stored:

$$m_{2\mathrm{nd}} = \tilde{y}. \qquad (12)$$

The two-Gaussian shape requires two additional parameters: the suspected target start (start_2nd) and end (end_2nd) frames. The second Gaussian's location (the suspected target frames) is determined by the maximal height-to-width ratio among the possible peaks along the input temporal profile whose height is above a threshold.

Fig. 9. Gaussian fit of target pixel (186,176) from J2A.

B. Background Polynomial Compression

Background pixels, such as noise or cloud pixels, fit better to a straight line or a parabola, respectively, than to a Gaussian, since the temporal profile of a noise pixel is almost constant (disregarding the noise fluctuations) and the temporal profile of a cloud edge is monotonic over a relatively long time [2,8]. This fitting procedure differs from fitting a parabola to a suspected target peak that is narrower than the entire time section, as presented in [2]. A single parabola or a single straight line is not adequate to represent a target traversing a pixel, since the target peak is narrower than the temporal profile length. Such a temporal profile needs a combination of several polynomials to be represented correctly, such as straight lines and a parabola/Gaussian, or a narrow parabola/Gaussian on top of a wide parabola. Also, for compression purposes, a straight line or a parabola along the entire temporal profile is better than a Gaussian, since they contain only two or three parameters instead of four, respectively, and hence reduce the compressed file size. The straight-line equation is

$$y = a x + b, \qquad (13)$$

and the parabola equation is

$$y = a x^2 + b x + c. \qquad (14)$$

Fig. 8. Gaussian fit of target pixel (203,260) from J2A.

The best-fitting model (in MSE terms) among the Gaussian, straight line, and parabola is then selected for the pixel's temporal compression; a code sketch of this selection appears below. One shortcoming of every parametric compression is that there is no accurate fit to all the temporal profile shapes of real-life data, and it oversmooths several cloud pixels' profiles. This is also the case for the Gaussian fit and the polynomial fit, where we cannot possibly find an accurate equivalent of a Gaussian form or a polynomial of 1st or 2nd order for each temporal profile, when we limit the number of coefficients (according to the maximal allowed number of Gaussians and the polynomial order, respectively). Such temporal profiles are those of several cloud edges and irregular noise pixels. Two examples appear in Figs. 10 and 11, where the profiles were fitted into wide 2nd-order polynomials during compression. We do not intend to maintain similarity to the original pixels for such cases of cloud and irregular noise pixels. We would rather obtain their smoothed form, as appears after the compression and decompression or reconstruction stages. Thus, this parametric compression ignores spikes and some of the nontarget peaks much better than DCT quantization [1,2], and therefore reduces such pixel scores. As opposed to this spike suppression, DCT quantization spreads their energy along the profile due to the zeroing of high-frequency coefficients, so they resemble a target peak and increase clutter [1]. Naturally, we might not represent a target peak over a cloud edge well if the temporal profile has too many peaks and valleys, or if the target is not prominent enough. Since we deal with slow-moving point targets, their peaks are usually wider than just a few frames, in the range of 5–30 frames, which is still considered "narrow" as opposed to a cloud-edge peak. Obviously, the peaks should be sufficiently high to be considered as targets. Thus, such peaks are modeled well by the Gaussian form and are not eliminated. For the purpose of compression and target detection, we prefer to keep the compression simple and fast with fewer output parameters, to avoid complicated processing or a reduction of the compression ratio, and to prevent overfitting or false alarms that may occur if very low and very narrow peaks are considered as potential targets.

Fig. 10. Second-order polynomial fit of cloud pixel (102,24) from NPA.

Fig. 11. Second-order polynomial fit of irregular noise pixel (6,1) from NA23A.
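The model selection step can be sketched as follows; gaussian_fit refers to the routine sketched in subsection 6.A, and the MSE comparison mirrors the description above.

```python
import numpy as np

def best_model(y):
    """Select the best-fitting model (line, parabola, or Gaussian) by MSE."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    fits = {
        'line': np.polyval(np.polyfit(x, y, 1), x),       # Eq. (13)
        'parabola': np.polyval(np.polyfit(x, y, 2), x),   # Eq. (14)
        'gaussian': gaussian_fit(y)[-1],                  # Eqs. (11)
    }
    mse = {name: np.mean((y - fit) ** 2) for name, fit in fits.items()}
    # Discard invalid fits (e.g., a Gaussian fit on a non-concave log-profile).
    mse = {name: v for name, v in mse.items() if np.isfinite(v)}
    return min(mse, key=mse.get)
```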

C. Bit Allocation and Encoding

Further compression of the Gaussian and polynomial fit output parameters is achieved by spatial bit allocation. The number of bits for each parameter depends on the parameter's range of values along the spatial dimensions and on the required precision. The bit allocations for the examined sequences are set to 32 bits for σ; 16 bits for μ, α, m, σ_2nd, μ_2nd, and α_2nd; and 8 bits for start_2nd and end_2nd of the second (narrow) Gaussian. We allocated the minimal possible number of bits to each parameter according to its value span. In addition, we kept the number of bits well below the 64 bits of a standard double-precision representation. Bit encoding is then performed by Huffman coding [19].
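The spatial bit allocation can be sketched as a uniform quantizer over each parameter plane; quantize_parameter_plane is a hypothetical helper, and the subsequent Huffman coding [19] of the integer codes is not shown.

```python
import numpy as np

def quantize_parameter_plane(values, n_bits):
    """Quantize one parameter plane (e.g., sigma or mu) to n_bits per pixel.

    Codes are uniform over the parameter's value span; the (lo, hi) range is
    kept as side information for dequantization at the decoder.
    """
    lo, hi = float(values.min()), float(values.max())
    span = hi - lo if hi > lo else 1.0
    levels = (1 << n_bits) - 1
    codes = np.round((values - lo) / span * levels).astype(np.uint32)
    return codes, (lo, hi)

def dequantize_parameter_plane(codes, value_range, n_bits):
    """Inverse mapping used at the decoder."""
    lo, hi = value_range
    span = hi - lo if hi > lo else 1.0
    return codes / float((1 << n_bits) - 1) * span + lo
```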

7. COMPRESSION RESULTS

A. Noise Estimation

In order to approximate the minimal STD value required for the WGN addition, we use the k-means method (see Section 5). The movie score results, using k-means, are presented in Table 3. In this table, we also present the movie scores obtained while using the mean of the STDs of the target pixels as the minimal STD approximation method (see Section 5). The mean-over-target-pixels calculation is not applicable for real-time sequences, since we have no information on the target of a newly acquired video sequence; it is used only for comparison. The different calculations of the STD values yield different movie scores, depending on the background properties and on the target speed and intensity. The differences are relatively low, and we do not expect degradation of target detection performance using k-means compared to the theoretical minimal STD value.

B. Compression Performance Evaluation

In this section, we present a performance evaluation comparison between the PTC and the temporal DCT quantization. Performance is measured here using movie scores (Table 4), the probability of detection (PD, Table 5), and the probability of false alarm (PFA, Table 6) for the following processing and compression types: the original sequences, the temporally median filtered (TMF) sequences, the smoothed sequences, the DCT quantization sequences compressed with parameter 0.2 (previously proposed in [1,2]), and the new PTC sequences. Both temporal compression methods are further spatially compressed, and the detection results of the overall compression processes appear in these tables as well. The minimal values of the WGN's STD are calculated using k-means, as detailed in subsection 7.A.

Table 3. Movie Scores with Different Minimal STD Value Calculations for the STD of WGN

| Movie Score (DCT0.2 + WGN_DCT) | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| k-means | 62.5 | 12.5 | 8 | 81 |
| Mean over target pixels | 66 | 16 | 14.5 | 77 |

Table 4. Movie Scores of Original and Processed Sequences

| Movie Scores | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| Original | 272 | 8.5 | 24.5 | 58 |
| TMF | 277 | 9 | 1 | 49 |
| Smoothed | 196 | 10.5 | 7.5 | 13.5 |
| DCT0.2+WGN_DCT temporal | 62.5 | 12.5 | 8 | 80.5 |
| DCT0.2+WGN_DCT temporal+spatial Q = 1 | 63 | 12.5 | 8 | 78.5 |
| DCT0.2+WGN_DCT temporal+spatial Q = 3 | 68 | 13.5 | 7.5 | 75 |
| PTC+WGN_DCT | 168.5 | 6.5 | 16 | 43 |
| PTC+WGN_DCT+spatial | 168.5 | 6 | 16.5 | 43.5 |

Table 5. Probability of Detection of Original and Processed Sequences

| PD | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| Original | 1 | 0.83 | 1 | 1 |
| TMF | 1 | 0.83 | 1 | 1 |
| Smoothed | 0.98 | 1 | 1 | 1 |
| DCT0.2+WGN_DCT temporal | 1 | 0.83 | 0.83 | 1 |
| DCT0.2+WGN_DCT temporal+spatial Q = 1 | 1 | 0.83 | 0.83 | 1 |
| DCT0.2+WGN_DCT temporal+spatial Q = 3 | 0.98 | 0.83 | 0.83 | 1 |
| PTC+WGN_DCT | 0.95 | 1 | 1 | 1 |
| PTC+WGN_DCT+spatial | 0.95 | 1 | 0.96 | 1 |

The spatial compression that follows the DCT quantization method is the near-lossless uniform quantization of the DCT coefficients [1]. The quantization parameter Q is chosen to be 1 and 3 [1] for comparison purposes. The input value X is a quantized temporal DCT coefficient, and the quantized value $X_q$ is calculated by dividing the input X by the spatial quantization parameter Q, rounding toward zero, and multiplying back by Q [1]:

$$X_q = \left[\frac{X}{Q}\right] \cdot Q, \qquad (15)$$

where $[\cdot]$ denotes rounding toward zero.
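In NumPy, Eq. (15) is a one-liner, since np.fix rounds toward zero:

```python
import numpy as np

def spatial_quantize(X, Q):
    """Eq. (15): X_q = fix(X / Q) * Q, with fix() rounding toward zero."""
    return np.fix(X / Q) * Q
```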

Table 6. Probability of False Alarm of Original and Processed Sequences

| PFA (×10⁻⁴) | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| Original | 1.67 | 1.15 | 1.28 | 0.90 |
| TMF | 1.92 | 3.71 | 1.54 | 4.61 |
| Smoothed | 0.90 | 4.61 | 1.79 | 3.07 |
| DCT0.2+WGN_DCT temporal | 1.15 | 1.54 | 0 | 0.64 |
| DCT0.2+WGN_DCT temporal+spatial Q = 1 | 1.41 | 1.54 | 0 | 0.64 |
| DCT0.2+WGN_DCT temporal+spatial Q = 3 | 1.28 | 1.67 | 0 | 0.38 |
| PTC+WGN_DCT | 1.03 | 1.41 | 0.77 | 0.64 |
| PTC+WGN_DCT+spatial | 1.03 | 6.40 | 1.41 | 2.56 |


The PTC stage is followed by bit allocation. Both compression methods are bit-encoded using Huffman coding. Their resulting file sizes and compression ratios appear in Tables 7 and 8, respectively.

The main difference between our previously suggested and our current temporal compression methods is that the DCT quantization is flexible, does not depend on the shape of the temporal profiles, and its compression depth can be easily changed by adjusting the quantization parameters, mainly the number of maintained (nonzeroed) coefficients and the spatial quantization parameter, whereas the PTC is less flexible: it is intended to follow the expected shapes of possible targets and profiles, and it has a constant number of parameters. Each compression method behaves differently in unique cases. For example, the DCT quantization may smooth a spike into a target-like peak, while the PTC ignores such spikes and yields lower false alarm probabilities. A contrasting example is that the PTC may ignore a target peak if it is too low, while the DCT quantization preserves it and maintains a higher detection probability.

The movie scores may vary significantly between different processing and compression methods (Table 4) due to temporal profile changes, but this does not necessarily affect the detection ability measured by PD and PFA (Tables 5 and 6, respectively). The movie scores may change according to a single pixel, meaning they are very sensitive to high-scored pixels that may not actually affect the detection if they are isolated from other high-scored pixels [such as the irregular noise pixel (6,1) of NA23A, or the cloud-edge pixel (109,10) of the smoothed NPA, which degraded the movie score]. In order to automatically locate the pixels that the target traversed, we applied the target track detection algorithm (Section 4) to yield PD (Table 5) and PFA (Table 6). The target track detection parameters are uniform for all sequences.

Table 7. File Size in KB of Original and Spatially Compressed Sequences

| File Size [KB] | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| Original | 14488 | 15250 | 14488 | 14488 |
| DCT0.2+WGN_DCT temporal+spatial Q = 1 | 739 | 978 | 764 | 720 |
| DCT0.2+WGN_DCT temporal+spatial Q = 3 | 413 | 638 | 438 | 393 |
| PTC+WGN_DCT+spatial | 479 | 577 | 472 | 388 |

Table 8. Compression Ratio of Spatially Compressed Sequences

| Compression Ratio | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| DCT0.2+WGN_DCT temporal+spatial Q = 1 | 19.6 | 15.6 | 19.0 | 20.1 |
| DCT0.2+WGN_DCT temporal+spatial Q = 3 | 35.1 | 23.9 | 33.1 | 36.9 |
| PTC+WGN_DCT+spatial | 30.2 | 26.4 | 30.7 | 37.3 |


The PD levels remain high for all compressions, mostly above 95%, and the PFA levels remain very low, below 6.5 × 10⁻⁴. Although NA23A with DCT quantization achieved a slightly lower detection rate (82%), its false alarm rate is 0%. If we increase the PFA slightly, to match those of the original and the PTC, we can still yield 100% detection. For example, when using a threshold of 0.9800 instead of 0.9970, we can achieve a PD of 1 for the temporally compressed sequence and for spatial uniform quantization with Q = 1, with a PFA of 4.61 × 10⁻⁴, and a PD of 0.9565 for spatial uniform quantization with Q = 3, with a PFA of 4.23 × 10⁻⁴.

Similar performance is obtained for our synthetic sequences. Their movie scores and PDs before and after compression are presented in Figs. 12 and 13, respectively. The movie scores of the proposed PTC (in gray) are usually similar to or even higher

than those of the smoothed sequence, which is the input to the compression process, thanks to clutter suppression during compression and the WGN addition. For most sequences, the PTC achieved higher movie scores than DCT quantization, since DCT quantization further lightly smooths the profiles. Compared to the original movie scores, the PTC movie scores are sometimes lower, though all except one are higher than 20, meaning a very high SNR after compression. The low movie scores obtained by TMF, smoothing, and DCT quantization are usually due to the smoothing effect of these processes on several pixels and did not result in lower detection performance. The detection probability is similar to the original for both preprocessing stages and for both compression methods, except for sequences with a faster target, whose peak width of less than 5 frames is outside the scope of this paper, as explained in subsection 2.A. Figure 13 shows that the PD of both compression methods is similar to that of the smoothed sequence prior to compression. The PFAs retain their low values and are all between 1.5 × 10⁻⁵ and 4 × 10⁻⁴.

Both compression methods yield high compression ratios of 24–37, which greatly reduces file sizes, and hence storage space and transmission times, while maintaining the detection ability of point targets. The effect of the PTC with the newly added WGN is presented in Fig. 14, for a block that contains the target and two clouds. Figure 14(a) shows the original pixel scores, where the shapes of the clouds are visible along with the target track, whereas in Fig. 14(b) only the target track is visible for the PTC movie.

Fig. 12. Movie scores of the synthetic sequences after smoothing (black), PTC (gray), and DCT quantization (white).

Fig. 13. PDs of the synthetic sequences after smoothing (black), PTC (gray), and DCT quantization (white).

Fig. 14. NA23A pixel scores of (a) the original movie, and (b) the PTC movie.

C. Classification Using Gaussian Fit

The Gaussian fit parameters σ (Gaussian STD), μ (Gaussian center), and α (Gaussian peak height) successfully differentiate the targets from the background, as demonstrated in Figs. 15 and 16. These figures display scatter plots of μ vs. σ and α vs. σ,

Fig. 15. (a) Scatter plot of Gaussian fit parameters σ and μ, and (b) and (c) expanded target areas.


respectively. A combination of these three parameters can classify most of the pixels well into target and background. The missed target pixels in this classification have unique and atypical temporal-profile patterns (very low or narrow peaks, or extremely wide, cloud-edge-like peaks). The target pixels are denoted by circles, whereas the background pixels are denoted by small x's. The terms 'Gauss1' and 'Gauss2' relate to a profile fit of one or two Gaussians, respectively. In the case of two Gaussians, we consider only the narrow one (the second) for classification, since it represents the suspected target profile, whereas the wide one represents the background profile, as explained in subsection 6.A. Figure 15(a) presents the entire space of μ vs. σ, where it can be seen that the σ and μ parameters of all pixel types lie in relatively narrow bands and limited areas around the axes. In Figs. 15(b) and 15(c) we zoom in on the target pixels' values, which are enclosed in rectangles. Most target values are clustered in a very small area of low values for σ (1.5–5) and μ (4–96), as we expect from a relatively narrow peak. This area contains a very small number of background pixels [Fig. 15(b)]. Slightly higher values of σ are expected for wider target pixels

Fig. 16. Scatter plot of Gaussian fit parameters α vs. σ, with expanded target areas as in (a) Fig. 15(b) and (b) Fig. 15(c).


[Figs. 15(b) and 15(c)], where more background pixel values exist. Considering the third parameter, α, further improves the separation from the background pixels, as seen in Figs. 16(a) and 16(b), which depict a zoomed view of the region of α vs. σ where the target values are present.

The classification results, using only three parameters of the Gaussian fit and without reconstructing the temporal profiles, are presented in Table 9. The PD was calculated as the fraction of target pixels with parameter values in the target ranges, and the PFA was calculated as the fraction of background pixels with parameters in the target ranges. The PD values of J13C and NA23A are similar to the original movie PD values, with slightly higher PFA values. The PD and PFA of J2A and NPA are both slightly lower than those of their respective original sequences. Nonetheless, the PDs maintain values above 83%. The difference between the classification detection results and the PTC results stems from the automatic target track detection algorithm, which compensates for lower-scored pixels that lie on the suspected target track. Obviously, this parametric classification estimates the spatial location of the targets well, but it cannot be used to restore the entire sequence. These parameters allow us to classify the pixel content without the need to restore the sequences or to use VERS and the automatic target track detection algorithm, and this can be applicable for systems that require only detection of the target location, without the video sequence information (decoding or decompression). The compression of such a file is 32 bits for σ, and 16 bits each for μ and α. The total number of bits for each sequence is 244 × 320 × (32 + 16 + 16) bits, which equals 625 Kbytes before Huffman encoding.

Table 9. PD and PFA Using Gaussian Fit Parameters σ, μ, and α

| Gaussian Fit Classification | J2A | J13C | NA23A | NPA |
|---|---|---|---|---|
| PD | 0.8780 | 0.8333 | 1 | 0.9 |
| PFA (×10⁻⁴) | 0.7688 | 4.2268 | 2.8185 | 0.3844 |
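The parametric classification can be sketched directly from the scatter-plot ranges; the σ and μ ranges below follow the text above, while the α threshold is a hypothetical stand-in for the criterion visible in Fig. 16.

```python
import numpy as np

def classify_target_pixels(sigma, mu, alpha,
                           sigma_rng=(1.5, 5.0), mu_rng=(4.0, 96.0),
                           alpha_min=0.0):
    """Classify pixels as target/background from the Gaussian fit parameters.

    A pixel is declared a target when sigma and mu fall in the target
    cluster observed in Figs. 15 and 16; alpha_min is a hypothetical
    threshold for the peak-height criterion.
    """
    in_sigma = (sigma >= sigma_rng[0]) & (sigma <= sigma_rng[1])
    in_mu = (mu >= mu_rng[0]) & (mu <= mu_rng[1])
    in_alpha = alpha > alpha_min
    return in_sigma & in_mu & in_alpha
```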

8. SUMMARY AND FUTURE WORK

In this work, we introduced the PTC method, incorporating the Gaussian fit and the 1st- and 2nd-order polynomial fits, which succeeded in retaining the detection ability of point targets in IR imagery sequences while reducing the size of the sequences for transmission and storage. Furthermore, by using the two Gaussian parameters σ and μ and the scaling parameter α, we can classify the pixels into target and background with high PD and low PFA values.

The PTC results were compared to the DCT quantization method. The PTC yielded different movie scores but similar detection probabilities. In a spike-prone background, the PTC performs better than DCT quantization and eliminates such false alarms. It also preserves the shape and height of fast target peaks, whereas DCT quantization widens and lowers the peak. On the other hand, DCT quantization preserves very low target peaks without increasing the false alarm rate, and the reconstructed profile is usually more similar to the input temporal profile before compression for all pixel types, not just monotonic cloud pixels, noise pixels, or target pixels. Overall, the DCT is a more flexible and assumption-free method, and it contains only one parameter for temporal compression. Nonetheless, the PTC retains a point target detectability close to that of the original sequences, and it can locate target pixels by classifying only its three parameters σ, μ, and α, without calculating the pixel scores or running the automatic target track detection algorithm. It may be useful for detection systems that do not require restoring the sequence by decompression.

As for the detection of target tracks, we have found that the Hough transform can be utilized for locating tracks of point targets in short video sequences, since it is an efficient method for tracking lines and shapes in an image. We used it to test our PTC and achieved high detection ratios, similar to those of the original sequences. Finally, our noise estimation addition enhances the target pixels and suppresses background pixels, since it restores some of the original noise level that is lost during the compression process. The minimal values can be calculated using k-means over the STDs of the high-frequency DCT coefficients, which are available to us, so we do not need to predetermine them for each sequence.

We are currently creating new sequences based on the tested original sequences with different target speeds. We plan to use these new sequences to examine the relation between the VERS optimal window lengths (LF and variance windows, as described in Section 3) and the target speed and background characteristics. The performance of the variance estimation ratio score algorithm [1,2] depends on the selected temporal window lengths. The LF window for long-term estimation should be short enough to track major changes in the mean intensity level caused by a clutter entrance or departure, which cause a rise or fall of the mean intensity level, respectively. On the other hand, it should be long enough to perform an accurate estimation, suppressing the noise and not the target, which may happen in cases where the window is shorter than the target's temporal width (intensity increase and decrease). The length of the variance window for short-term estimation should be approximately matched to a target peak's width. A very short window may not include enough samples for calculation and may not suppress spikes, but a very wide window would suppress the influence of the target peak [10,11].

The optimal window set extraction may work well for offline sequences, but it is not applicable to real-time systems. In order to support real-time systems, which assume no prior knowledge of the tested sequence, we plan to test several spatial and temporal properties to evaluate the scene type, which may assist us in determining the optimal window lengths. In particular, the scene type is affected by the amount of clouds and by their intensity and density. The intensity of cloud pixels changes faster than that of noise pixels [8,10,11]. Thus, it is recommended to use a shorter LF window length for a cloudy sky than for a clear sky. As for the variance window length, it should be wide enough to include the target peak, but not too wide, which would suppress the peak's variance. We would like to test

the connection between both background and target properties and both window lengths, linear fit and variance.

Acknowledgment. The authors would like to thank Stanislav Evstigneev and Amit Machanian for their significant contribution to this research.

REFERENCES

1. R. Huber-Shalem, O. Hadar, S. R. Rotman, and M. Huber-Lerner, "Compression of infrared imagery sequences containing a slow moving point target, part II," Appl. Opt. 52, 1646–3813 (2013).
2. R. Huber-Shalem, O. Hadar, S. R. Rotman, and M. Huber-Lerner, "Compression of infrared imagery sequences containing a slow moving point target," Appl. Opt. 49, 3798–3813 (2010).
3. M. Bar-Tal and S. R. Rotman, "Performance measurement in point source target detection," Infrared Phys. Technol. 37, 231–238 (1996).
4. I. E. G. Richardson, Video Codec Design (Wiley, 2004).
5. MPEG compression standard, www.mpeg.org.
6. IR imagery sequences source, formerly located at http://www.sn.afrl.af.mil/pages/SNH/ir_sensor_branch/sequences.html.
7. C. E. Caefer, J. M. Mooney, and J. Silverman, "Point target detection in consecutive frame staring IR imagery with evolving cloud clutter," Proc. SPIE 2561, 14–24 (1995).
8. C. E. Caefer, J. Silverman, J. M. Mooney, S. DiSalvo, and R. W. Taylor, "Temporal filtering for point target detection in staring IR imagery part I: damped sinusoid filters," Proc. SPIE 3373, 111–122 (1998).


9. C. E. Caefer, J. Silverman, and J. M. Mooney, "Optimization of point target tracking filters," IEEE Trans. Aerosp. Electron. Syst. 36, 15–25 (2000).
10. L. Varsano-Rozenberg, "Point target tracking in hyperspectral images," M.Sc. thesis (Ben-Gurion University of the Negev, 2005).
11. L. Varsano, I. Yatskaer, and S. R. Rotman, "Temporal target tracking in hyperspectral images," Opt. Eng. 45, 126201 (2006).
12. O. Nichtern and S. R. Rotman, "Parameter adjustment for a dynamic programming track-before-detect-based target detection algorithm," EURASIP J. Appl. Signal Process. 2008, 146925 (2008).
13. B. Aminov, O. Nichtern, and S. R. Rotman, "Spatial and temporal point tracking in real hyperspectral images," EURASIP J. Appl. Signal Process. 2011, 30 (2011).
14. R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Commun. Assoc. Comput. Mach. 15, 11–15 (1972).
15. J. Illingworth and J. Kittler, "A survey of the Hough transform," Comput. Vision Graphics Image Process. 44(1), 87–116 (1988).
16. C. E. Caefer, M. S. Stefanou, E. D. Nielsen, A. P. Rizzuto, O. Raviv, and S. R. Rotman, "Analysis of false alarm distributions in the development and evaluation of hyperspectral point target detection algorithms," Opt. Eng. 46, 076402 (2007).
17. C. E. Caefer, J. Silverman, O. Orthal, D. Antonelli, Y. Sharoni, and S. R. Rotman, "Improved covariance matrices for point target detection in hyperspectral data," Opt. Eng. 47, 076402 (2008).
18. J. Silverman, C. E. Caefer, and J. M. Mooney, "Temporal filtering for point target detection in staring IR imagery part II: recursive variance filter," Proc. SPIE 3373, 44–53 (1998).
19. D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE 40, 1098–1101 (1952).
