
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 12, DECEMBER 2013

Highly Accurate Moving Object Detection in Variable Bit Rate Video-Based Traffic Monitoring Systems

Shih-Chia Huang and Bo-Hao Chen

Abstract— Automated motion detection, which segments moving objects from video streams, is a key technology of intelligent transportation systems for traffic management. Traffic surveillance systems use video communication over real-world networks with limited bandwidth, which frequently suffer from either network congestion or unstable bandwidth. Evidence of these problems abounds in publications on wireless video communication. Thus, to effectively perform the arduous task of motion detection over a network with unstable bandwidth, a process by which the bit-rate is allocated to match the available network bandwidth is required. This process is accomplished by the rate control scheme. This paper presents a new motion detection approach based on the cerebellar-model-articulation-controller (CMAC) artificial neural network that completely and accurately detects moving objects in both high and low bit-rate video streams. The proposed approach consists of a probabilistic background generation (PBG) module and a moving object detection (MOD) module. To ensure that the properties of variable bit-rate video streams are accommodated, the proposed PBG module effectively produces a probabilistic background model through an unsupervised learning process over variable bit-rate video streams. Next, the MOD module, which is based on the CMAC network, completely and accurately detects moving objects in both low and high bit-rate video streams by implementing two procedures: 1) a block selection procedure and 2) an object detection procedure. The detection results show that our proposed approach performs with higher efficacy than other state-of-the-art approaches on variable bit-rate video streams over real-world limited bandwidth networks. Both qualitative and quantitative evaluations support this claim; for instance, the proposed approach achieves Similarity and F1 accuracy rates that are 76.40% and 84.37% higher than those of existing approaches, respectively.

Index Terms— Artificial neural network, automated motion detection, intelligent transportation systems, variable bit-rate.

I. INTRODUCTION

Manuscript received September 5, 2012; revised April 11, 2013; accepted June 18, 2013. Date of publication July 11, 2013; date of current version November 1, 2013. This work was supported in part by the National Science Council under Grant NSC 100-2628-E-027-012-MY3. (Corresponding author: S.-C. Huang.) The authors are with the Department of Electronic Engineering, National Taipei University of Technology, Taipei 106, Taiwan (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2013.2270314

INTELLIGENT transportation systems (ITS) are widely employed for traffic management. They facilitate a wide range of applications, including traffic flow measurement systems, regional multimodal traveler information systems, coordinated traffic control systems, freeway and road management systems, electronic tagging systems, and so on [1]–[3]. These enhanced systems are primarily intended to alleviate traffic congestion, advance transportation safety, and improve traffic flow in ITS through the use of advanced technologies such as intelligent computing, network communications, visual-based analysis, efficient sensor electronics, and so on [4], [5]. Currently, one factor of ITS that is critical in supporting these tasks is the ability to extract information about moving objects within scenes captured by traffic surveillance systems. Thus, automated motion detection is an important component of ITS [6]–[11]. It is the first essential process in the development of traffic surveillance systems, and is required to gather detailed information about traffic situations. Automated motion detection is crucial in accomplishing tasks such as vehicle classification, vehicle recognition, vehicle tracking, collision avoidance, and so on [1]–[3]. Furthermore, as wireless technology proliferates, the development of automated motion detection through wireless sensor networks is becoming increasingly important to ensure enhancement of measurement capabilities under all traffic situations in traffic surveillance systems. Numerous approaches have been proposed to accomplish automatic motion detection [12]–[22]. According to previous research [12], the major categories of conventional motion detection methods are temporal difference [13], optical flow [14], and background subtraction [15]–[22]. Temporal difference approaches accomplish motion detection by calculating the difference between consecutive frames and attributing that difference to moving objects. The use of temporal difference, however, often results in incomplete detection of the extracted shapes of moving objects, particularly when the objects are motionless or exhibit limited mobility.
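The idea behind temporal differencing can be sketched in a few lines. The threshold value, array layout, and example frames below are illustrative assumptions, not taken from the paper:

```python
def temporal_difference(prev_frame, curr_frame, thresh=25):
    """Label pixels whose intensity changed by more than `thresh`
    between consecutive frames as motion pixels (1); 0 otherwise."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, curr_frame)]

# A static pixel yields 0; a pixel crossed by a moving object yields 1.
prev = [[10, 10], [10, 200]]
curr = [[12, 10], [90, 200]]
mask = temporal_difference(prev, curr)  # [[0, 0], [1, 0]]
```

Note how a stopped vehicle would produce near-zero differences everywhere, which is exactly the incomplete-detection weakness described above.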
Optical flow approaches project motion onto the image plane with good approximation to achieve detection of moving objects. Unfortunately, these approaches inevitably introduce either additional noise or an enormous computational burden. Background subtraction is considered an effective primary treatment for complete and accurate detection of moving objects with only moderate computational complexity. This is accomplished by subtracting the pixel features of the current image from those of the reference background model built from previous images. Because of its relative success and prominence, several background subtraction-based methods for motion detection have been proposed. The first of these methods that we will review is the sigma difference estimation (SDE) method [18].
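The basic background-subtraction operation described above can be sketched as follows; the threshold and example values are illustrative assumptions:

```python
def background_subtraction(background, frame, thresh=30):
    """Mark pixels that deviate from the reference background model by
    more than `thresh` as foreground (moving-object) pixels."""
    return [[1 if abs(f - b) > thresh else 0
             for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

background = [[100, 100, 100]]   # reference background row
frame      = [[102, 180, 100]]   # a vehicle pixel appears at column 1
mask = background_subtraction(background, frame)  # [[0, 1, 0]]
```

Unlike frame differencing, this comparison against a stable reference also flags slow or temporarily stopped objects, which is why the surveyed methods differ mainly in how they build and update the reference model.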

2162-237X © 2013 IEEE

HUANG AND CHEN: HIGHLY ACCURATE MOVING OBJECT DETECTION

[Fig. 1(a): frame 550, bit-rate 1000. Fig. 1(b): frame 918, bit-rate 2000000.]

The SDE method detects moving objects through a Σ-Δ (sigma-delta) filter technique that calculates two orders of temporal statistics for each pixel in a pixel-based decision framework. Unfortunately, this method produces insufficient detection results in certain complex environments because its background model cannot keep pace with the updating period. Accordingly, the hybrid background model of the multiple SDE (MSDE) method was developed to overcome the problem of insufficient detection by generating a different reference image at each frame through use of the multiple Σ-Δ filter technique [19]. The Gaussian mixture models (GMM) method, which models each pixel value independently with a mixture of Gaussian distributions, was proposed in [20]; background pixels within the current frame are then determined from the distribution. The simple statistical difference (SSD) method segments moving objects through a simple background model by applying the temporal average to the video sequence [21]. The multiple temporal difference (MTD) method detects moving objects by maintaining several previous reference frames instead of a background model [22]; by doing so, it reduces holes inside moving entities. Video communication over real-world networks with limited bandwidth often suffers from either unstable bandwidth or network congestion [23], [24]. This is especially true when communication occurs through a wireless network. Therefore, as network congestion and unstable bandwidth frequently occur, a rate control scheme has been developed that uses H.264/AVC [25], [26] as an effective video-coding tool to allocate the available network bandwidth. This technique results in superior transmission in wireless communication systems because of its ability to alter the bit-rate of video streams to match the capacity of the network.
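A simplified single-pixel sketch of the Σ-Δ update used by SDE-style methods is shown below. The unit step size, the dispersion multiplier `n`, and the example values are illustrative assumptions condensed from the usual Σ-Δ formulation; this is not the authors' implementation:

```python
def sigma_delta_step(bg, var, pixel, n=4):
    """One Σ-Δ update for a single pixel: the background estimate moves
    one step toward the observation, and the dispersion estimate moves
    one step toward n times the absolute difference."""
    if bg < pixel:
        bg += 1
    elif bg > pixel:
        bg -= 1
    diff = abs(pixel - bg)
    if var < n * diff:
        var += 1
    elif var > n * diff:
        var -= 1
    motion = 1 if diff > var else 0  # flagged when the pixel outruns the model
    return bg, var, motion

bg, var = 100, 2
for obs in [100, 101, 150, 150]:  # a sudden jump at the third frame
    bg, var, motion = sigma_delta_step(bg, var, obs)
```

The one-step-per-frame update is exactly why such models lag behind abrupt statistical changes, such as those caused by a bit-rate switch.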
Unfortunately, motion detection in variable bit-rate video streams, produced by the rate control scheme in response to networks with limited bandwidth, is a very difficult task for many previous state-of-the-art background subtraction methods [18]–[22]. These methods often cannot produce proper background models in such fluctuating situations. This paper proposes a novel motion detection approach for detecting moving objects in variable bit-rate video streams over real-world networks with limited bandwidth, based on the cerebellar-model-articulation-controller (CMAC) network [27]–[30]. Compared with previous state-of-the-art background subtraction methods, the proposed approach is capable of attaining the most complete and accurate motion detection in both low and high bit-rate video streams. It attains this through conjunctive use of the proposed probabilistic background generation (PBG) module and the proposed moving object detection (MOD) module. The key features of our proposed method are as follows. 1) The first proposed module constructs a probabilistic background model that is capable of adapting to the properties of variable bit-rate video streams. 2) Subsequently, the second module detects moving objects completely and accurately from both high and low bit-rate video streams over real-world limited bandwidth networks through the CMAC network in the second


[Fig. 1(c): plot of the luminance (Y) pixel value (0–200) against the frame index (500–990), with the low-quality and high-quality background signal levels annotated.]

Fig. 1. Example of the variations in intensity value of a pixel from low to high bit-rate in video streams. (a) and (b) 550th and 918th frames of sequence CAM, with a sample pixel marked as a red star, respectively. (c) Intensity variations of the sample pixel in terms of its luminance (Y) component.

module. To accomplish this, each incoming pixel is mapped to the weight memory elements in the weight memory space of the CMAC network. Experimental results produced through quantitative and qualitative evaluations show that our proposed approach attains the most satisfactory detection results in variable bit-rate video streams over real-world networks with limited bandwidth in comparison with those of other state-of-the-art approaches. The rest of this paper is divided into Sections II–IV. In Section II, our proposed motion detection method is introduced. The experimental results of our proposed method are equitably contrasted with those of other methods in Section III. Finally, conclusions are summarized in Section IV.

II. PROPOSED CMAC-BASED MOTION DETECTION APPROACH

In general, the use of previous background subtraction methods [18]–[22] in variable bit-rate video streams cannot provide satisfactory detection results. For example, an illustration of the intensity variations and luminance component of a background pixel from low to high bit-rate over a short period, at a region of the image that contains trees, is shown in Fig. 1(a)–(c). As can be observed in Fig. 1(a) and (c), the background models generated using previous background subtraction methods in low bit-rate video streams correctly associate the signals of the trees with the background during motion detection within a congested network. When sufficient bandwidth exists in wireless video transmission over real-world limited bandwidth networks, the



[Fig. 2(a): frame 55, bit-rate 2000000. Fig. 2(b): frame 209, bit-rate 1000.]

Fig. 3. Overview of the CMAC network architecture.

[Fig. 2(c): plot of the luminance (Y) pixel value (0–250) against the frame index (0–300), with the moving object, high-quality background, and low-quality signal levels annotated.]

Fig. 2. Example of the variations in intensity value of a pixel from high to low bit-rate in video streams. (a) and (b) 55th and 209th frames of sequence CAM, with a sample pixel marked as a red star, respectively. (c) Intensity variations of the sample pixel in terms of its luminance (Y) component.

rate control scheme increases the allocated bit-rate to produce high bit-rate video streams. However, this may prove troublesome, as shown in Fig. 1(b) and (c). When detecting moving objects in high bit-rate video streams, the previous background subtraction methods easily misjudge the high-quality background signals as moving objects. In contrast to Fig. 1, an illustration of the intensity variations and luminance component of a background pixel from high to low bit-rate over a short period, at a region of the image that contains trees, is shown in Fig. 2(a)–(c). As can be observed in Fig. 2(a) and (c), the fluctuating signals of the trees in high bit-rate video streams are correctly regarded as background by the background models generated through previous background subtraction methods when sufficient network bandwidth is present. Unfortunately, wireless video transmission over real-world limited bandwidth networks frequently suffers from either network congestion or unstable bandwidth. When this happens, the rate control scheme is used to produce correspondingly low bit-rate video streams. When the bit-rate is lowered to match the available bandwidth, the signals of moving vehicles can be misinterpreted as background signals by previous background subtraction methods, as shown in Fig. 2(b) and (c).

In this section, we propose a novel motion detection approach that is based on the CMAC network to completely and accurately detect moving objects in variable bit-rate video streams over real-world networks with limited bandwidth. According to [27]–[30], the basic CMAC network shown in Fig. 3 consists of an input space, an association memory space, a weight memory space, and an output space. Each input state S in the input space is mapped onto a set C in the association memory space. In addition, the association memory space associates the relationship between the input space and the weight memory space. The associated weight memory element W in the weight memory space is yielded in accordance with each input state; that is, the CMAC network sums the mapped weight memory elements as its output F in the output space.

In this paper, the proposed CMAC-based motion detection (CMACMD) approach is composed of two major modules: 1) a PBG module and 2) a MOD module. As can be observed in Fig. 4, the proposed PBG module produces the probabilistic background model to effectively accommodate the properties of variable bit-rate video streams. This is accomplished using the probability mass function (pmf) evaluation over the time series of each incoming pixel at every frame, whereupon this information is relayed to the CMAC network as the weight memory elements in the weight memory space for creation of the CMAC network structure. After the structure of the CMAC network is constructed in the proposed PBG module, complete and accurate detection of moving objects is accomplished within variable bit-rate video streams by the proposed MOD module through our proposed block selection procedure and object detection procedure. The MOD module is based on the CMAC network and detects moving objects by summing the mapped weight memory elements of the weight memory space as the output value of the CMAC network.

A. Probabilistic Background Generation

To support a wide range of video applications, the proposed CMACMD is built in the YCbCr color space [31]. In our approach, only the luminance component is used as the input value for motion detection in the input space of the CMAC network. Thus, the luminance component l of a pixel pt(x, y) determines the intensity of each pixel in each incoming video frame It.
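The pmf-based background bookkeeping just outlined amounts to a per-pixel histogram of luminance values normalized into a pmf, which then fills the CMAC weight memory. A sketch, assuming 8-bit luminance (M = 255) and an illustrative observation series:

```python
M = 255  # maximum luminance value of a pixel

def build_pmf(luma_series, M=M):
    """Estimate pmf(pt) = n_pt / T for one pixel position from its
    luminance time series; the resulting M + 1 values serve as the
    weight memory elements W(x, y)0 ... W(x, y)M."""
    counts = [0] * (M + 1)
    for l in luma_series:
        counts[l] += 1
    T = len(luma_series)
    return [c / T for c in counts]

# A background pixel observed over 8 frames, wobbling between two codes.
weights = build_pmf([120, 120, 121, 120, 121, 120, 120, 121])
# weights[120] == 0.625, weights[121] == 0.375, all other entries 0.0
```

Because the histogram accumulates observations from both low and high bit-rate periods, probability mass spreads over every luminance value that the background legitimately takes, which is what lets the model tolerate bit-rate changes.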
The properties of variable bit-rate video streams must be accommodated during implementation of the PBG module. This is achieved by the generation of the probabilistic background model through the pmf evaluation from the time series of each incoming frame It , which constructs the weight



Fig. 4. Flowchart of the proposed CMAC-based motion detection approach. (a) Framework of the proposed CMAC-based motion detection approach, which consists of one input space, one association memory space, one output space, and a weight memory space of M + 1 weights. (b) Proposed PBG module, which generates the probabilistic background model to adapt to the properties of variable bit-rate video streams. (c) Proposed MOD module, which completely and accurately detects moving objects in variable bit-rate video streams.

memory space of the CMAC network. This pmf evaluation can be determined as follows:

pmf(pt) = n_pt / T    (1)

where pt is the pixel at position (x, y), n_pt is the number of occurrences of each luminance value in the time series of incoming frames It at position (x, y), and T is the number of frames in the time series at position (x, y). Subsequently, the probabilistic background model B(x, y) can be produced, whereupon this information is delivered to the weight memory elements W(x, y)0–W(x, y)M as the weight values in the weight memory space of the CMAC network. The representative function is as follows:

B(x, y)n = { pmf(n) | n ∈ N0 ∧ n ≤ M }    (2)

where pmf(n) is the pmf evaluation at position (x, y) and N0 is the set of natural numbers including zero. Usually, the value M is set equal to the maximum intensity value of a pixel. Moreover, this construction of the probabilistic background model can be regarded as an unsupervised learning process in the CMAC network.

B. Moving Object Detection

1) Block Selection Procedure: The structure of the considered CMAC network consists of one input space, one

association memory space, one output space, and a weight memory space of M + 1 weights. The proposed PBG module determines the weight values W(x, y)0, ..., W(x, y)M of the weight memory space in the CMAC network, as shown in Fig. 4. After the structure of the CMAC network is constructed in the proposed PBG module, each incoming pixel pt(x, y) is delivered to the input space of the CMAC network in the YCbCr color space through the proposed MOD module. In the CMAC network [27]–[30], each weight memory element is associated through a physical memory address in the association memory space after the incoming pixel pt(x, y) is added to the input space. Therefore, information about the state of the incoming pixel is associated with k weight memory elements through the physical memory address CY(pt) in the association memory space. This can be expressed as follows:

CY(pt) = [CY(pt)0, ..., CY(pt)j+r, ..., CY(pt)M].    (3)

When j = pt, the physical memory addresses are labeled with 1 in the range |r| ≤ ⌊k/2⌋, which shows that the corresponding weight memory elements are associated through these physical memory addresses. The other physical memory addresses are labeled with 0, meaning that they are not activated to associate with the corresponding weight memory elements. As can be observed in Fig. 5, CY(1) is the physical memory address at position (x, y) and associates with three weight memory elements, which shows that j = 1



Therefore, the output space of the CMAC network is used to compute the binary motion detection mask as the detection result. The output value of the CMAC network in the output space can be expressed as follows:

F = Σ_(j=0..M) CY(pt)j Wj    (6)

where CY is the physical memory address in the association memory space and W is the weight value in the weight memory space. The binary motion detection mask Y can be attained as follows:

Y(x, y) = { 1, if F(x, y) < ε
            0, otherwise    (7)
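For a single pixel, the output computation in (6) and the thresholding in (7) can be sketched as follows. The threshold value (ε = 0.2), the activation width k = 3, and the learned weights are illustrative assumptions:

```python
def cmac_output(weights, pixel, k=3):
    """F = sum over j of CY(pt)j * Wj, where the association addresses
    are 1 only within floor(k/2) of the incoming luminance value."""
    half = k // 2
    lo, hi = max(0, pixel - half), min(len(weights) - 1, pixel + half)
    return sum(weights[lo:hi + 1])

def motion_mask(weights, pixel, eps=0.2, k=3):
    """Label the pixel 1 (motion) when F falls below the threshold,
    i.e., the observed value is improbable as background."""
    return 1 if cmac_output(weights, pixel, k) < eps else 0

weights = [0.0] * 256
weights[120], weights[121] = 0.625, 0.375  # pmf learned by the PBG module
bg_label = motion_mask(weights, 120)   # 0: likely background (F = 1.0)
fg_label = motion_mask(weights, 200)   # 1: improbable value, motion pixel
```

Summing k neighboring weights, rather than reading a single histogram bin, is what gives the network its tolerance to the small luminance wobble introduced by compression at different bit rates.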

Fig. 5. Example of the division of the physical memory address in the CMAC network. The set CY(1) of the association memory space is outlined by a solid red line, showing that the input state Y1 is mapped onto the set CY(1) in the association memory space; the values G, H, and I of the weight memory elements W0 to W2 mapped by the set CY(1) are then summed as the output value of the CMAC network.
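The addressing example of Fig. 5 can be reproduced directly. This is a sketch of the labeling rule in (3); the boundary handling at the ends of the address range is an assumption:

```python
def association_address(j, k, M):
    """Build the binary address vector CY(pt) of length M + 1, with 1s
    at positions within floor(k/2) of the input state j and 0s elsewhere."""
    half = k // 2
    return [1 if abs(idx - j) <= half else 0 for idx in range(M + 1)]

# For M + 1 = 8 memory cells, input state j = 1 and k = 3 activate the
# first three addresses, as in Fig. 5.
address = association_address(j=1, k=3, M=7)  # [1, 1, 1, 0, 0, 0, 0, 0]
```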

and k = 3. For purposes here, the physical memory address can be expressed as [1 1 1 0 0 0 0 0]. Thus, the incoming pixel pt(x, y) is associated with the weight memory elements W(x, y)0 to W(x, y)2. Moreover, to effectively select the blocks that are regarded as having a high probability of containing pixels belonging to moving objects, each incoming frame is divided into blocks of size N × N, whereupon we compare the statistical values among the blocks. The sum of the stored contents in the addressed weight memory elements within each block is calculated as follows:

δ = Σ_(pt ∈ μ) Σ_(j=0..M) CY(pt)j Wj    (4)

where pt represents each independent pixel of the corresponding block μ, M is the maximum pixel intensity value, and the block size N can be set to 16 empirically. After calculating the sum for each block, block selection can be measured as follows:

A = { 1, if δ < θ
      0, otherwise    (5)

where θ is an empirical threshold value. The block A is labeled with 1 when the calculated sum is smaller than the threshold θ, which indicates that the block contains a greater profusion of pixels belonging to moving objects. Otherwise, the block A is labeled with 0, which means that it contains background pixels.

2) Object Detection Procedure: Upon the elimination of unnecessary blocks through the proposed block selection procedure, complete and accurate motion detection can be accomplished efficiently using only those blocks that are determined to contain motion pixels. This is achieved through the proposed object detection procedure and can be performed in both low and high bit-rate video streams.
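The block selection procedure in (4) and (5) can be sketched as follows. The block contents, the learned weights, and the threshold value (theta = 2.0) are illustrative assumptions, and the blocks are shrunk to four pixels for readability:

```python
def block_sum(weights, block_pixels, k=3):
    """delta: sum, over every pixel in the block, of the weight memory
    elements activated by that pixel's luminance value, as in (4)."""
    half = k // 2
    total = 0.0
    for p in block_pixels:
        lo, hi = max(0, p - half), min(len(weights) - 1, p + half)
        total += sum(weights[lo:hi + 1])
    return total

def select_block(weights, block_pixels, theta=2.0, k=3):
    """A = 1 (block may contain motion) when delta < theta, as in (5)."""
    return 1 if block_sum(weights, block_pixels, k) < theta else 0

weights = [0.0] * 256
weights[120] = 1.0                       # background luminance learned by PBG
background_block = [120, 120, 120, 120]  # delta = 4.0 -> A = 0
vehicle_block    = [120, 60, 61, 60]     # delta = 1.0 -> A = 1
```

Blocks labeled 0 are skipped entirely, so the per-pixel object detection procedure only runs on the small fraction of blocks likely to contain motion.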

where ε is a threshold value that can be set empirically. When the output value of the CMAC network exceeds the threshold ε, the binary motion detection mask is labeled as 0 to represent a background pixel; otherwise, it is labeled as 1 to represent a motion pixel.

III. EXPERIMENTAL RESULTS

This section summarizes our experimental results for motion detection. Six methods, namely the SDE method [18], the MSDE method [19], the GMM method [20], the SSD method [21], the MTD method [22], and our proposed CMACMD method, are compared equitably through quantitative and qualitative evaluations. This comprehensive comparison is conducted by synthesizing variable bit-rate video sequences using the H.264/AVC [25], [26] joint model reference software supplied by the Joint Video Team committee (ISO/IEC MPEG and ITU-T VCEG) [32]. Table I provides a description of the five video sequences, HW, CP, CAM, RD, and ST, that are used for testing each method in this paper. These video sequences were captured by a traffic surveillance camera and feature a variety of bit rates that can be divided into three ranks: low, middle, and high bit-rate. Moreover, the numbers of frames adopted in each video sequence are shown in Table II. The first video sequence, HW, presents a scene in which vehicles travel along a highway. The second video sequence, CP, captures an area in which a few vehicles and pedestrians pass through a campus parking lot. In the third video sequence, CAM, many vehicles and pedestrians pass along a road in front of a forest. The fourth video sequence, RD, consists of a scene in which many vehicles operate independently on a roadway. The fifth video sequence, ST, features many buses and a few pedestrians moving in traffic. Finally, to demonstrate the computational feasibility for real-time applications, we measure the processing speed of the proposed CMACMD method and the other state-of-the-art methods.

All the parameters in each method are set to their optimum values. According to [18, Sec. II], the predefined parameter N of the SDE method can be set to four. Section IV of [19] showed that the number of reference images K of the MSDE method can be set to three; moreover, the predefined parameters α1, α2, and α3 are set to 1, 8, and 16, respectively. For maintenance of the background model by the GMM method, the number of Gaussian components K is fixed at three, with the learning rate ρ equal to 0.005 in [20, eqs. (1) and (3)]. According to [21, Sec. II], the predefined parameter λ is experimentally set to three, which yields much better results when implementing the SSD method for motion detection. In regard to the MTD method, the number of previous reference frames can be set to seven, as stated in [22, Sec. II]; moreover, the binary mask of moving objects is calculated by a fixed threshold value, which can be set to 250 for classifying a current pixel value. Specific values of the parameters of the CMACMD method are shown in Table III.

TABLE I
COMPARISON OF THE VARYING BIT RATES OF EACH VIDEO SEQUENCE

TABLE II
NUMBERS OF FRAMES IN EACH VIDEO SEQUENCE

TABLE III
SPECIFIC PARAMETER VALUES OF CMACMD

A. Quantitative Evaluation

We quantitatively compared the detection accuracy of our CMACMD method with that of previous state-of-the-art methods in different bit-rate video sequences using four metrics: Recall, Precision, F1, and Similarity [8]–[11]. The Recall metric measures the percentage of detected true positives relative to the total number of true positives in the ground truth. The Precision metric supplies the percentage of detected true positives relative to the total number of positive pixels within the detected binary motion mask. When used alone, however, Recall and Precision yield limited information about the accuracy of detection results, because they selectively measure only missed true positive pixels and spurious positive pixels, respectively. Thus, using only Recall and Precision cannot achieve an objective comparison between the results produced by each method. Accordingly, two additional accuracy metrics, F1 and Similarity, are employed to ensure a meaningful measurement of accuracy.

TABLE IV
AVERAGE ACCURACY VALUES OF THE COMPARED METHODS ATTAINED BY Similarity, F1, Recall, AND Precision IN A LOW BIT-RATE VIDEO SEQUENCE

Tables IV–VI offer a comparison of the average accuracy rates produced through Similarity, F1, Recall, and Precision for the SDE, MSDE, GMM, SSD, MTD, and CMACMD methods. Values attained by the aforementioned metrics range between 0 and 1, with higher values indicating superior detection accuracy. Table IV lists the accuracy rates produced by the various methods in low bit-rate video sequences. The resulting Similarity and F1 rates generated through use of the CMACMD method are higher than those produced through the other methods in all instances. Specifically, they are up to 51.85% and 48.33% higher than those produced through the



TABLE V
AVERAGE ACCURACY VALUES OF THE COMPARED METHODS ATTAINED BY Similarity, F1, Recall, AND Precision IN A MIDDLE BIT-RATE VIDEO SEQUENCE

TABLE VI
AVERAGE ACCURACY VALUES OF THE COMPARED METHODS ATTAINED BY Similarity, F1, Recall, AND Precision IN A HIGH BIT-RATE VIDEO SEQUENCE

SDE method, respectively; they are up to 44.74% and 40.62% higher than those produced through the MSDE method, respectively; they are up to 57.98% and 54.45% higher than those produced through the GMM method, respectively; they are up to 62.80% and 61.23% higher than those produced through the SSD method, respectively; and they are up to 54.32% and 49.18% higher than those produced through the MTD method, respectively. Table V lists the accuracy rates produced for the various methods in middle bit-rate video sequences. As with the low bit-rate results in Table IV, the resulting Si milari t y and F1 rates generated by the use of the CMACMD method in middle bit-rate video streams are higher than those produced through the other methods in all instances. Specifically, they are up to 57.62% and 57.34% higher than those produced through the SDE method, respectively; they are up to 41.07% and 36.79% higher than those produced through the MSDE method, respectively; they are up to 59.87% and 58.57% higher than those produced through the GMM method, respectively; they are up to 63.01% and 64.04% higher than those produced through the SSD method, respectively; and they are up to 42.79% and 36.62% higher than those produced through the MTD method, respectively. Table VI lists the accuracy rates produced for the various methods in high bit-rate video sequences. As with the respective low and middle bit-rate results in Tables IV and V, the resulting Si milari t y and F1 rates generated by the use of the CMACMD method in high bit-rate video streams are higher than those produced through the other methods in all instances. Specifically, they are up to 55.07% and 50.14% higher than those produced through the SDE method, respectively; they are

up to 53.98% and 48.66% higher than those produced through the MSDE method, respectively; they are up to 69.60% and 70.82% higher than those produced through the GMM method, respectively; they are up to 62.90% and 61.66% higher than those produced through the SSD method, respectively; and they are up to 43.24% and 40.67% higher than those produced through the MTD method, respectively. Therefore, the results of the comparison of average accuracy rates show that the motion detection performance of our CMACMD method is significantly superior to that of previous state-of-the-art methods in video streams of all tested bit rates.

B. Qualitative Evaluation

In addition to the quantitative evaluation presented above, the results produced by the different methods in video streams of varying bit rates are also evaluated qualitatively through visual inspection. Through this subjective examination, the detected binary object masks produced by each of the previous methods are assessed and compared with those produced through use of the CMACMD method. In real-world networks with limited bandwidth, the use of previous state-of-the-art motion detection methods often proves problematic, resulting in incomplete or inaccurate detection of moving objects. Not only do our quantitative analyses show that our CMACMD method offers significant improvement in motion detection performance, but our qualitative evaluation also shows that its detection results are visually superior to those of the other compared methods in both low and high bit-rate video streams.
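The four accuracy metrics reported in Tables IV–VI can be computed from a detected binary mask and its ground truth using their standard pixel-wise definitions (Similarity is the Jaccard index); the example masks below are illustrative:

```python
def evaluate(detected, truth):
    """Pixel-wise Recall, Precision, F1, and Similarity (Jaccard index)
    between a detected binary mask and its ground-truth mask."""
    tp = sum(1 for d, t in zip(detected, truth) if d and t)
    fp = sum(1 for d, t in zip(detected, truth) if d and not t)
    fn = sum(1 for d, t in zip(detected, truth) if t and not d)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    similarity = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return recall, precision, f1, similarity

truth    = [1, 1, 1, 1, 0, 0, 0, 0]
detected = [1, 1, 1, 0, 1, 0, 0, 0]   # one missed pixel, one false alarm
r, p, f1, sim = evaluate(detected, truth)
# r = p = f1 = 0.75, sim = 0.6
```

Note how F1 and Similarity penalize both the miss and the false alarm at once, which is why they are preferred over Recall or Precision alone.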


Fig. 6. Detection results of sequence HW for video streams that range from low to high bit-rate.

Three representative frames of the HW video sequence, their ground truths, and the binary masks of moving objects detected through use of each of the SDE, MSDE, GMM, SSD, MTD, and CMACMD methods are shown in Fig. 6. This sequence serves to illustrate the differences between the results of each method when the available bandwidth increases and the rate control scheme subsequently increases the bit-rate to match the available network bandwidth. Frame 405 exhibits a low bit-rate environment. The background models of the SDE, MSDE,


GMM, SSD, and MTD methods generated in this low bit-rate video stream all correctly regard the stable background signals as background. Frames 405–1202 represent a situation in which the available bandwidth increases, thereby stimulating the rate control scheme to increase the bit-rate of the video stream from low to high. The detection results reflected in frame 445, however, show that the SDE, MSDE, and GMM methods produce insufficient results while generating serious noise. This is because the information in the background models generated with the Σ-Δ filter techniques of the SDE and MSDE methods and the mixture-of-Gaussians technique of the GMM method in a low bit-rate environment is insufficient for moving object detection in a high bit-rate environment, resulting in serious noise when these methods process high bit-rate video streams. Compared with the SDE, MSDE, and GMM methods, the SSD and MTD methods can produce passable detection results, as shown in frame 445. Nevertheless, the detection results of the SSD and MTD methods show that either noise or ghost trails can be generated as the bit-rate changes. With regard to the SSD method, this is due to the use of the temporal average as the main criterion. The MTD method maintains several reference frames from low bit-rate environments and subsequently generates inaccurate background models for detecting moving objects in high bit-rate video streams. When the bit-rate of the video stream increases continuously, these issues become more serious, as can be seen in frame 1202. The SDE, MSDE, GMM, SSD, and MTD methods all misjudge the high-quality fluctuating background signals as moving objects, thus generating significant artifacts in their detection results. In contrast, the CMACMD method produces detection results without generating additional noise.
The reason for this is that the corresponding weight memory elements in the proposed weight memory space are effectively associated through the physical memory address to offer complete information for the detection of moving objects in both low and high bit-rate video streams. Three representative frames of the CP video sequence, their ground truths, and the binary masks of moving objects detected through use of the SDE, MSDE, GMM, SSD, MTD, and CMACMD methods are shown in Fig. 7. This sequence serves to illustrate the differences between the results of each method when the available bandwidth decreases and the rate control scheme subsequently reduces the bit-rate to match the available network bandwidth. Frame 164 illustrates a high bit-rate situation. In this case, the background models of the SDE, MSDE, GMM, SSD, and MTD methods all adopt the fluctuant signals as background. The SDE and MSDE methods, however, cannot cope with the rich information of high bit-rate video streams using the background models generated by the Σ-Δ filter. Thus, we can readily observe that both the SDE and MSDE methods create serious noise in high bit-rate video streams, as illustrated in frame 164. In frames 164–394, the available network bandwidth decreases because of network congestion. The rate control scheme compensates by subsequently reducing the bit-rate of the video stream, resulting in a low bit-rate environment. The detection results reflected in frames 164–384, however, show that the SDE, MSDE, GMM, SSD, and MTD


Fig. 7. Detection results of sequence CP for video streams that range from high to low bit-rate.

methods all experience difficulty when detecting moving objects. They readily misjudge the signals of moving vehicles as background, especially in the case of the GMM method. The GMM method collects sufficient fluctuant signals in high bit-rate video streams to establish background models through the mixture of Gaussians. Subsequently, the weakened signals of moving objects in low bit-rate video streams can easily be neglected in the inaccurate background models of the

Fig. 8. Detection results of sequence CAM for variable bit-rate video streams.

GMM method. Moreover, the SSD and MTD methods model deficient background images in low bit-rate video streams. With regard to the SSD method, this is due to an inability to maintain a complete background model using the temporal average as the main criterion. For the MTD method, the information present in low bit-rate environments is insufficient for retaining several reference frames. As the bit-rate decreases continuously in frame 394, the SDE, MSDE, GMM, SSD, and MTD methods all almost completely fail to generate


Fig. 9. Detection results of sequence RD for variable bit-rate video streams.

Fig. 10. Detection results of sequence ST for variable bit-rate video streams.

any detection results. In contrast, the CMACMD method takes advantage of successfully associating the corresponding weight memory elements in the proposed weight memory space to counter the signal variation, thereby achieving detection results that are superior to those of the other compared methods in both high and low bit-rate video streams. Three representative frames of the CAM, RD, and ST video sequences, their ground truths, and the binary masks of moving objects detected through use of the SDE, MSDE, GMM, SSD,

MTD, and CMACMD methods are shown in Figs. 8–10. These sequences serve to illustrate the differences between the results of each method when the available bandwidth fluctuates and the rate control scheme subsequently changes the bit-rate to match the available network bandwidth. As the video stream frequently changes bit-rate throughout each sequence in Figs. 8–10, it is difficult for the SDE, MSDE, GMM, SSD, and MTD methods to produce applicable background models for detecting moving objects. The incomplete detection


TABLE VII PROCESSING SPEED (IN FPS) OF THE COMPARED METHODS

results of the SSD, MTD, and GMM methods are presented in frame 709 of Fig. 8; the incomplete detection results of the SDE, MSDE, GMM, SSD, and MTD methods are presented in frame 240 of Fig. 9 and in frames 68 and 282 of Fig. 10. The detection results of the SDE, MSDE, and GMM methods all feature serious noise in frames 1025–1389 of Fig. 8, frames 57 and 332 of Fig. 9, and frame 156 of Fig. 10. The detection results of the SSD and MTD methods feature serious noise in frame 1025 of Fig. 8, frame 57 of Fig. 9, and frame 156 of Fig. 10. The results produced through use of the CMACMD method clearly demonstrate that it offers the most complete and accurate detection of moving objects in both low and high bit-rate video streams when compared with the other state-of-the-art approaches. This is due to the ability of the proposed method to effectively accommodate bit-rate variation, as evidenced by the test frames of the CAM, RD, and ST sequences.

C. Performance Results

To verify computational feasibility for real-time applications, we report the processing speeds of the compared methods for several CIF test video sequences in Table VII. These methods are implemented in the C programming language on an Intel Core i5 3.10-GHz processor with 8 GB of RAM, running the Windows 7 operating system. The performance results show that the CMACMD method achieves speeds higher than 43 fps for each test sequence, which is sufficient for real-time applications.

IV. CONCLUSION

This paper proposed a novel approach based on the CMAC network for motion detection in realistic traffic scenes over variable bit-rate video streams in real-world networks with limited bandwidth. Two unique modules made up the structure of the proposed method: 1) a PBG module and 2) a MOD module. The proposed PBG module produced the probabilistic background model in variable bit-rate video streams.
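As an illustration of this probabilistic background idea, a per-pixel pmf evaluation can be sketched as accumulating an empirical histogram over each pixel's intensity history and selecting its mode. This is a simplified sketch under stated assumptions, not the exact PBG implementation; the function name, the frame-stack interface, and the bin count are hypothetical.

```python
import numpy as np

def pmf_background(history, num_bins=256):
    """Estimate a background image as the mode of each pixel's empirical pmf.

    history: (T, H, W) uint8 stack of past frames.
    Returns (background, pmf), where pmf has shape (H, W, num_bins).
    """
    T, H, W = history.shape
    pmf = np.zeros((H, W, num_bins), dtype=np.float64)
    flat = pmf.reshape(-1, num_bins)  # view: one histogram row per pixel
    # Accumulate a per-pixel histogram over the time series ...
    for t in range(T):
        np.add.at(flat, (np.arange(H * W), history[t].reshape(-1)), 1.0)
    pmf /= T  # ... and normalize it into a probability mass function.
    # The most probable intensity serves as the background value.
    background = pmf.argmax(axis=2).astype(np.uint8)
    return background, pmf
```

Keeping the full pmf, rather than a single averaged value, is what allows a background model of this kind to retain evidence gathered at several different bit-rates at once.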
This was accomplished by our CMACMD approach through a probability mass function (pmf) evaluation of the time series of each incoming pixel at every frame, which facilitates accommodation of the properties of variable bit-rate video streams. In addition, the structure of the CMAC network was constructed in this module. After construction of the CMAC network was achieved, the MOD module used a block selection procedure to determine which

blocks had greater profusions of pixels belonging to moving objects, after which an object detection procedure completely and accurately detected moving objects within the selected blocks. This technique lowered the time complexity of the procedure and was effectively accomplished in both low and high bit-rate video streams. The results of simulation experiments on variable bit-rate video streams showed that the proposed CMACMD approach achieved the most satisfactory detection results. Qualitative and quantitative evaluations of the results produced by the compared methods showed that the proposed approach performed with a higher degree of visual precision, while its accuracy rates surpassed those of the other methods, in both low and high bit-rate video streams. To the best of our knowledge, we were the first study group to successfully present an approach for motion detection in traffic surveillance systems over variable bit-rate video streams in real-world networks with limited bandwidth.

Shih-Chia Huang received the Doctorate degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2009. He is currently an Associate Professor with the Department of Electronic Engineering, National Taipei University of Technology, Taipei. He has published more than 25 journal and conference papers and holds more than 30 patents in the U.S. and Taiwan. His current research interests include image and video coding, wireless video transmission, video surveillance, error resilience and concealment techniques, digital signal processing, cloud computing, mobile applications and systems, embedded processor design, and embedded software and hardware codesign. Dr. Huang received the Kwoh-Ting Li Young Researcher Award in 2011 from the Taipei Chapter of the Association for Computing Machinery.

Bo-Hao Chen received the B.S. degree from the Department of Electronic Engineering, National Taipei University of Technology, Taipei, Taiwan, in 2011. He is currently pursuing the Ph.D. degree with the Graduate Institute of Computer and Communication Engineering, National Taipei University of Technology. His current research interests include digital image processing and video coding, in particular moving object detection, contrast enhancement, depth generation, and haze removal.
