
Robust Behavior Recognition in Intelligent Surveillance Environments

Ganbayar Batchuluun, Yeong Gon Kim, Jong Hyun Kim, Hyung Gil Hong and Kang Ryoung Park *

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Korea; [email protected] (G.B.); [email protected] (Y.G.K.); [email protected] (J.H.K.); [email protected] (H.G.H.)
* Correspondence: [email protected]; Tel.: +82-10-3111-7022; Fax: +82-2-2277-8735

Academic Editor: Vittorio M. N. Passaro
Received: 16 May 2016; Accepted: 25 June 2016; Published: 30 June 2016

Sensors 2016, 16, 1010; doi:10.3390/s16071010

Abstract: Intelligent surveillance systems have been studied by many researchers. These systems should operate both during the daytime and at night, but objects are invisible in images captured by a visible light camera during the night. Therefore, near infrared (NIR) cameras and thermal cameras (based on medium-wavelength infrared (MWIR) and long-wavelength infrared (LWIR) light) have been considered as alternatives for nighttime use. Because the system must operate during both daytime and nighttime, and because an NIR camera requires an additional NIR illuminator that must cover a wide area over a great distance at night, a dual system of visible light and thermal cameras is used in our research, and we propose a new behavior recognition method for intelligent surveillance environments. Twelve datasets were compiled by collecting data in various environments, and they were used to obtain experimental results. The recognition accuracy of our method was found to be 97.6%, thereby confirming that our method outperforms previous methods.

Keywords: intelligent surveillance system; visible light camera; thermal camera; behavior recognition

1. Introduction

Accurate recognition of behavior is an important and challenging research topic in intelligent surveillance systems. Although such a system should also operate during the nighttime, objects cannot be visualized in images acquired by a conventional visible light camera. Therefore, a near infrared (NIR) camera or a thermal camera (based on medium-wavelength infrared (MWIR) or long-wavelength infrared (LWIR) light) has been considered as an alternative for nighttime use. An NIR camera usually requires an additional NIR illuminator, which must illuminate a wide area over a great distance at night, whereas a thermal camera usually does not need an additional illuminator. In addition, an intelligent surveillance system should operate during both the daytime and nighttime. Considering all these factors, a dual system of visible light and thermal cameras is used in our research, and we propose a new behavior recognition method for intelligent surveillance environments.

A thermal camera image clearly reveals the human body at night and in winter by measuring the human body temperature, which is detectable in the MWIR range of 3–8 µm and the LWIR range of 8–15 µm [1–12]. Based on these characteristics, Ghiass et al. [12] also introduced various methods of infrared face recognition. However, the distinction between the human and the background in the thermal image diminishes when the background temperature is similar to that of the human, as can happen in some situations during the daytime, which can reduce the consequent accuracy of behavior recognition.


Therefore, the use of both thermal and visible light images enables us to enhance the accuracy of behavior recognition in intelligent surveillance systems. There have been previous studies in which the fusion of visible light and thermal cameras was used for various applications [13–16]. Davis et al. [13] proposed a fusion method for human detection from visible light and thermal cameras based on background modeling by statistical methods. Arandjelović et al. [14] proposed a face recognition method that combines the similarity scores from visible light and thermal camera-based recognition. Leykin et al. [15] proposed a method for tracking pedestrians based on the combined information from visible light and thermal cameras. In addition, Kong et al. [16] proposed a face recognition method robust to illumination variation based on the multiscale fusion of visible light and thermal camera images. Although there have been previous studies combining visible light and thermal cameras in various applications, we focus on behavior recognition in this research.

Previous research on behavior recognition can be categorized into two groups: single camera-based and multiple camera-based approaches. Rahman et al. [17–20] proposed single camera-based approaches that used negative space analysis based on the background region inside the detected box of the human body. However, these approaches deliver limited performance in cases in which there is a large difference between the detected box and the box defined based on the actual human area. Fating and Ghotkar [21] used the histogram of the chain code descriptor to analyze the boundary of the detected area. Although the histogram represents efficient features, it is computationally expensive to obtain the chain codes from all the shapes of the detected area. A Fourier descriptor was used to extract scale- and rotation-invariant features for the detected foreground area [22–25]. This method has the advantage of correctly representing the foreground shape, but it has difficulty discriminating behaviors with a similar foreground shape, such as walking and kicking. Sun et al. [26] proposed a method using a local descriptor based on the scale invariant feature transform (SIFT) and holistic features based on Zernike moments for human action recognition. However, their method has the disadvantage of requiring much processing time to extract the features using both SIFT and Zernike moments. Schüldt et al. and Laptev et al. [27–29] performed behavior recognition based on local spatiotemporal features, although a clear background is required to guarantee highly accurate results. In addition, corner detection-based feature extraction can cause false detections of corners, which can degrade the accuracy of behavior recognition. Wang et al. and Arandjelović [30,31] proposed the concept of motionlets, a motion saliency method, which achieves high performance in terms of human motion recognition provided the foreground is clearly segmented from the background. Optical flow-based methods were used to represent motion clearly using a feature histogram. However, the process of obtaining optical flow is computationally expensive, and it is also sensitive to noise and illumination changes [27,28,32]. Various groups [32–37] have performed behavior recognition based on a gait flow image (GFI), gait energy image (GEI), gait history image (GHI), accumulated motion image (AMI), and motion history image (MHI). These studies focused on obtaining a rough foreground region in which motion occurs, but the attempt to determine the exact path of motion of the hands and legs was less satisfactory. Wong et al. [38] proposed a method of faint detection based on conducting a statistical search of the human's height and width using images obtained by a thermal camera. Youssef [39] proposed a method based on convexity defect feature points for human action recognition. This method finds the tip points of the hand, leg, and head. However, incorrect points can be detected as a result of an increase in the number of points for large-sized humans surrounded by noise. In addition, this method cannot discriminate the tip point of a hand from that of a leg in cases in which the leg is positioned at a height similar to that of the hand when a kicking motion happens. Moreover, it is computationally expensive because of the need to calculate the contour, polygon, convex hull, and convexity defect in every frame. All this research based on images acquired with a single visible light camera has limited scope for performance enhancement when a suitable foreground is difficult to obtain from the image in situations containing extensive shadows, illumination variations, and darkness.

In addition, previous research based on thermal cameras was performed in constrained environments, such as indoors, or without considering situations in which the background temperature is similar to that of the foreground in hot weather. Therefore, multiple camera-based approaches were subsequently considered as an alternative. Zhang et al. [40] proposed a method of ethnicity classification based on gait using the fusion of GEI information with multi-linear principal component analysis (MPCA) obtained from multiple visible light cameras. Rusu et al. [41] proposed a method for human action recognition using five visible light cameras in combination with a thermal camera, whereas Kim et al. [42] used a visible light camera with a thermal camera. However, all these studies were conducted in indoor environments without considering the temperature and light variations associated with outdoor environments. We aim to overcome these problems experienced by previous researchers by proposing a new robust system for recognizing behavior in outdoor environments with various temperature and illumination variations. Compared to previous studies, our research is novel in the following three ways:





- Contrary to previous research where the camera is installed at the height of a human, we study behavior recognition in environments in which surveillance cameras are typically used, i.e., where the camera is installed at a height much higher than that of a human. In this case, behavior recognition usually becomes more difficult because the camera looks down on humans.
- We overcome the disadvantages of previous solutions for behavior recognition based on GEI and the region-based method by proposing the projection-based distance (PbD) method based on a binarized human blob.
- Based on the PbD method, we detect the tip positions of the hand and leg of the human blob. Then, various types of behavior are recognized based on the tracking information of these tip positions and our decision rules.

The remainder of our paper is structured as follows. In Section 2, we describe the proposed system and the method used for behavior recognition. The experimental results with analyses are presented in Section 3. Finally, the paper is concluded in Section 4.

2. Proposed System and Method for Behavior Recognition

2.1. System Setup and Human Detection

In this section, we present a brief introduction to our human detection system, because behavior recognition is carried out based on the detection results. Figure 1 shows examples of our system setup and human detection method. Detailed explanations of the system and our human detection method can be found in our previous paper [43]. We implemented a dual camera system (including an FLIR thermal camera capturing images of 640 × 480 pixels [44] and a visible light camera capturing images of 800 × 600 pixels), where the two axes of the visible light and thermal cameras are oriented parallel to the horizontal direction, as shown in the upper left image of Figure 1. Therefore, the visible light and thermal images are captured simultaneously with minimum disparity. We prevented rain from entering the thermal and visible light cameras by attaching a glass cover (germanium glass, which is transparent to MWIR and LWIR light [45], for the thermal camera, and conventional transparent glass for the visible light camera) to the front of each camera, as shown in the upper left image of Figure 1. Our dual camera system is set at a height of 5–10 m above the ground, as shown in Figure 1. In addition, we tested it at five different heights (see details in Section 3.1) to measure the performance of our system in various environments.

Figure 1. Examples of system setup with brief explanations of our human detection method.


The flow diagram on the right-hand side of Figure 1 summarizes our human detection algorithm. Through background subtraction after preprocessing to remove noise, two human regions are detected from the visible light and thermal images. Then, the two regions are combined based on a geometric transform matrix, which is obtained by the calibration procedure of both cameras in advance. Finally, the correct human area is detected as shown on the right of Figure 1 after post-processing for noise reduction, morphological operation, and size filtering [43,46]. Our detection method can be applied to images including multiple humans. The use of the dual camera system enables our method to robustly detect human areas in images of various environments, such as those with severe shadow, illumination variations, darkness, and cases in which the background temperature is similar to the human area in hot weather.

Most previous research on behavior recognition was performed with images captured by a camera at low height. That is, the camera was set at a height of about 2 m above the ground and could capture frontal-viewing human body parts, as shown in Figure 2a. This arrangement allowed the motions of human hands and feet to be observed more distinctively. However, our camera is set at a height of 5–10 m, which is the normal height of the camera of a surveillance system. This causes the human area in the image to shrink in the vertical direction, as shown in Figure 2b. In addition, our camera arrangement can also reduce the motions of human hands and feet in the vertical direction, change the ratio of height to width of the human box, and reduce the vertical image resolution of the human box, all of which can complicate the correct recognition of behavior. However, our research needs to consider that our system is intended for use in a conventional surveillance system; therefore, the camera setup shown in Figure 2b is used in our research, and we extract the tip points of the hand and leg to track them for accurate behavior recognition rather than using the entire region covering the detected human body.
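As a concrete illustration of how a detection from one camera can be combined with the other, the sketch below maps a box detected in the thermal image into visible-light image coordinates using a planar homography estimated from calibration point pairs. This is only a minimal sketch under the assumption that the geometric transform matrix mentioned above can be modeled as a homography; the actual calibration procedure is described in [43], and the function and variable names here are illustrative, not the authors' code.

```cpp
// Minimal sketch (assumption: the geometric transform is modeled as a homography).
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Estimated once, offline, from corresponding calibration points in both images.
cv::Mat estimateThermalToVisibleHomography(const std::vector<cv::Point2f>& thermalPts,
                                           const std::vector<cv::Point2f>& visiblePts) {
  return cv::findHomography(thermalPts, visiblePts, cv::RANSAC);
}

// Map the four corners of a thermal-image detection box into the visible image
// and return the axis-aligned bounding box of the mapped corners.
cv::Rect mapThermalBoxToVisible(const cv::Rect& box, const cv::Mat& H) {
  std::vector<cv::Point2f> corners = {
      cv::Point2f((float)box.x, (float)box.y),
      cv::Point2f((float)(box.x + box.width), (float)box.y),
      cv::Point2f((float)(box.x + box.width), (float)(box.y + box.height)),
      cv::Point2f((float)box.x, (float)(box.y + box.height))};
  std::vector<cv::Point2f> mapped;
  cv::perspectiveTransform(corners, mapped, H);
  float xMin = mapped[0].x, xMax = mapped[0].x, yMin = mapped[0].y, yMax = mapped[0].y;
  for (const auto& p : mapped) {
    xMin = std::min(xMin, p.x); xMax = std::max(xMax, p.x);
    yMin = std::min(yMin, p.y); yMax = std::max(yMax, p.y);
  }
  return cv::Rect(cv::Point((int)xMin, (int)yMin), cv::Point((int)xMax, (int)yMax));
}
```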


Figure 2. Comparison of the different camera setups used in (a) previous research; (b) our research.

2.2. Proposed Method of Behavior Recognition


2.2.1. Overall Procedure of Behavior Recognition

The method proposed for behavior recognition is presented in Figure 3. Detailed explanations of Rules 1–4 are provided in Section 2.2.2. We adopted an IF-THEN rule-based method to recognize behavior. As shown in Figure 3, behaviors are roughly divided into two types according to the horizontal activeness measured by the change in the width of the detected human box. Our research aims to recognize the following 11 types of behavior:

- Waving with two hands
- Waving with one hand
- Punching
- Kicking
- Lying down
- Walking (horizontally, vertically, or diagonally)
- Running (horizontally, vertically, or diagonally)
- Standing
- Sitting
- Leaving
- Approaching

Figure 3. Flowchart of the proposed method of behavior recognition.



Among the 11 types of behavior, leaving and approaching are recognized simply based on the change in the distance between the two center positions of the detected boxes of two people. If the distance is reduced, our system determines the behavior as approaching, whereas if it increases, our method determines it as leaving.

Behavior such as hand waving, punching, and kicking has very complicated motion patterns. For example, waving can either be full waving or half waving. We assume the directions of 3, 6, 9, and 12 o'clock to be 0°, 270°, 180°, and 90°, respectively.

- Full waving, where the arm starts from about 270° and moves up to 90° in a circle, as shown in Figure 4a,c.
- Half waving, where the arm starts from about 0° and moves up to 90° in a circle, as shown in Figure 4b,d.
- Low punching, in which the hand moves directly toward the lower positions of the target's chest, as shown in Figure 4e.
- Middle punching, where the hand moves directly toward the target's chest, as shown in Figure 4f.
- High punching, where the hand moves directly toward the upper positions of the target's chest, as shown in Figure 4g.

Similarly, in low, middle, and high kicking, the leg moves to different heights, as shown in Figure 4h–j, respectively. In addition, according to martial arts, kicking and punching can be shown in many different ways.


Figure 4. Comparison of motion patterns of each type of behavior. Waving: (a) full, by using two hands; (b) half, by using two hands; (c) full, by using one hand; (d) half, by using one hand. Punching: (e) low; (f) middle; (g) high. Kicking: (h) low; (i) middle; (j) high.

In our research, the 11 types of behavior are categorized into three classes, with each class representing behavior with a different intention and meaning. This requires us to extract different features for each class.

Class 1: Walking, running, sitting, and standing are regular types of behavior with less detailed gestures of the hands and feet than the behavior in Class 2. Walking and running involve active motion, whereas sitting and standing are motionless. In addition, the rough shape of the human body when standing is different from that when sitting. Therefore, the motion and rough shape of the human body are measured using the speed and the ratio of height to width of the detected human box, respectively. That is, the speed and ratio are key features of the spatial and temporal information of the detected box that enable recognition of the regular types of behavior in Class 1.

Class 2: Kicking, punching, lying down, waving with two hands, and waving with one hand are types of behavior consisting of special gestures of the hands or feet, motion, and shape, except for lying down. In addition, they are characterized by very active motions of the hands or feet. Some of these behaviors produce similar shapes in an image, such as waving with one hand and punching. Therefore, we need to be able to track feature points efficiently, which we achieve by using the proposed PbD method to find the tip points of the legs and hands as feature points. We track the path of these tip points to recognize the behavior in this class. However, lying down can be recognized in the same way as the behavior in Class 1, because it is motionless and the ratio of height to width of the human box is different from that of the other behaviors.

Class 3: Approaching and leaving are interactional types of behavior, both of which are recognized by measuring the change in the distance between the boxes of two persons.

2.2.2. Recognition of Behavior in Classes 1 and 3


In this section, we explain the proposed method in terms of the recognition of behavior in Classes 1 and 3. Behavior in Classes 1 and 3 is recognized by analyzing and comparing the detected human box in the current frame with the nine boxes from the previous nine frames, as shown in Figure 5. As shown in Equation (1), the sum of the change in the width of the detected box is calculated from 10 frames (the current and previous nine frames):

$$W_{\nu} = \sum_{i=0}^{N-1} \left| W_B\big(t-(i+1)\big) - W_B(t-i) \right| \qquad (1)$$

where N is 9 and W_B(t) is the width of the detected human box at the t-th frame, as shown in Figure 5. Based on W_v compared to the threshold T_width, the first classification for behavior recognition is conducted as shown in Figure 3.
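As a minimal illustration of Equation (1) and the first branch of Figure 3, the sketch below accumulates the frame-to-frame width changes over the 10-frame window. The container layout and function names are our own assumptions, not code from the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Equation (1): horizontal activeness W_v accumulated over the current frame and
// the previous nine frames; "widths" holds W_B(t-9) ... W_B(t), oldest first.
double horizontalActiveness(const std::vector<double>& widths) {
  double wv = 0.0;
  for (std::size_t i = 1; i < widths.size(); ++i) {
    wv += std::fabs(widths[i] - widths[i - 1]);  // |W_B(t-(i+1)) - W_B(t-i)|
  }
  return wv;
}

// First branch of Figure 3: horizontally active behaviors vs. the rest.
bool isHorizontallyActive(const std::vector<double>& widths, double Twidth) {
  return horizontalActiveness(widths) > Twidth;
}
```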

Figure 5. Example showing the parameters used to analyze the detected human box by comparing the box in the current frame with those in the previous nine frames.

If W_v is less than the threshold T_width, the sum of the change in the center of the detected box in the vertical direction is calculated from 10 frames (the current and previous nine frames). Based on this sum compared to the threshold T_p_y, walking and running in the vertical or diagonal direction are distinguished from standing, sitting, and lying down, as shown in the left-hand part of Figure 3.

The distance between the center positions (G(x, y, t) in Figure 5) of two detected boxes is measured by comparing two successive frames among the 10 frames (the current and previous nine frames). Then, the consequent movement velocity of the center position can be calculated as the movement speed of the detected box, because the time difference between two successive frames can be obtained by our system. Based on the average speed (considering the movement direction) calculated from the 10 frames (the current and previous nine frames), walking and running are recognized. If the speed exceeds the specified threshold, our system determines the behavior as running; otherwise, the behavior is determined as walking, as shown in Figure 3.

In general, the distance between two center positions in the image in the horizontal direction is different from that in the vertical or diagonal directions, even for the same distance in 3D space. That is because the camera setup is tilted, as shown in Figure 2b. Therefore, a different threshold (T_s1_y) is used for recognizing walking and running in the vertical or diagonal directions compared to the thresholds (T_s1_x and T_s2_x) for the horizontal direction, as shown in Figure 3.
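The sketch below illustrates the speed computation described above, assuming the box centers G(x, y, t) of the 10-frame window and the inter-frame interval are available. The per-direction threshold selection follows the idea of Figure 3, but the simplification to a single threshold argument, as well as the names, is ours.

```cpp
// Minimal sketch of the speed-based walking/running decision (names are illustrative).
#include <cmath>
#include <cstddef>
#include <vector>

struct Center { double x, y; };  // box center G(x, y, t)

// Average movement speed of the box center over the 10-frame window
// (centers holds G(t-9) ... G(t); dt is the time between successive frames).
double averageSpeed(const std::vector<Center>& centers, double dt) {
  double sum = 0.0;
  for (std::size_t i = 1; i < centers.size(); ++i) {
    const double dx = centers[i].x - centers[i - 1].x;
    const double dy = centers[i].y - centers[i - 1].y;
    sum += std::sqrt(dx * dx + dy * dy) / dt;
  }
  return sum / static_cast<double>(centers.size() - 1);
}

// Because the camera looks down at a tilted angle, a different threshold is used
// for vertical/diagonal movement (T_s1_y) than for horizontal movement (T_s1_x,
// T_s2_x in Figure 3); here a single per-direction threshold is passed in.
bool isRunning(double avgSpeed, double directionThreshold) {
  return avgSpeed > directionThreshold;
}
```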


The ratio of the height to width of the detected box is calculated by Equation (2), and based on the average R from 10 frames (the current and previous nine frames), standing, sitting, or lying down is recognized. As shown in Figure 3, if the average R is larger than the threshold T_r2, our system determines the behavior as standing. If the average R is less than threshold T_r2 but larger than threshold T_r3, the behavior is recognized as sitting. If the average R is less than both threshold T_r2 and threshold T_r3, our system determines the behavior as lying down.

$$R = \frac{H_B(t)}{W_B(t)} \qquad (2)$$

where W_B(t) and H_B(t) represent the width and height, respectively, of the detected human box at the t-th frame, as shown in Figure 5.

Approaching and leaving behavior is recognized by measuring the change in the distance between the center positions of the boxes of two persons, as shown in Equation (3):

$$D_{\nu} = \sum_{i=0}^{N-1} \big( b(t-(i+1)) - b(t-i) \big) \qquad (3)$$

where N is 9 and b(t) is the Euclidean distance between the two center positions of the boxes of the two persons at the t-th frame. If D_v is larger than the threshold, the behavior is determined as approaching, whereas the behavior is determined as leaving if D_v is smaller than the threshold.

The optimal thresholds in the flowchart of Figure 3 were experimentally determined to minimize the error rate of behavior recognition. To ensure fairness, the images used for the determination of the thresholds were excluded from the experiments that were conducted to measure the performance of our system.
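A minimal sketch of the Class 1 and Class 3 decisions built on Equations (2) and (3) follows. The threshold names mirror Figure 3, but their values and all function names are illustrative assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Equation (2): height-to-width ratio of the detected box at one frame.
double ratioR(double heightB, double widthB) { return heightB / widthB; }

// Standing / sitting / lying down from the average R over the 10-frame window.
std::string classifyPose(double averageR, double Tr2, double Tr3) {
  if (averageR > Tr2) return "standing";
  if (averageR > Tr3) return "sitting";   // Tr3 < averageR <= Tr2
  return "lying down";                    // averageR below both thresholds
}

// Equation (3): accumulated change of the distance b(t) between the two box
// centers (b holds b(t-9) ... b(t)); a positive D_v means the distance shrinks.
double distanceChangeDv(const std::vector<double>& b) {
  double dv = 0.0;
  for (std::size_t i = 1; i < b.size(); ++i) dv += b[i - 1] - b[i];
  return dv;
}

std::string classifyInteraction(double dv, double threshold) {
  return (dv > threshold) ? "approaching" : "leaving";
}
```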

2.2.3. Recognition of Behavior in Class 2


We developed a new method, named the PbD method, to extract the features of behavior in Class 2. First, X_G is calculated as the geometric center position of the binarized image of the human area, as shown in Figure 6a. Then, the distances d_r (or d_l) between the X position of X_G and the right-most (or left-most) white pixel of the human area are calculated at each Y position of the detected human box. The processing time was reduced by calculating the distances at every third Y position. These distances enable us to obtain the two profile graphs (D_left and D_right), including the leg, body, arm, and head of the human area, as shown in Figure 6b,c. Because the profile graph is obtained based on the projection of the distance in the horizontal direction, we named this method the PbD method.

$$D_{left} = \{d_{l1}, d_{l2}, \ldots, d_{ln}\}, \quad D_{right} = \{d_{r1}, d_{r2}, \ldots, d_{rn}\} \qquad (4)$$

where n is the number of distances.


Figure 6. Example of profile graphs produced by the proposed PbD method. (a) Binarized image of the human area in the detected box; (b) profile graph representing the right part of the human area; (c) two profile graphs representing the right and left parts of the human area, respectively.
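A minimal sketch of Equation (4) is given below: for rows sampled every three pixels, the horizontal distances from the x coordinate of X_G to the left-most and right-most foreground pixels are collected into D_left and D_right. The mask representation and function names are our own; the paper works directly on the binarized detection result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PbDProfiles {
  std::vector<double> left;   // D_left  = {d_l1, d_l2, ..., d_ln}
  std::vector<double> right;  // D_right = {d_r1, d_r2, ..., d_rn}
};

// mask[y][x] != 0 marks a foreground (human) pixel inside the detected box.
// Rows are sampled every "step" pixels (the paper uses every three pixels).
PbDProfiles projectionBasedDistances(const std::vector<std::vector<std::uint8_t>>& mask,
                                     int xG, int step = 3) {
  PbDProfiles p;
  for (std::size_t y = 0; y < mask.size(); y += step) {
    int left = -1, right = -1;
    for (std::size_t x = 0; x < mask[y].size(); ++x) {
      if (mask[y][x]) {
        if (left < 0) left = static_cast<int>(x);
        right = static_cast<int>(x);
      }
    }
    if (left < 0) continue;                               // no foreground in this row
    p.left.push_back(static_cast<double>(xG - left));     // d_l at this row
    p.right.push_back(static_cast<double>(right - xG));   // d_r at this row
  }
  return p;
}
```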


The profile graph is used to determine the position whose distance (Y-axis value) is maximized as the tip position (M in Figure 7) of the hand or leg. The tip point is determined to belong to either the hand or the leg by obtaining the L1 and L2 lines from the first and last points of the profile graph with point M, as shown in Figure 7.

Figure 7. Example illustrating the extraction of features from the profile graph obtained by the proposed PbD method.

Then, the two points on the graph whose distances from the L1 and L2 lines are maximized are detected as the starting and ending points of the human arm, respectively, as shown by h1 and h2 in Figure 7. In case the distance from the L2 line is smaller than the threshold, the ending point of the profile graph is determined as h2. In addition, if the distance from the L1 line is smaller than the threshold, the starting point of the profile graph is determined as h1.

Based on the starting point h1 and the point M, the two distances λ1 and λ2 can be calculated. Because the human arm is shorter than the distance between the starting position of the arm (h1 in Figure 7) and the foot (the starting position of the profile graph in Figure 7), λ1 inevitably becomes smaller than λ2 when point M is the tip position of the hand. On the other hand, because the human leg is longer than the distance between the starting position of the leg and the foot, λ1 inevitably becomes larger than λ2 when point M is the tip position of the leg (foot), as shown in Figure 8b. Based on these observations, we can discriminate between the position of the hand tip and that of the leg tip based on the ratio of λ1 and λ2, as shown in Equation (5):

$$\begin{cases} M \text{ belongs to the hand tip}, & \text{if } \lambda_2 / \lambda_1 > 1 \\ M \text{ belongs to the leg tip}, & \text{otherwise} \end{cases} \qquad (5)$$

Figure 8. Examples of the two features λ1 and λ2 for kicking, and the comparison of these two features for kicking and punching. (a) Case of kicking; (b) profile graph of (a) by the proposed PbD method; (c) comparison of λ1 and λ2 for kicking and punching.
As shown in Figure 9, hand waving behavior can happen by unfolding (Figure 9a) or bending (Figure 9b)in theFigure user’s arms. In the latter case (Figure 9b),happen point Mby canunfolding be the tip position theorelbow As shown 9, hand waving behavior can (Figureof 9a) bending instead of that of the hand. We proposed the method expressed by Equations (6) and (7) (Figure 9b) the user’s arms. In the latter case (Figure 9b), point M can be the tip position of the to elbow

Sensors 2016, 16, 1010

10 of 23

instead of that of the hand. We proposed the method expressed by Equations (6) and (7) to discriminate the former case (point M is the tip position of the hand) from the latter (point M is the tip position of the elbow). Sensors 2016, 16, 1010 10 of 23 As shown in Figure 9, the two angles θh1 and θh2 are calculated based on three points of the starting point of the theformer arm, point M, and point profile Then, the discriminate case (point M isthe theending tip position of of thethe hand) fromgraph. the latter (point M difference is the tip position of the elbow). between these two angles is calculated as β. In general, β is smaller when hand waving occurs by shown in hand Figurewaving 9, the two angles and of the areuser’s calculated based on three points the unfoldingAs than when occurs by bending arm as shown in Figure 9. of Therefore, starting point of the arm, point M, and the ending point of the profile graph. Then, the difference based on β, the former case (point M is the tip position of the hand) is discriminated from the latter between these two angles is calculated as . In general, is smaller when hand waving occurs by case (point M is the tip position of the elbow) as follows: unfolding than when hand waving occurs by bending of the user’s arm as shown in Figure 9. # case (point M is the tip position of the hand) is discriminated from Therefore, based on , the former M belongs to hand tip, i f β ă Th the latter case (point M is the tip position of the elbow) as follows: θ

M belongs to elbow tip, else

¨ ˚ π ´ arcsin ˝ c´ β “ absp

= abs(

belongs to hand tip, < ℎ ˛ tip, ¨ belongs to elbow

YM ´Yh2



¯2 ‚´ ¯2 ´ X M ´Xh2 ` YM ´Yh2

˛(6)

¯ ´ YM ´Yh ˚c 1 arcsin ˝ ´ ¯2 ¯2 ´ ` YM ´Yh1 X M ´ Xh 1

‹ ‚

)

/

π{180

where (XM, YM) is the coordinate of point M in Figure 7, and (

(6)

,

(7)

q

) is the position of h1. In

(7)

where (XM , YM ) is the coordinate of point M in Figure 7, and (Xh1 , Yh1 ) is the position of h1 . In addition, addition, ( , ) is that of h2 in Figure 7. abs is the function for obtaining absolute value (Xh2 , (non-negative Yh2 ) is that ofvalue). h2 in Figure the function forpoints obtaining absolute value (non-negative value). Based 7. onabs theisextracted feature of Figures 6–9, the types of behavior in BasedClass on the extracted feature Figuresrules 6–9,ofthe types of behavior in Class 2 are recognized 2 are recognized basedpoints on theofdecision Table 1. Rules 1–4 of Figure 3 are shown in basedTable on the 1. decision rules of Table 1. Rules 1–4 of Figure 3 are shown in Table 1.
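A minimal sketch of Equations (6) and (7) follows; it computes the two angles from the coordinates of h1, h2, and M exactly as written above and compares their difference (in degrees) against the threshold Th_θ. The struct and function names are illustrative.

```cpp
#include <cmath>

struct Pt { double x, y; };

// Equation (7): angle difference beta, converted to degrees by dividing by pi/180.
double angleDifferenceBeta(const Pt& M, const Pt& h1, const Pt& h2) {
  const double pi = 3.14159265358979323846;
  const double a2 = std::asin((M.y - h2.y) /
      std::sqrt((M.x - h2.x) * (M.x - h2.x) + (M.y - h2.y) * (M.y - h2.y)));
  const double a1 = std::asin((M.y - h1.y) /
      std::sqrt((M.x - h1.x) * (M.x - h1.x) + (M.y - h1.y) * (M.y - h1.y)));
  return std::fabs((pi - a2) - a1) / (pi / 180.0);
}

// Equation (6): point M is taken as the hand tip when beta is below the threshold,
// and as the elbow tip otherwise.
bool isHandTip(double beta, double thresholdThTheta) {
  return beta < thresholdThTheta;
}
```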

Figure 9. Examples of two types of hand waving by (a) unfolding; and (b) bending of the user's arm.

Table 1. Decision rules for recognizing the types of behavior in Class 2.

Rule 1 (Lying down):
    If ("Condition 1 * is TRUE") : Lying down
    Else : Go to Rule 2

Rule 2 (Kicking):
    If ("M is leg tip" and "Condition 2 * is TRUE") : Kicking
    Else : Go to Rule 3

Rule 3 (Hand waving):
    If ("M is hand tip" and "Condition 3 * is TRUE") : Hand waving
    Else : Go to Rule 4

Rule 4 (Punching):
    If ("M is hand tip" and ("Condition 4 * or Condition 5 * is TRUE")) : Punching
    Else : Go to the rule for checking running, walking, or the undetermined case, as shown in the right lower part of Figure 3

* Condition 1: The average R of Equation (2) is less than the two thresholds (T_r2 and T_r3), as shown in Figure 3.
* Condition 2: Both the X and Y positions of point M are higher than the threshold.
* Condition 3: The position of M changes in both the X and Y directions in the recent N frames, as shown in Figure 10a.
* Condition 4: The position of M increases in both the X and Y directions in the recent N frames, as shown in Figure 10b.
* Condition 5: The position of M increases only in the Y direction in the recent N frames, as shown in Figure 10c.
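To make the control flow of Table 1 explicit, the sketch below chains Rules 1–4 with simple boolean condition inputs (Conditions 1–5 are evaluated elsewhere, e.g., from the average R and from the recent trajectory of point M as in Figure 10). The struct, the labels, and the fall-through string are our own illustrative choices, not the authors' code.

```cpp
// Illustrative encoding of the IF-THEN chain in Table 1 (names are ours).
#include <string>

struct Class2Conditions {
  bool cond1;      // average R below both T_r2 and T_r3 (Figure 3)
  bool cond2;      // X and Y positions of M above the threshold
  bool cond3;      // M moves in both X and Y over the recent N frames (Figure 10a)
  bool cond4;      // M increases in both X and Y over the recent N frames (Figure 10b)
  bool cond5;      // M increases only in Y over the recent N frames (Figure 10c)
  bool mIsLegTip;  // result of the Equation (5) test
  bool mIsHandTip;
};

std::string recognizeClass2(const Class2Conditions& c) {
  if (c.cond1) return "lying down";                             // Rule 1
  if (c.mIsLegTip && c.cond2) return "kicking";                 // Rule 2
  if (c.mIsHandTip && c.cond3) return "hand waving";            // Rule 3
  if (c.mIsHandTip && (c.cond4 || c.cond5)) return "punching";  // Rule 4
  return "check running/walking/undetermined (Figure 3)";       // fall through
}
```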

Figure 10. Examples of changes in the position of point M in the profile graph for hand waving and punching. (a) Hand waving (the figure on the right shows one profile graph among the two graphs representing the waving of both the left and right hands); (b) punching (case 1); (c) punching (case 2).

The proposed PbD method enables us to rapidly find the correct positions of the tips of the hand and leg without using a complicated algorithm that performs boundary tracking of the binarized human area.


3. Experimental Results

3.1. Description of Database

Open databases exist for behavior recognition with visible light images [27,47,48] or with thermal images [49]. However, there is no open database (for behavior recognition in outdoor environments) with images collected by both visible light and thermal cameras at the same time. Therefore, we used a database that was built by acquiring images with our dual camera system. Data acquisition for the experiments was performed using a laptop computer and the dual cameras shown in Figure 1. All the images were acquired using the visible light and thermal cameras simultaneously. The laptop computer was equipped with a 2.50 GHz CPU (Intel(R) Core(TM) i5-2520M, Intel Corp., Santa Clara, CA, USA) and 4 GB RAM. The proposed algorithm was implemented as a C++ program using the Microsoft foundation class (MFC, Microsoft Corp., Redmond, WA, USA) and the OpenCV library (version 2.3.1, Intel Corp., Santa Clara, CA, USA).

We built large datasets that include the 11 different types of behavior (explained in Section 2.2.1), as shown in Table 2. The datasets were collected in six different places, during the day and at night, in environments with the various temperatures of four seasons, and with different camera setup positions. The database includes both thermal and visible light images. The total number of images in the database is 194,094. The size of the human area varies from 28 to 117 pixels in width and from 50 to 265 pixels in height.

Table 2. Description of the 12 datasets.

Dataset I (see Figure 11a); condition: 1.2 °C, morning, humidity 73.0%, wind 1.6 m/s
- The intensity of the background is influenced by the window of a building
- The human shadow is reflected on the window, which is detected as another object in the thermal image

Dataset II (see Figure 11b); condition: -1.0 °C, evening, humidity 73.0%, wind 1.5 m/s
- The intensity of the background is influenced by the window of a building
- The object is not seen in the visible light image
- The human shadow is reflected on the window, which is detected as another object in the thermal image

Dataset III (see Figure 11c); condition: 1.0 °C, afternoon, cloudy, humidity 50.6%, wind 1.7 m/s
- The intensity of the background is influenced by leaves and trees

Dataset IV (see Figure 11d); condition: -2.0 °C, dark night, humidity 50.6%, wind 1.8 m/s
- The intensity of the background is influenced by leaves and trees
- The object is not seen in the visible light image

Dataset V (see Figure 11e); condition: 14.0 °C, afternoon, sunny, humidity 43.4%, wind 3.1 m/s
- The difference between the background and the human diminishes because of the high temperature of the background

Dataset VI (see Figure 11f); condition: 5.0 °C, dark night, humidity 43.4%, wind 3.1 m/s
- The air heating system of a building increases the temperature of part of the building in the background
- The object is not seen in the visible light image

Dataset VII (see Figure 11g); condition: -6.0 °C, afternoon, cloudy, humidity 39.6%, wind 1.9 m/s
- A halo effect appears near the human area in the thermal image, which makes it difficult to detect the correct human area

Dataset VIII (see Figure 11h); condition: -10.0 °C, dark night, humidity 39.6%, wind 1.7 m/s
- A halo effect appears near the human area in the thermal image, which makes it difficult to detect the correct human area
- The object is not seen in the visible light image

Dataset IX (see Figure 11i); condition: 21.9 °C, afternoon, cloudy, humidity 62.6%, wind 1.3 m/s
- A halo effect appears near the human area in the thermal image, which makes it difficult to detect the correct human area
- The difference between the background and the human diminishes due to the high temperature of the background

Dataset X (see Figure 11j); condition: -10.9 °C, dark night, humidity 48.3%, wind 2.0 m/s
- The dataset was collected at night during winter; therefore, the background in the thermal image is too dark because of the low temperature
- The object is not seen in the visible light image

Dataset XI (see Figure 11k); condition: 27.0 °C, afternoon, sunny, humidity 60.0%, wind 1.0 m/s
- The human is darker than the road because the temperature of the road is much higher than that of a human in summer
- The leg is not clear when kicking behavior happens because the woman in the image wore a long skirt

Dataset XII (see Figure 11l); condition: 20.2 °C, dark night, humidity 58.6%, wind 1.2 m/s
- The human is darker than the road because the temperature of the road is much higher than that of a human in summer
- The object is not seen in the visible light image

Figure 11. Examples from each of the 12 datasets. (a) Dataset I; (b) dataset II; (c) dataset III; (d) dataset IV; (e) dataset V; (f) dataset VI; (g) dataset VII; (h) dataset VIII; (i) dataset IX; (j) dataset X; (k) dataset XI; (l) dataset XII.

As shown in Figure 12 and Table 3, we collected the 12 datasets with camera setups of various heights, horizontal distances, and Z distances. We then classified the datasets according to the kind of behavior, and the numbers of frames and types of behavior are shown in Table 4.

Figure 12. Example of camera setup.

Table 3. Camera setups used to collect the 12 datasets (unit: meters).

Datasets                Height    Horizontal Distance    Z Distance
Datasets I and II       8         10                     12.8
Datasets III and IV     7.7       11                     13.4
Datasets V and VI       5         15                     15.8
Datasets VII and VIII   10        15                     18
Datasets IX and X       10        15                     18
Datasets XI and XII     6         11                     12.5

Table 4. Numbers of frames and types of behavior in each dataset.

Behavior                 #Frame (Day)   #Frame (Night)   #Behavior (Day)   #Behavior (Night)
Walking                  1504           2378             763               1245
Running                  608            2196             269               355
Standing                 604            812              584               792
Sitting                  418            488              378               468
Approaching              1072           1032             356               354
Leaving                  508            558              163               188
Waving with two hands    29,588         14,090           1752              870
Waving with one hand     24,426         15,428           1209              885
Punching                 21,704         13,438           1739              1078
Lying down               7728           5488             2621              2022
Kicking                  27,652         22,374           2942              3018
Total                    194,094 (frames)                24,051 (behaviors)

3.2. Accuracies of Behavior Recognition

We used the datasets in Table 4 to evaluate the accuracy of behavior recognition by our method. The accuracies were measured based on the following equations [50]:

Positive predictive value (PPV) = #TP / (#TP + #FP)   (8)

True positive rate (TPR) = #TP / (#TP + #FN)   (9)

Accuracy (ACC) = (#TP + #TN) / (#TP + #TN + #FP + #FN)   (10)

F_score = 2 · PPV · TPR / (PPV + TPR)   (11)

where #TP is the number of true positives (TPs), and TP represents the case in which the behavior included in the input image is correctly recognized. #TN is the number of true negatives (TNs), and TN represents the case in which the behavior not included in the input image is correctly unrecognized. In addition, #FP is the number of false positives (FPs), and FP represents the case in which the behavior not included in the input image is incorrectly recognized. #FN is the number of false negatives (FNs), and FN represents the case in which the behavior included in the input image is incorrectly unrecognized. For all the measures of PPV, TPR, ACC, and F_score, the maximum and minimum values are 100% and 0%, respectively, where 100% and 0% are the highest and lowest accuracies, respectively.
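As a minimal sketch (not the authors' code), the four measures in Equations (8)–(11) can be computed directly from the raw counts; the counts in the usage line are arbitrary and purely illustrative:

```python
def recognition_metrics(tp, tn, fp, fn):
    """Return (PPV, TPR, ACC, F_score) in percent from raw counts."""
    ppv = 100.0 * tp / (tp + fp)                   # Equation (8), positive predictive value
    tpr = 100.0 * tp / (tp + fn)                   # Equation (9), true positive rate
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # Equation (10), accuracy
    f_score = 2.0 * ppv * tpr / (ppv + tpr)        # Equation (11), F-score
    return ppv, tpr, acc, f_score

# Illustrative counts only:
print([round(v, 1) for v in recognition_metrics(tp=90, tn=95, fp=5, fn=10)])
```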


As shown in Table 5, the accuracies of behavior recognition by our method are higher than 95% in most cases, irrespective of day or night and of the kind of behavior. In Figure 13, we show examples of correct behavior recognition with our datasets collected in various environments. The results of behavior recognition can provide various kinds of information about an emergency situation. For example, in Figure 13l, one human (object 2) is leaving the other human (object 1), who is lying on the ground, in which case an emergency situation would be suspected.

In the next experiment, we measured the processing time of our method. The procedure of human detection shown in Figure 1 took an average processing time of about 27.3 ms/frame (about 27.7 ms/frame for daytime images and about 26.9 ms/frame for nighttime images). With the detected human area, the average processing time of behavior recognition of Figure 3 is shown in Table 6. Based on these results, we can confirm that our system can be operated at a speed of about 33.7 frames/s (1000/29.7), including the procedures of human detection and behavior recognition. Considering only the procedure of behavior recognition, it can be operated at a speed of 416.7 (1000/2.4) frames/s.

Table 5. Accuracies of behavior recognition by our method (unit: %).

                          Day                               Night
Behavior                  TPR    PPV    ACC    F_Score      TPR    PPV    ACC    F_Score
Walking                   92.6   100    92.7   96.2         98.5   100    98.5   99.2
Running                   96.6   100    96.7   98.3         94.6   100    94.7   97.2
Standing                  100    100    100    100          97.3   100    97.3   98.6
Sitting                   92.5   100    92.5   96.1         96.5   100    96.5   98.2
Approaching               100    100    100    100          100    100    100    100
Leaving                   100    100    100    100          100    100    100    100
Waving with two hands     95.9   98.8   99.4   97.3         97.0   99.5   99.6   98.2
Waving with one hand      93.6   99.4   99.3   96.4         90.0   100    99.0   94.7
Punching                  87.8   99.5   98.0   93.3         77.4   99.6   96.4   87.1
Lying down                99.4   99.0   98.9   99.2         97.3   98.0   96.3   97.6
Kicking                   90.6   95.8   97.2   93.1         88.5   90.3   94.6   89.4
Average                   95.4   99.3   97.7   97.3         94.3   98.9   97.5   96.4




Figure 13. Examples of correct behavior recognition. In (a–m), the images on the left and right are obtained by the thermal and visible light cameras, respectively. (a) Waving with two hands; (b) waving with one hand; (c) punching; (d) kicking; (e) lying down; (f) sitting; (g) walking; (h) standing; (i) running; (j) and (m) approaching (nighttime); (k) and (l) leaving (nighttime).


Table 6. Processing time of our method for each behavior dataset (unit: ms/frame).

Behavior                  Day    Night
Walking                   2.3    2.6
Running                   1.5    1.5
Standing                  3.2    3.1
Sitting                   1.9    2.0
Approaching               3.3    2.9
Leaving                   2.9    2.9
Waving with two hands     3.2    3.1
Waving with one hand      2.6    2.1
Punching                  2.7    2.3
Lying down                1.2    1.0
Kicking                   1.9    2.0
Average                   2.4
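For reference, the frame rates quoted in Section 3.2 follow directly from the average per-frame times of human detection (about 27.3 ms, Figure 1) and behavior recognition (about 2.4 ms, Table 6):

```python
# Quick arithmetic check of the quoted frame rates.
detection_ms, recognition_ms = 27.3, 2.4
print(round(1000.0 / (detection_ms + recognition_ms), 1))  # ~33.7 frames/s, detection + recognition
print(round(1000.0 / recognition_ms, 1))                   # ~416.7 frames/s, recognition only
```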

3.3. Comparison between the Accuracies Obtained by Our Method and Those Obtained by Previous Methods

In the next experiment, we performed comparisons between the accuracies of behavior recognition by our method and those by previous methods. These comparisons tested the following methods: the Fourier descriptor-based method by Tahir et al. [22], the GEI-based method by Chunli et al. [33], and the convexity defect-based method by Youssef [39]. As shown in Tables 5 and 7, our method outperforms the previous methods for datasets of all types of behavior.

Table 7. Accuracies of other methods (unit: %) *.

                          Fourier Descriptor-Based          GEI-Based                         Convexity Defect-Based
Behavior                  TPR    PPV    ACC    F_Score      TPR    PPV    ACC    F_Score      TPR    PPV    ACC    F_Score
Walking                   0      0      1.6    -            83.9   97.7   82.4   90.3         17.4   98.4   18.4   29.6
Running                   13.0   97.2   16.0   22.9         0      0      3.5    -            23.4   98.4   27.6   37.8
Standing                  85.9   100    85.9   92.4         n/a    n/a    n/a    n/a          n/a    n/a    n/a    n/a
Sitting                   59.7   100    59.7   74.8         n/a    n/a    n/a    n/a          n/a    n/a    n/a    n/a
Waving with two hands     81.4   13.7   47.8   23.5         96.5   19.1   59.6   31.9         89.0   68.6   94.9   77.4
Waving with one hand      78.2   13.8   50.3   23.5         21.2   10.1   73.8   13.7         34.9   73.4   92.4   47.3
Punching                  27.1   20.3   74.6   23.2         62.2   16.5   50.1   26.1         39.7   55.7   86.9   46.4
Lying down                12.9   72.0   38.6   21.9         n/a    n/a    n/a    n/a          n/a    n/a    n/a    n/a
Kicking                   61.1   42.0   78.7   49.8         24.7   80.7   86.0   37.8         29.3   47.6   82.2   36.3
Average                   46.6   51.0   50.4   48.7         55.5   46.3   65.1   50.5         47.7   77.4   71.8   59.0

* n/a represents “not available” (the method was unable to produce a result for behavior recognition).

In Table 7, because all the previous methods are only able to recognize individual types of behavior (rather than interaction-based behavior), interaction-based behaviors such as leaving and approaching were not compared. The reason for the difference between the ACC and F_score in some cases (for example, punching, by the Fourier descriptor-based method) is that #TN is large in these cases, which increases the ACC, as shown in Equation (10). Since the GEI-based method is based on human motion, motionless behaviors, such as standing, sitting, and lying down, cannot be recognized. The convexity defect-based method is based on a defect point, which cannot be obtained when the distance between the two hands and the two feet is small. In the case of standing, sitting, and lying down, this distance becomes small; hence, the defect points are not produced. Therefore, these types of behavior cannot be recognized, and the recognition results are indicated as "n/a". In the Fourier descriptor-based method, walking and lying down are recognized with very low accuracy. This is because walking, running, and kicking behaviors have very similar Fourier descriptor patterns, which means that walking is recognized as kicking or running. Moreover, standing and lying down have similar patterns, such that lying down is recognized as a standing behavior (see Figure 14a–c).


In addition, because of the rotation-invariant characteristics of the Fourier descriptor, the Fourier descriptor of the standing behavior in Figure 14d has a pattern similar to that of the lying down behavior in Figure 14e.
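The confusion between standing and lying down can be reproduced with a small sketch (ours, not the code of [22]): magnitude-based Fourier descriptors of a closed contour are invariant to rotation, so a tall, thin silhouette and the same silhouette rotated by 90° yield essentially identical descriptors. The rectangular contour, function names, and coefficient count below are illustrative assumptions only.

```python
import numpy as np

def fourier_descriptor(contour_xy, n_coeffs=16):
    """Translation-, rotation-, scale- and start-point-invariant Fourier
    descriptor of a closed 2-D contour given as an (N, 2) array of points."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]  # boundary as a complex signal
    F = np.fft.fft(z)
    F[0] = 0.0                   # drop the DC term -> translation invariance
    mag = np.abs(F)              # keep magnitudes only -> rotation/start-point invariance
    mag /= mag[1]                # normalize by |F[1]| -> scale invariance
    return mag[1:n_coeffs + 1]

def rect_contour(w, h, n=200):
    """n points sampled along the boundary of an axis-aligned w x h rectangle."""
    pts = []
    for u in np.linspace(0.0, 4.0, n, endpoint=False):
        side, f = int(u), u - int(u)
        if side == 0:   pts.append((f * w, 0.0))
        elif side == 1: pts.append((w, f * h))
        elif side == 2: pts.append((w - f * w, h))
        else:           pts.append((0.0, h - f * h))
    return np.array(pts)

def rotate(points, degrees):
    th = np.deg2rad(degrees)
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    return points @ R.T

standing = rect_contour(1.0, 3.0)                      # tall, thin "standing" silhouette
lying = rotate(standing, 90.0) + np.array([5.0, 2.0])  # same shape lying down, elsewhere in the image
d_standing = fourier_descriptor(standing)
d_lying = fourier_descriptor(lying)
print(np.max(np.abs(d_standing - d_lying)))            # ~0: the descriptor cannot tell them apart
```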

Figure 14. Examples of cases in which the Fourier descriptor-based method produced an erroneous recognition result. (a) Walking; (b) running; (c) kicking; (d) standing; (e) lying down.

GEI-based methods have similar patterns for walking, running, and kicking behavior. Therefore, running was recognized as walking (see Figure 15c,d). In addition, a kicking image produced by GEI is not much different from those showing walking or running, as shown in Figure 15. Therefore, the PPV and the consequent F_score of kicking are low, as shown in Table 7.
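For illustration (our own sketch, not the implementation of [33]), a gait energy image is essentially the per-pixel temporal mean of the aligned binary silhouettes of a sequence; behaviors whose limbs sweep similar image regions (walking, running, kicking) therefore tend to produce similar-looking templates:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Per-pixel temporal mean of aligned binary silhouettes.

    silhouettes: array of shape (T, H, W) with values in {0, 1}. Static body
    parts stay close to 1, regions swept by moving limbs become mid-gray, and
    the background stays near 0, which is why behaviors whose limbs sweep
    similar image regions give similar templates.
    """
    return np.asarray(silhouettes, dtype=np.float32).mean(axis=0)

# Usage with a synthetic stack of 30 binary 64 x 32 silhouettes (stand-ins for
# real segmentation masks):
frames = (np.random.rand(30, 64, 32) > 0.5).astype(np.uint8)
gei = gait_energy_image(frames)
print(gei.shape, float(gei.min()), float(gei.max()))
```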

Figure 15. Examples of cases in which the GEI-based method produced an erroneous recognition result. Kicking images by (a) GEI; and (b) EGEI; running or walking image by (c) GEI; and (d) EGEI.

In the case of a small amount of hand waving (Figure 16a), GEI-based methods do not provide correct information for behavior recognition. In addition, because our camera setup captures images from an elevated position (height of 5–10 m) (see Figures 12 and 16, and Table 3), the information produced by GEI-based methods in the case of hand waving is not distinctive, as shown in Figure 16b,c. Therefore, the accuracies obtained for images showing waving with one or two hands by GEI-based methods are low, as shown in Table 7.

Figure 16. Examples of cases in which the GEI-based method produced an erroneous recognition result. (a) A small amount of waving; (b,c) cases in which the information produced by GEI is not distinctive because the image is captured by our camera system installed at a height of 5–10 m (see Figure 12 and Table 3).


The convexity defect-based method produces errors in the behavior recognition of waving, punching, and kicking because of errors in detecting the positions of the hand and leg tips. As shown in Figure 17, in addition to those of the hand and leg tips, other points are incorrectly detected by the convexity defect-based method, which causes errors of behavior recognition for waving, punching, and kicking.
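A rough sketch of the idea (using OpenCV, and not necessarily the exact procedure of [39]) is shown below: tip candidates are taken from hull points that open or close deep convexity defects of the silhouette contour, and, as discussed above, such candidates are easily contaminated by spurious points. The mask format, threshold, and function name are illustrative assumptions.

```python
import cv2
import numpy as np

def tip_candidates(binary_mask, min_defect_depth=20.0):
    """Candidate hand/foot tip points from the convexity defects of the
    largest blob in a binary silhouette mask (OpenCV 4.x return signatures).

    Hull points that open or close a sufficiently deep defect are returned;
    on real silhouettes this also yields spurious points (head, clothing
    folds), which is the failure mode discussed above.
    """
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    cnt = max(contours, key=cv2.contourArea)
    hull_idx = cv2.convexHull(cnt, returnPoints=False)
    if hull_idx is None or len(hull_idx) < 4:
        return []
    defects = cv2.convexityDefects(cnt, hull_idx)
    points = []
    if defects is not None:
        for start, end, _far, depth in defects[:, 0]:
            if depth / 256.0 > min_defect_depth:     # defect depth is stored in 1/256-pixel units
                points.append(tuple(cnt[start][0]))  # hull point opening the defect
                points.append(tuple(cnt[end][0]))    # hull point closing the defect
    return points

# Usage with a toy mask containing a single filled rectangle:
mask = np.zeros((120, 80), dtype=np.uint8)
cv2.rectangle(mask, (20, 10), (60, 110), 255, thickness=-1)
print(tip_candidates(mask))  # a plain rectangle has no deep defects -> []
```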

Figure 17. Examples of cases in which the convexity defect-based method produced an erroneous recognition result. (a–d) Kicking; (e) waving with two hands.

Table 8 presents the behavior recognition results of our method as a confusion matrix. In this table, the actual behavior means the ground-truth behavior, whereas the predicted behavior represents that recognized by our method. Therefore, the higher values shown in the diagonal cells of the table (i.e., "standing" in terms of both actual and predicted behaviors) indicate the higher accuracies of behavior recognition. In some cases in Table 8, the sum of values in a row is not 100%, which means that an "undetermined case" scenario, as shown in Figure 3, occurs. For example, in Table 8, in the case of "punching", the percentage of "undetermined cases" is 15.3% (100 − (0.3 + 1.6 + 0.3 + 82.5)) (%).
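The quoted percentage of "undetermined cases" is simply the remainder of the row after all recognized fractions are summed; a short check for the punching row:

```python
# Worked check of the "undetermined case" percentage for the punching row of
# Table 8: it is what remains of 100% after the recognized fractions.
row_values = [0.3, 1.6, 0.3, 82.5]  # recognized percentages in the punching row
undetermined = 100.0 - sum(row_values)
print(f"{undetermined:.1f}")        # -> 15.3
```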

Table 8. Confusion matrix of the results of behavior recognition by our method (unit: %).

The rows are the actual (ground-truth) behaviors and the columns are the predicted behaviors over the same nine classes (walking, running, standing, sitting, waving with two hands, waving with one hand, punching, lying down, and kicking). The diagonal (correctly recognized) values are: walking 95.5; running 95.6; standing 98.5; sitting 94.6; waving with two hands 96.4; waving with one hand 91.8; punching 82.5; lying down 98.3; kicking 89.5. The off-diagonal confusions are small; for example, the punching row contains only 0.3, 1.6, and 0.3 outside the diagonal.

Table 9. Summarized comparisons of accuracies and processing time obtained by previous methods and by our method.

Method                      TPR (%)   PPV (%)   ACC (%)   F_score (%)   Processing Time (ms/frame)
Fourier descriptor-based    46.6      51.0      50.4      48.7          16.1
GEI-based                   55.5      46.3      65.1      50.5          4.9
Convexity defect-based      47.7      77.4      71.8      59.0          5.2
Our method                  94.8      99.1      97.6      96.8          2.4

In Table 9, we present a summarized comparison of the accuracies and processing time obtained by previous methods and by our method. The processing time only represents the time taken by behavior recognition, and excludes the time for detection of the human box because this is the same


in each method. Based on the results in Table 9, we can confirm that our method outperforms the other methods both in terms of accuracy and processing time.

Our method is mainly focused on behavior recognition based on a frontal view of the human body, like most previous studies [17–20,26,27,30,32–36,38,39,41,42,51]. The change of view direction (from frontal view to side view) does not affect the recognition accuracy of “standing” and “sitting” in Figure 3. Because our method already considers “walking” and “running” in various directions (horizontal, vertical, or diagonal), as shown in Figure 3, the change of view direction does not have much effect on the recognition accuracy of “walking” and “running” in Figure 3, either. However, “lying down” is mainly recognized based on the average ratio of height to width of the detected human box, as shown in Figure 3. Therefore, its recognition accuracy can be highly affected by a change in view direction, because the average ratio of height to width of the detected box changes according to the view direction. Behaviors such as “kicking”, “waving”, and “punching” are recognized based on the detected tip positions of the hands and feet. The movement of these positions is difficult to detect in side-viewing images; therefore, the recognition accuracy of these behaviors can also be highly affected by the change of view direction. The recognition of “lying down”, “kicking”, “waving”, and “punching” by only one side-viewing camera is a challenging problem that has not been solved in previous studies. Therefore, our method is also focused on behavior recognition based on a frontal view of the human body, like most previous studies.

4. Conclusions

In this research, we propose a robust system for recognizing behavior by fusing data captured by thermal and visible light camera sensors. We performed behavior recognition in an environment in which a surveillance camera would typically be used, i.e., where the camera is installed at a position elevated in relation to that of the human. In this case, behavior recognition usually becomes more difficult because the camera looks down on people. In order to solve the shortcomings of previous studies of behavior recognition based on GEI and a region-based method, we propose the PbD method. The PbD method employs a binarized human blob, which it uses to detect the tip positions of the hands and legs. Then, various types of behavior are recognized based on the information obtained by tracking these tip positions and on our decision rules. We constructed multiple datasets collected in various environments, and compared the accuracies and processing time obtained by our method to those obtained by other researchers. The experiments enabled us to confirm that our method outperforms other methods both in terms of accuracy and processing time.

In the future, we plan to increase the number of behavioral types by including those associated with emergency situations, such as kidnapping, in our experiments. In addition, we plan to research a method for enhancing the accuracy of determining emergency situations by combining our vision-based method with other sensor-based methods such as audio sensors.

Acknowledgments: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01056761).
Author Contributions: Ganbayar Batchuluun and Kang Ryoung Park designed the overall system and made the behavior recognition algorithm. In addition, they wrote and revised the paper. Yeong Gon Kim, Jong Hyun Kim, and Hyung Gil Hong helped to develop the method of human detection and collected our database. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Besbes, B.; Rogozan, A.; Rus, A.-M.; Bensrhair, A.; Broggi, A. Pedestrian Detection in Far-Infrared Daytime Images Using a Hierarchical Codebook of SURF. Sensors 2015, 15, 8570–8594. [CrossRef] [PubMed]
2. Zhao, X.; He, Z.; Zhang, S.; Liang, D. Robust Pedestrian Detection in Thermal Infrared Imagery Using a Shape Distribution Histogram Feature and Modified Sparse Representation Classification. Pattern Recognit. 2015, 48, 1947–1960. [CrossRef]

3. Li, Z.; Zhang, J.; Wu, Q.; Geers, G. Feature Enhancement Using Gradient Salience on Thermal Image. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, Sydney, Australia, 1–3 December 2010; pp. 556–562.
4. Chang, S.L.; Yang, F.T.; Wu, W.P.; Cho, Y.A.; Chen, S.W. Nighttime Pedestrian Detection Using Thermal Imaging Based on HOG Feature. In Proceedings of the International Conference on System Science and Engineering, Macao, China, 8–10 June 2011; pp. 694–698.
5. Lin, C.-F.; Chen, C.-S.; Hwang, W.-J.; Chen, C.-Y.; Hwang, C.-H.; Chang, C.-L. Novel Outline Features for Pedestrian Detection System with Thermal Images. Pattern Recognit. 2015, 48, 3440–3450. [CrossRef]
6. Bertozzi, M.; Broggi, A.; Rose, M.D.; Felisa, M.; Rakotomamonjy, A.; Suard, F. A Pedestrian Detector Using Histograms of Oriented Gradients and a Support Vector Machine Classifier. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Seattle, WA, USA, 30 September–3 October 2007; pp. 143–148.
7. Li, W.; Zheng, D.; Zhao, T.; Yang, M. An Effective Approach to Pedestrian Detection in Thermal Imagery. In Proceedings of the International Conference on Natural Computation, Chongqing, China, 29–31 May 2012; pp. 325–329.
8. Wang, W.; Wang, Y.; Chen, F.; Sowmya, A. A Weakly Supervised Approach for Object Detection Based on Soft-Label Boosting. In Proceedings of the IEEE Workshop on Applications of Computer Vision, Tampa, FL, USA, 15–17 January 2013; pp. 331–338.
9. Wang, W.; Zhang, J.; Shen, C. Improved Human Detection and Classification in Thermal Images. In Proceedings of the IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 2313–2316.
10. Takeda, T.; Kuramoto, K.; Kobashi, S.; Haya, Y. A Fuzzy Human Detection for Security System Using Infrared Laser Camera. In Proceedings of the IEEE International Symposium on Multiple-Valued Logic, Toyama, Japan, 22–24 May 2013; pp. 53–58.
11. Sokolova, M.V.; Serrano-Cuerda, J.; Castillo, J.C.; Fernández-Caballero, A. A Fuzzy Model for Human Fall Detection in Infrared Video. J. Intell. Fuzzy Syst. 2013, 24, 215–228.
12. Ghiass, R.S.; Arandjelović, O.; Bendada, H.; Maldague, X. Infrared Face Recognition: A Literature Review. In Proceedings of the International Joint Conference on Neural Networks, Dallas, TX, USA, 4–9 August 2013; pp. 1–10.
13. Davis, J.W.; Sharma, V. Background-subtraction Using Contour-based Fusion of Thermal and Visible Imagery. Comput. Vis. Image Underst. 2007, 106, 162–182. [CrossRef]
14. Arandjelović, O.; Hammoud, R.; Cipolla, R. Thermal and Reflectance Based Personal Identification Methodology under Variable Illumination. Pattern Recognit. 2010, 43, 1801–1813. [CrossRef]
15. Leykin, A.; Hammoud, R. Pedestrian Tracking by Fusion of Thermal-Visible Surveillance Videos. Mach. Vis. Appl. 2010, 21, 587–595. [CrossRef]
16. Kong, S.G.; Heo, J.; Boughorbel, F.; Zheng, Y.; Abidi, B.R.; Koschan, A.; Yi, M.; Abidi, M.A. Multiscale Fusion of Visible and Thermal IR Images for Illumination-Invariant Face Recognition. Int. J. Comput. Vis. 2007, 71, 215–233. [CrossRef]
17. Rahman, S.A.; Li, L.; Leung, M.K.H. Human Action Recognition by Negative Space Analysis. In Proceedings of the IEEE International Conference on Cyberworlds, Singapore, 20–22 October 2010; pp. 354–359.
18. Rahman, S.A.; Leung, M.K.H.; Cho, S.-Y. Human Action Recognition Employing Negative Space Features. J. Vis. Commun. Image Represent. 2013, 24, 217–231. [CrossRef]
19. Rahman, S.A.; Cho, S.-Y.; Leung, M.K.H. Recognising Human Actions by Analyzing Negative Spaces. IET Comput. Vis. 2012, 6, 197–213. [CrossRef]
20. Rahman, S.A.; Song, I.; Leung, M.K.H.; Lee, I.; Lee, K. Fast Action Recognition Using Negative Space Features. Expert Syst. Appl. 2014, 41, 574–587. [CrossRef]
21. Fating, K.; Ghotkar, A. Performance Analysis of Chain Code Descriptor for Hand Shape Classification. Int. J. Comput. Graph. Animat. 2014, 4, 9–19. [CrossRef]
22. Tahir, N.M.; Hussain, A.; Samad, S.A.; Husain, H.; Rahman, R.A. Human Shape Recognition Using Fourier Descriptor. J. Electr. Electron. Syst. Res. 2009, 2, 19–25.
23. Toth, D.; Aach, T. Detection and Recognition of Moving Objects Using Statistical Motion Detection and Fourier Descriptors. In Proceedings of the IEEE International Conference on Image Analysis and Processing, Mantova, Italy, 17–19 September 2003; pp. 430–435.

24. Harding, P.R.G.; Ellis, T. Recognizing Hand Gesture Using Fourier Descriptors. In Proceedings of the IEEE International Conference on Pattern Recognition, Washington, DC, USA, 23–26 August 2004; pp. 286–289.
25. Ismail, I.A.; Ramadan, M.A.; El danaf, T.S.; Samak, A.H. Signature Recognition Using Multi Scale Fourier Descriptor and Wavelet Transform. Int. J. Comput. Sci. Inf. Secur. 2010, 7, 14–19.
26. Sun, X.; Chen, M.; Hauptmann, A. Action Recognition via Local Descriptors and Holistic Features. In Proceedings of the Workshop on Computer Vision and Pattern Recognition for Human Communicative Behavior Analysis, Miami, FL, USA, 20–26 June 2009; pp. 58–65.
27. Schüldt, C.; Laptev, I.; Caputo, B. Recognizing Human Actions: A Local SVM Approach. In Proceedings of the IEEE International Conference on Pattern Recognition, Cambridge, UK, 23–26 August 2004; pp. 32–36.
28. Laptev, I.; Lindeberg, T. Space-time Interest Points. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 432–439.
29. Laptev, I. On Space-Time Interest Points. Int. J. Comput. Vis. 2005, 64, 107–123. [CrossRef]
30. Wang, L.; Qiao, Y.; Tang, X. Motionlets: Mid-level 3D Parts for Human Motion Recognition. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2674–2681.
31. Arandjelović, O. Contextually Learnt Detection of Unusual Motion-Based Behaviour in Crowded Public Spaces. In Proceedings of the 26th International Symposium on Computer and Information Sciences, London, UK, 26–28 September 2011; pp. 403–410.
32. Eum, H.; Lee, J.; Yoon, C.; Park, M. Human Action Recognition for Night Vision Using Temporal Templates with Infrared Thermal Camera. In Proceedings of the International Conference on Ubiquitous Robots and Ambient Intelligence, Jeju Island, Korea, 30 October–2 November 2013; pp. 617–621.
33. Chunli, L.; Kejun, W. A Behavior Classification Based on Enhanced Gait Energy Image. In Proceedings of the IEEE International Conference on Network and Digital Society, Wenzhou, China, 30–31 May 2010; pp. 589–592.
34. Liu, J.; Zheng, N. Gait History Image: A Novel Temporal Template for Gait Recognition. In Proceedings of the IEEE International Conference on Multimedia and Expo, Beijing, China, 2–5 July 2007; pp. 663–666.
35. Kim, W.; Lee, J.; Kim, M.; Oh, D.; Kim, C. Human Action Recognition Using Ordinal Measure of Accumulated Motion. Eur. J. Adv. Signal Process. 2010, 2010, 1–12. [CrossRef]
36. Han, J.; Bhanu, B. Human Activity Recognition in Thermal Infrared Imagery. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, San Diego, CA, USA, 25 June 2005; pp. 17–24.
37. Lam, T.H.W.; Cheung, K.H.; Liu, J.N.K. Gait Flow Image: A Silhouette-based Gait Representation for Human Identification. Pattern Recognit. 2011, 44, 973–987. [CrossRef]
38. Wong, W.K.; Lim, H.L.; Loo, C.K.; Lim, W.S. Home Alone Faint Detection Surveillance System Using Thermal Camera. In Proceedings of the International Conference on Computer Research and Development, Kuala Lumpur, Malaysia, 7–10 May 2010; pp. 747–751.
39. Youssef, M.M. Hull Convexity Defect Features for Human Action Recognition. Ph.D. Thesis, University of Dayton, Dayton, OH, USA, August 2011.
40. Zhang, D.; Wang, Y.; Bhanu, B. Ethnicity Classification Based on Gait Using Multi-view Fusion. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 108–115.
41. Rusu, R.B.; Bandouch, J.; Marton, Z.C.; Blodow, N.; Beetz, M. Action Recognition in Intelligent Environments Using Point Cloud Features Extracted from Silhouette Sequences. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication, Munich, Germany, 1–3 August 2008; pp. 267–272.
42. Kim, S.-H.; Hwang, J.-H.; Jung, I.-K. Object Tracking System with Human Extraction and Recognition of Behavior. In Proceedings of the International Conference on Management and Artificial Intelligence, Bali, Indonesia, 1–3 April 2011; pp. 11–16.
43. Lee, J.H.; Choi, J.-S.; Jeon, E.S.; Kim, Y.G.; Le, T.T.; Shin, K.Y.; Lee, H.C.; Park, K.R. Robust Pedestrian Detection by Combining Visible and Thermal Infrared Cameras. Sensors 2015, 15, 10580–10615. [CrossRef] [PubMed]
44. Tau 2. Available online: http://mds-flir.com/datasheet/FLIR_Tau2_Family_Brochure.pdf (accessed on 21 March 2016).

45. Infrared Lens. Available online: http://www.irken.co.kr/ (accessed on 31 March 2016).
46. Jeon, E.S.; Choi, J.-S.; Lee, J.H.; Shin, K.Y.; Kim, Y.G.; Le, T.T.; Park, K.R. Human Detection Based on the Generation of a Background Image by Using a Far-Infrared Light Camera. Sensors 2015, 15, 6763–6788. [CrossRef] [PubMed]
47. Gorelick, L.; Blank, M.; Shechtman, E.; Irani, M.; Basri, R. Actions as Space-Time Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2247–2253. [CrossRef] [PubMed]
48. Ikizler, N.; Forsyth, D.A. Searching for Complex Human Activities with No Visual Examples. Int. J. Comput. Vis. 2008, 80, 337–357. [CrossRef]
49. The LTIR Dataset v1.0. Available online: http://www.cvl.isy.liu.se/en/research/datasets/ltir/version1.0/ (accessed on 21 February 2016).
50. Precision and Recall. Available online: https://en.wikipedia.org/wiki/Precision_and_recall (accessed on 7 February 2016).
51. Gehrig, D.; Kuehne, H.; Woerner, A.; Schultz, T. HMM-based Human Motion Recognition with Optical Flow Data. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots, Paris, France, 7–10 December 2009; pp. 425–430.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
