
Build a Robust Learning Feature Descriptor by Using a New Image Visualization Method for Indoor Scenario Recognition

Jichao Jiao *, Xin Wang and Zhongliang Deng

School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; [email protected] (X.W.); [email protected] (Z.D.)
* Correspondence: [email protected]; Tel.: +86-010-6119-8559

Received: 21 May 2017; Accepted: 26 June 2017; Published: 4 July 2017

Sensors 2017, 17, 1569; doi:10.3390/s17071569

Abstract: In order to recognize indoor scenarios, we extract image features for detecting objects; however, computers can make unexpected mistakes. After visualizing the histogram of oriented gradient (HOG) features, we find that the world seen through the eyes of a computer is indeed different from that seen by human eyes, which helps researchers understand why a computer makes errors. Additionally, the visualization shows that HOG features capture rich texture information, but a large amount of background interference is also introduced. In order to enhance the robustness of the HOG feature, we propose an improved method for suppressing this background interference. On the basis of the original HOG feature, we introduce principal component analysis (PCA) to extract the principal components of the image colour information. Then, a new hybrid feature descriptor, named HOG-PCA (HOGP), is built by deeply fusing these two features. Finally, HOGP is compared to the state-of-the-art HOG feature descriptor in four scenes under different illumination. In the simulation and experimental tests, the qualitative and quantitative assessments indicate that the visualized images of the HOGP feature are close to what human eyes observe, and that HOGP is better than the original HOG feature for object detection. Furthermore, the runtime of our proposed algorithm is hardly increased in comparison to the classic HOG feature.

Keywords: image feature extraction; indoor scenario recognition; HOG feature descriptor; feature visualization; sparse representation

1. Introduction

Scene recognition is a highly valuable perceptual ability for indoor navigation, unmanned vehicles, UAVs, and other kinds of mobile robots. However, current approaches to scene recognition show a significant drop in performance for indoor scenes. Early work in computer vision attempted to achieve scene recognition by using supervised classifiers that operate directly on low-level image features such as colour, texture, and shape. The main problem with these approaches has been their inability to choose reasonable image features from complex scenes. Van de Sande and his colleagues used colour descriptors for object and scene recognition, which was efficient; however, this method was invariant only to light intensity [1]. Pandey and Lazebnik utilized deformable part-based models for scene recognition, which outperformed some other state-of-the-art supervised approaches; however, this method cannot classify similar objects in low-intensity indoor scenes [2]. Silberman and Fergus used a structured light depth sensor to assist indoor scene segmentation; however, the depth sensor fails to measure accurate depth when it is blocked by a wall or a pillar [3]. Li and Sumanaphan used the scale-invariant feature transform (SIFT) to recognize bathrooms; however, they also indicated that the accuracy of their approach was low, making it difficult to detect a bathroom in low-resolution images [4].

According to our research, feature extraction is an important step in improving scene recognition. The histogram of oriented gradients (HOG) is an excellent feature for describing local image texture and is widely used in pedestrian detection. However, a large amount of background noise is introduced while obtaining this abundant texture information, which induces the computer to make wrong judgments. In order to analyse the reasons for this, researchers have begun to look at the visual world from the computer's point of view, which is achieved by visualizing image features. It is an important method for understanding how a computer perceives and for locating problems in computer vision. During our search for a useful feature for scene recognition, we found it difficult to identify the reasons that lead to errors in object classification and recognition. At present, visualization is a popular method for analysing progress and discovering the shortcomings of feature extraction methods [5,6]. The goal of information visualization is generally defined as providing useful tools and techniques for gaining insight into and understanding of the computer vision system or, more generally, for amplifying cognition [7]. These are high-level cognitive issues that are difficult to measure with quantitative user studies. Tory and Moller, in their summary of expert reviews, recommend the use of visualization evaluation for analysing computer vision systems [8]. Moreover, inspired by further studies [9,10], we introduce this method to evaluate our descriptor and to find the problems that cause recognition errors.

Despite this encouraging progress, there is still little insight into the internal operation and behaviour of these complex models, or into how they achieve such good performance. From a scientific standpoint, this is deeply unsatisfactory: without a clear understanding of how and why they work, the development of better feature descriptors is reduced to trial and error. Therefore, we introduce a visualization approach for evaluating our proposed feature descriptor. In this paper, we introduce a visualization technique that reveals the input stimuli that excite individual image features. It also allows us to observe the evolution of features during detection and recognition in indoor scenarios. The visualization technique we propose uses a dictionary library and a new HOG-based feature to project the feature activations back to the input pixel space.

The rest of this paper is organized as follows: Section 2 discusses relevant previous work on HOG-based methods and visualization-based evaluation; Section 3 presents the mathematical framework behind our model for HOGP extraction and its visualization; Section 4 presents an evaluation of the proposed method and a comparison with state-of-the-art approaches; finally, Section 5 presents the main conclusions of this work and future avenues of our research.

2. Related Work

In recent decades, image classification technology has been widely used in computer vision tasks, including object recognition, scene classification, object tracking, pedestrian detection, and so on.

Several of the most widely used feature descriptors are the local binary pattern (LBP) [11], HOG [12], and the scale-invariant feature transform (SIFT) [13]. LBP features are usually applied to face detection and recognition [14,15]. HOG features are often used for pedestrian detection and tracking [12,16]. SIFT features are more widely used in target recognition and localization. Moreover, research on improving these descriptors has been carried out: combining HOG and LBP improved pedestrian detection accuracy [17,18], and PCA was used to reduce the dimensionality and computational complexity of the SIFT descriptor [19]. These features and their improved versions are able to express local image texture. However, they remain underdeveloped for image classification and recognition against complex backgrounds, so more detail is needed to achieve high-accuracy classification and recognition. SIFT can express most of the image region information, but its computational complexity is still a problem. In contrast, HOG is more effective than SIFT in large-scale target detection, because HOG encodes less region information than SIFT. However, HOG is of little benefit in small-scale target detection [20].

According to our previous work, image colour is useful for describing image features. In this paper, we introduce PCA to extract the principal colour information from image pixels and then combine this PCA information with the HOG descriptor. Consequently, the new feature descriptor, named HOGP, simultaneously contains regional texture and colour information, which makes it better at expressing small and dim objects than other HOG-based methods. Trung and his co-authors projected HOG features into a linear subspace by PCA in order to locate objects in images [21]; however, the runtime of this method degrades for high-resolution images. López and his colleagues proposed an approach for vehicle-pedestrian object classification based on the HOG-PCA feature space instead of the frame coordinates [22]. Savakis and his co-authors created an efficient eye detector based on HOG-PCA features that reduces feature dimensionality compared to the original HOG feature [23]. Kim and his colleagues introduced a pedestrian detection system using the HOG-PCA feature, which improved the detection rate [24]. Agrawal and Singh developed a framework for a face recognition system that performs well in an unconstrained environment, in which the HOG-PCA feature was used [25]. Dhamsania and Ratanpara showed that action recognition can be performed by computing the PCA-HOG descriptor of the tracked region in frames; their experiments were performed for two games, hockey and soccer [26]. However, these methods indicate that the computational time must be considered carefully.

Moreover, image visualization is widely used to evaluate the performance of image feature descriptors. In a further study [27], the authors proposed an image feature visualization method for assessing their improved SIFT features by finding similar blocks in the input database and then seamlessly splicing and smoothing them. Vondrick visualized HOG features and convolutional neural network (CNN) features through a dictionary-based sparse representation method [28]. Mahendran visualized CNN features, SIFT features, and features combining CNN and HOG [29]. In their study [30], based on a qualitative visualization approach, Chu and his colleagues argued that residual networks challenge our understanding that CNNs learn layers of local features. Therefore, we introduce the visualization method to evaluate the performance of the different features used in our experiments.

3. Algorithm

Following a previous study [28], we visualized Figure 1a and found that, from the computer's perspective, the detected contour looked like a car. Moreover, according to our test, the recognition result could not be corrected even by increasing the number of training samples or by using other classification methods. After careful analysis, we concluded that the chosen image feature descriptor did not distinguish the water waves from the car's textures: during image classification, the Euclidean distance between the feature of the water wave and the feature of the car was smaller than the distance between the feature of the water wave and other kinds of image features. In contrast, the human eye can easily distinguish the water wave on the noisy water surface. The flowchart of our algorithm is shown in Figure 2.

First, the HOG features are extracted from the input image. Meanwhile, the PCA features of the image's colour information are computed [31]. Then, the two features are combined in cascade, which forms the HOGP feature. Finally, the sparse dictionary method is used to visualize the HOGP features. It is noted that the design of HOGP is inspired by Bajwa [32], who argued that the pixel values of an image are themselves a kind of feature, and indeed the most direct form of feature. Almost all of the image information can be expressed by the pixel values, which we name "direct features" in this paper. Existing feature descriptors are representative and powerful against certain backgrounds, but some image details are lost. In this paper, we propose HOGP to increase feature robustness without increasing the computational cost of feature extraction.
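The flow of Figure 2 can be summarized in a few lines. The sketch below is only a schematic of that flow, assuming hypothetical helper functions (extract_hog and extract_colour_pca) that stand in for the block-wise extractors detailed in Sections 3.2 and 3.3; it is not the authors' MATLAB implementation.

```python
import numpy as np

def build_hogp(image, extract_hog, extract_colour_pca):
    """Cascade-combine HOG and colour-PCA features per block (Figure 2).

    extract_hog and extract_colour_pca are placeholders for the block-wise
    extractors described in Sections 3.2 and 3.3; each is assumed to return
    one feature vector per image block, aligned on the same block grid.
    """
    hog_blocks = extract_hog(image)          # per-block HOG vectors
    pca_blocks = extract_colour_pca(image)   # per-block PCA colour vectors
    # Equation (6): HOGP_p = [Y_p, H_p] -- simple concatenation per block.
    return [np.concatenate([y, h]) for y, h in zip(pca_blocks, hog_blocks)]
```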

Figure 1. Comparison of our visualization with other visualization methods. (a) Original image; (b) HOG visualization [28]; and (c) HOGP feature visualization.

Figure 2. Flowchart of our proposed algorithm.

3.1. Grey Value

To increase illumination invariance and discriminative power, colour descriptors have been proposed recently [1]. Therefore, we obtain the grey values of an image by arranging the pixel values of each image block into a one-dimensional vector. We visualize these direct features to validate that they can restore the image information to a high degree. The visualization method is briefly described as follows:

Step 1: Build a dictionary library for the direct feature;

Step 2: Find the first p blocks with the smallest Euclidean distance by the K-nearest neighbour (KNN) algorithm proposed by a previous study [33];

Step 3: Compute the weighted average of the p image blocks to obtain a visualization image.

According to our experimental results, this method can highly restore the colour information of the original image. However, it is clear that this colour feature does not have good robustness, but rather poor noise immunity. Moreover, the direct feature carries little of the gradient information that HOG-based features capture. Therefore, we extracted the PCA information of the colour features, but did not use the colour features directly.
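The three-step procedure above can be written compactly. The following sketch is an illustrative implementation, assuming the dictionary of image blocks is already available as a NumPy array; the block size, the value of p, and the inverse-distance weighting are our own assumptions, not values given in the paper.

```python
import numpy as np

def visualize_direct_feature(query_block, dict_blocks, p=5):
    """Reconstruct an image block from its 'direct feature' (raw pixels).

    query_block : (h, w) or (h, w, 3) array, the block to visualize.
    dict_blocks : (K, h, w[, 3]) array, the dictionary of image blocks (Step 1).
    p           : number of nearest neighbours to average (Steps 2-3).
    """
    # Step 1: the dictionary entries double as their own direct features.
    feats = dict_blocks.reshape(len(dict_blocks), -1).astype(np.float64)
    q = query_block.reshape(-1).astype(np.float64)

    # Step 2: p nearest neighbours by Euclidean distance (KNN [33]).
    dists = np.linalg.norm(feats - q, axis=1)
    idx = np.argsort(dists)[:p]

    # Step 3: weighted average of the p closest blocks
    # (closer blocks get larger weights; the exact weighting is assumed).
    w = 1.0 / (dists[idx] + 1e-8)
    w /= w.sum()
    return np.tensordot(w, dict_blocks[idx].astype(np.float64), axes=1)
```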

3.2. HOG Feature Descriptor

The HOG feature uses the image's local gradient information to describe an image. The specific steps for extracting HOG features are outlined as follows:

Step 1: Normalize the gamma space to reduce the impact of illumination change:

I(x, y) = I(x, y)^λ   (1)

where I is the image, (x, y) is the location of a pixel, and λ is assumed to be 0.5 in this paper.

Step 2: Compute the image gradient:

G(x, y) = sqrt( (I(x+1, y) − I(x−1, y))^2 + (I(x, y+1) − I(x, y−1))^2 )   (2)

α(x, y) = arctan( (I(x, y+1) − I(x, y−1)) / (I(x+1, y) − I(x−1, y)) )   (3)

where G(x, y) and α(x, y) represent the gradient magnitude and the gradient direction of the pixel, respectively.

Step 3: Divide the image into cells of size w_cell × n_cell. It is noted that each cell is composed of 8 × 8 pixels in our experiment. Additionally, the stride of the moving block was chosen to be eight pixels in our proposed method. The gradient directions of the pixels in a region are divided into eight bins, which are evenly spaced over [0°, 180°]. According to the gradient directions, the gradient magnitudes of the regional pixels are voted into those eight bins. The gradient magnitudes in each of the eight direction bins are then summed, and the bins are arranged to form an eight-dimensional feature vector.

Step 4: Combine 2 × 2 cells into a block. In this paper, each cell contains an eight-dimensional feature vector, so the four cells are arranged to form a 32-dimensional feature vector. In this way, the spatial relationship among pixels in a larger region is established. The 32-dimensional feature vector is normalized to form a HOG feature.
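A minimal NumPy sketch of Steps 1-4 for a single greyscale image is given below, using the parameters stated above (gamma λ = 0.5, 8 × 8-pixel cells, eight unsigned orientation bins over [0°, 180°), 2 × 2 cells per block, block stride of eight pixels). Hard-assignment voting and plain L2 block normalization are our own simplifications, not necessarily the exact choices of the authors.

```python
import numpy as np

def hog_blocks(image, cell=8, bins=8, eps=1e-6):
    """Compute 32-dimensional HOG block descriptors (Steps 1-4)."""
    img = image.astype(np.float64)
    img = img ** 0.5                                   # Step 1: gamma normalization, Eq. (1)

    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]             # I(x+1, y) - I(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]             # I(x, y+1) - I(x, y-1)
    mag = np.sqrt(gx ** 2 + gy ** 2)                   # Step 2: Eq. (2)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0       # Step 2: Eq. (3), unsigned direction

    h, w = img.shape
    n_cy, n_cx = h // cell, w // cell
    hist = np.zeros((n_cy, n_cx, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for cy in range(n_cy):                             # Step 3: 8-bin cell histograms
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(), minlength=bins)

    blocks = []                                        # Step 4: 2x2 cells -> 32-dim block
    for cy in range(n_cy - 1):
        for cx in range(n_cx - 1):
            v = hist[cy:cy + 2, cx:cx + 2].ravel()
            blocks.append(v / (np.linalg.norm(v) + eps))
    return np.array(blocks)
```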

3.3. HOGP Feature Descriptor

We extracted the PCA information of the pixel grey values. PCA provides better spatial information. PCA uses an orthogonal transformation to convert a set of observations of possibly-correlated variables into a set of values of linearly uncorrelated variables called principal components, as shown in Figure 3. Therefore, in our work, we applied PCA along with the selection of maximum pixel intensity to create a new feature descriptor. The method yielded an image with less structural similarity to the source images, along with low contrast and luminescence.

Figure 3. Projecting data in two orthogonal directions.

Let the dataset be an M × N matrix D = {X_1, X_2, ..., X_N}, where N is the number of samples and each sample contains an M-dimensional feature vector X_i = {x_1, x_2, ..., x_M}. The calculation of the principal components consists of the following three steps.

Step 1: Compute the covariance matrix C by using the following function:

C = (1 / (N − 1)) (D − D̄)(D − D̄)^T   (4)

where D̄ denotes the mean of the dataset.

Step 2: Obtain the transformation matrix W. First of all, the eigenvalues and eigenvectors of the covariance matrix C are computed. Then, the eigenvectors are arranged according to the eigenvalues. Next, the contribution rate of each eigenvalue is computed. Finally, the eigenvectors of the first K eigenvalues are selected to compose the PCA feature.

Step 3: Transform dataset D from an M-dimensional space to a K-dimensional space by the transformation matrix W:

Y = W^T D   (5)

where Y is the dataset after dimension reduction.

The w_pca × h_pca pixels corresponding to each HOG block are selected as the input dataset, where w_pca is the width of the HOG block and h_pca is the height of the HOG block. Each column of the dataset is a sample, and each pixel value is a feature; in other words, each sample includes h_pca features. It is noted that both w_pca and h_pca are 16 in our experiment. Moreover, after calculating the contribution rates of the eigenvalues of the dataset, we find that the contribution rate of the first two eigenvalues is commonly larger than 80%. Consequently, we select the eigenvectors of the first two eigenvalues as the transformation matrix. The resulting dataset Y is called the PCA feature.

Finally, we combine the HOG feature with the PCA feature in cascade to constitute a new feature named HOGP, using Equation (6):

HOGP_p(x, y) = [Y_p(x, y), H_p(x, y)]   (6)

where Y_p(x, y) is the PCA feature of the pth image patch and H_p(x, y) is the HOG feature of the pth image patch. Therefore, HOGP is a coupled feature vector that includes these two elements.

Equation (6) indicates that HOGP can achieve better performance than HOG in object recognition because of the introduction of the PCA feature. Therefore, compared to HOG, HOGP is more salient, which means that HOGP is more efficient than HOG in complex environments that include rotation, scale, and illumination changes. The process of creating a HOGP feature is shown in Figure 4.

Figure 4. The framework for building a HOGP feature.
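The sketch below illustrates Steps 1-3 and Equation (6) for a single 16 × 16 block, assuming the block's pixel values form the data matrix D with each column treated as a sample, as described above; retaining the first two principal components follows the 80% contribution-rate observation. The pairing with a 32-dimensional HOG block vector is shown only schematically.

```python
import numpy as np

def block_pca_feature(block, k=2):
    """PCA feature of one w_pca x h_pca block (Steps 1-3, Eqs. (4)-(5)).

    block : (h_pca, w_pca) array; each column is a sample, each pixel a feature.
    k     : number of principal components kept (2, from the ~80% contribution rate).
    """
    D = block.astype(np.float64)
    X = D - D.mean(axis=1, keepdims=True)          # centre the samples
    C = X @ X.T / (D.shape[1] - 1)                 # Eq. (4): covariance matrix
    eigval, eigvec = np.linalg.eigh(C)             # symmetric matrix -> eigh
    order = np.argsort(eigval)[::-1]               # sort by decreasing eigenvalue
    W = eigvec[:, order[:k]]                       # transformation matrix W
    Y = W.T @ D                                    # Eq. (5): K-dimensional projection
    return Y.ravel()

def hogp_feature(pca_part, hog_part):
    """Eq. (6): cascade the PCA feature and the HOG feature of one block."""
    return np.concatenate([pca_part.ravel(), hog_part.ravel()])
```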

3.4. Visualization Method

We use an effective sparse dictionary pair to visualize the image features. The specific steps are expressed as follows:

Step 1: Obtain the image features. Let I be an RGB original image which is divided into M patches I = {I_1, I_2, ..., I_M}, where m represents the pixel number of each patch I_i, so that each patch lies in an m-dimensional space. P = f(I_i) ∈ R^n means that the image patch is packed into an n-dimensional descriptor vector; the transformation f is the image feature extractor, for which there are many choices, such as HOG, SIFT, or HOGP.

Step 2: Construct a dictionary library. Our aim is to obtain the visualization image from the image features, so we build a feature dictionary FD = {P_1, P_2, ..., P_K}, where each element P_i corresponds to an image patch and K represents the number of elements. ID = {I_1, I_2, ..., I_K} is the image patch dictionary corresponding to the feature dictionary.

Step 3: Complete the visualization by transferring the coefficients, as shown in Equation (7). From Figure 5, we find that combining the features of each image patch to be visualized with the features in the dictionary is a linear weighted process.

p = αFD,  i = αID   (7)

where α = {α_1, α_2, ..., α_K} is a weight vector, p is the feature of a visualized image patch, and i is the visualization image.

Figure 5. Process of the sparse representation of a dictionary pair.

We obtain the feature visualization by solving an optimization problem that includes two optimization procedures. One is to find the coefficient α through the feature dictionary:

argmin_{α ∈ R^K} ||αFD − p||_2^2   s.t. ||α||_1 ≤ ε   (8)

The other is to use the obtained coefficient α to visualize the image patch. In order to express the whole optimization process, we rewrite Equation (8); the new function is shown as follows:

argmin_{α ∈ R^K} Σ_{i=1}^{M} ( ||αFD − p||_2^2 + ||αID − i||_2^2 )   s.t. ||α||_1 ≤ ε   (9)

According to Equation (9), α can be obtained by minimizing the above function.
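As a sketch of how the coefficient vector α in Equations (7)-(9) might be obtained in practice, the example below solves an L1-regularized least-squares (Lasso) form of Equation (8) with scikit-learn and then applies Equation (7) to reconstruct the image patch. The Lagrangian formulation and the regularization weight are our substitutions for the explicit constraint ||α||_1 ≤ ε, not the authors' actual solver.

```python
import numpy as np
from sklearn.linear_model import Lasso

def visualize_patch(p, FD, ID, alpha_reg=0.01):
    """Sparse dictionary-pair visualization of one patch (Eqs. (7)-(9)).

    p  : (n,) feature of the patch to visualize (e.g. its HOGP descriptor).
    FD : (K, n) feature dictionary, one descriptor per row.
    ID : (K, m) image patch dictionary, flattened patches aligned with FD.
    """
    # Solve argmin ||alpha FD - p||_2^2 with an L1 penalty standing in for
    # the constraint ||alpha||_1 <= eps (Lasso form of Eq. (8), an assumption).
    lasso = Lasso(alpha=alpha_reg, fit_intercept=False, max_iter=5000)
    lasso.fit(FD.T, p)              # columns of FD.T act as dictionary atoms
    a = lasso.coef_                 # sparse weight vector alpha, shape (K,)

    # Eq. (7): transfer the coefficients to the image patch dictionary.
    return a @ ID
```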

4. Results

4.1. Experiment Environment

We used a visualization method to assess and analyse the experimental results. Part of our dataset comes from PASCAL [34] and the other part comes from our own collection captured with a smartphone camera. Example images from the database are shown in Figure 6.

4.2. HOGP Feature Visualization

The visualization results based on the direct image features are shown in Figure 6. According to Figure 6, this method can highly restore the colour information of the original image, and the colour features coupled with PCA can recover a large amount of the original colour information. Figure 6a is the image that was used by a previous study [28], and this result is compared with the result obtained by HOGP. Figure 6b is a baby image selected from PASCAL. Figure 6c is the Lena image. Figure 6d is an image filled with noise because it was taken at night. Figure 6e is a desktop, where there is much foreground information and a simple background, since a simple background easily introduces noise into a visualization. Figure 6f is an indoor image. However, the shortcomings of this kind of feature are also obvious: the regional connectivity is underdeveloped and the dimensionality is too high because of the lack of texture information.

Figure 6. Direct feature visualization results. The upper row is the original image and the next row is the visualized image. (a) Man; (b) baby; (c) Lena; (d) boy; (e) desktop; (f) indoor.

4.3. Qualitative Results Based on Feature Visualization

Based on the sparse representation of the dictionary pair, we obtain the visualization results of the HOGP-based and HOG-based features. Comparing the HOGP feature visualization image to the HOG feature visualization image, the results indicate that HOG and HOGP features can both obtain rich texture information and have good robustness. However, HOGP features can highlight the theme and retain better colour information, which achieves better performance than the HOG features in image classification and recognition. Additionally, the HOGP features introduce less noise than the HOG features. Therefore, the HOGP features can make the computer see objects more clearly than the HOG features.

4.4. Quantitative Evaluation of the Image Feature Matching Rate

As mentioned in the previous section, HOGP can improve performance in scene recognition. Therefore, in order to demonstrate the robustness of our proposed feature descriptor, HOGP, the image feature matching rate is introduced in our paper, inspired by a previous study [35]. Feature matching is the comparison of two sets of feature descriptors obtained from different images to provide point correspondences between images, as shown in Equation (10):

rate_matching = (N_corr / min(n_ref, n_sen)) × 100%   (10)

where N_corr is the number of matched feature pairs, n_ref is the number of features of the reference image, and n_sen is the number of features of the sensed image.

In this test, HOG and HOGP were used to extract the image features from the database. In our test, an existing matching method, the Hamming distance, was implemented. Additionally, a statistically robust method, RANSAC, was used to filter outliers from the matched feature sets, which is useful in object detection.

HOGP is robust to luminous intensity because it uses PCA to find key colour features. In Figure 7, the middle column was obtained by visualizing the HOG features of the left column, and the right column was obtained by visualizing the HOGP features. Comparing these two visualization results, we found that HOGP makes the computer see the image more clearly. From Table 1, we find that both HOG and HOGP obtained good performance in different scenes. Especially in pedestrian detection, the feature matching rate can reach more than 76.3% under different illumination, which indicates that HOGP is robust to illumination change. Moreover, HOGP achieved better performance than HOG in almost all of the scenes, even when the illumination intensity changed.
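The matching rate of Equation (10) can be computed directly from the two descriptor sets. The sketch below uses a mutual nearest-neighbour check under Euclidean distance as a stand-in for the Hamming-distance matching and RANSAC filtering described above, since those steps depend on implementation details not given in the paper.

```python
import numpy as np

def matching_rate(feats_ref, feats_sen):
    """Feature matching rate, Eq. (10): N_corr / min(n_ref, n_sen) * 100%.

    feats_ref : (n_ref, d) descriptors of the reference image.
    feats_sen : (n_sen, d) descriptors of the sensed image.
    A pair counts as matched if each descriptor is the other's nearest
    neighbour (mutual nearest-neighbour rule, an assumed matching criterion).
    """
    d = np.linalg.norm(feats_ref[:, None, :] - feats_sen[None, :, :], axis=2)
    nn_ref = d.argmin(axis=1)            # best sensed match for each reference feature
    nn_sen = d.argmin(axis=0)            # best reference match for each sensed feature
    n_corr = sum(nn_sen[j] == i for i, j in enumerate(nn_ref))
    return 100.0 * n_corr / min(len(feats_ref), len(feats_sen))
```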

Figure 7. Comparison of the visualization results of HOG and HOGP features. The first column is the original image, the images in the second column are visualized by the HOG features, and the images in the third column are visualized by the HOGP features.

It is noted that we divided our test images into four scenes: human faces, pedestrians, indoor environments, and animals. Figure 6 shows an example of our test images. In our test, each scene included 100 images. Moreover, different illumination changes were added to the test images: three kinds of luminous intensity were applied to each test image. Figure 8 displays an example of the images with different luminous intensities.

Given the feature-matching results, Figure 9 shows the comparison of the results between HOG and HOGP. Figure 9 shows the object detection results in different scenarios, which indicates that our proposed method can detect all of the objects in the images. Figure 9a-d implies that our HOGP feature descriptor is robust to illumination change. Figure 9e displays the performance of detecting similar and small objects based on HOGP features. Figure 9f gives the detection result in a complex indoor scenario that includes humans, computers, furniture, and so on.

Figure 8. An example of an image with different luminous intensities. This is the indoor environment of our lab room. (a) Baseline; (b) 80% brightness; (c) 120% brightness; (d) 140% brightness.

Figure 9. Object detection under different backgrounds based on our proposed method. (a) Baseline; (b) 80% brightness; (c) 120% brightness; (d) 140% brightness; (e) desktop object detection; (f) indoor object detection.

A summary of the results for the four scenes with three kinds of luminous intensity is displayed in Figure 10. In Figure 10 and Table 1, each point is the average of 100 independent runs with different random data samples. The horizontal axis represents the scene that was classified, and the vertical axis represents the average result given by the feature matching score criterion. Figure 10a indicates that the larger the luminous intensity, the lower the feature matching rate; these results imply that illumination change is an important factor affecting feature matching performance. Figure 10b shows that HOG obtained a better performance than HOGP only in pedestrian detection; however, the feature-matching rate of HOGP (86.9%) is close to that of HOG (87.3%). Figure 10d shows that HOG feature matching was 0.8% larger than that of HOGP in human facial detection when the brightness was 1.4 times that of the baseline, which implies that HOGP is more efficient than HOG.

Figure 10. The comparison of results between HOG and HOGP for four kinds of scene. (a) Matching result for baseline; (b) matching result for 80% brightness; (c) matching result for 120% brightness; (d) matching result for 140% brightness.

Furthermore, we assessed our proposed method on rotated and scaled images, which are used to test the performance of HOGP in a complex environment. To evaluate the robustness of HOGP, we rotated the test images with rotation values θ = {5°, 10°, 15°, 20°, 25°, 30°}. Additionally, we scaled the test images with scale factors κ = {0.6, 0.8, 0.9, 1.2, 1.4, 1.5}. Figures 11 and 12 show the object detection results obtained by using HOGP for the rotated images (θ = 5°) and the scaled images (κ = 0.8), respectively.

Figure 11 indicates the object detection results of the rotated images based on HOGP. Compared to Figure 9a,e,f, we find that all of the same objects in Figure 9 were also detected when the image was rotated to the right by 5°. Additionally, according to Figure 13a, HOGP obtains a better matching rate than HOG for all of the rotated images. Figure 12 displays the object detection results for the scaled images with κ = 0.8. Compared to Figure 9a,e,f, all of the same objects in Figure 9 were detected, even though the image is zoomed out. Furthermore, Figure 13b shows that HOGP achieves higher matching rates than HOG for all of the scaled images. Figure 13 also shows that the curve of HOGP's matching rate is smoother than that of HOG, which implies that HOGP is more robust than HOG in object detection from scaled and rotated images. Considering these results, we can conclude that HOGP performs well and provides a robust and powerful alternative for object detection in complex scenes.

Figure 11. The object detection results using HOGP for rotated scenes. (a) Indoor scene; (b) desktop object detection; (c) indoor object detection.

Figure 12. The object detection results using HOGP for scaled scenes. (a) Indoor scene; (b) desktop object detection; (c) indoor object detection.

Figure 13. The comparison result between HOG and HOGP with respect to the feature matching rate. (a) Rotated images; (b) scaled images.

4.5. Runtime Comparison

In this section, we evaluate the running time of our algorithm by comparing it to state-of-the-art feature methods. Our implementation is coded in MATLAB. All experiments were done on a single core of a 3.40 GHz Intel CPU with 12 GB of RAM.

Table 1. Runtime comparison.

Pixel/Time (s)              Man (333 × 5000)    Desktop (1969 × 1513)    Lena (256 × 256)    Boy (443 × 543)
HOG + sparse dictionary     92.191              1423.114                 36.532              123.521
HOGP + sparse dictionary    91.342              1378.683                 29.563              111.049

In order to demonstrate the robust performance of our proposed method in object detection, we tested a number of images of different resolutions in different scenes, and the mean runtimes were calculated for HOGP and HOG. Table 1 shows the results for recovering the colour image from the original image. The runtime of the HOG feature compared to the HOGP feature increased by 4.07%. Consequently, we find that the runtime of HOGP has hardly increased, while its visualization result and recognition rate are better than those of HOG. Moreover, the runtime can be reduced to one-third for recovering the grey image. Comparing the runtime of the two descriptors, the HOGP-based algorithm is more efficient than the HOG-based method. This is because the runtime for extracting the direct image feature is reduced when PCA is used. Additionally, it is noted that the HOGP-based visualization results are better than those of the HOG-based method introduced in this paper.

5. Conclusions

To tackle the problem that HOG features introduce a large amount of noise in complex scenarios, we proposed a new feature descriptor named HOGP. We used the visualization method to compare the performance of image features and to intuitively understand the world that the computer sees. A qualitative and quantitative evaluation of the performance of HOG and HOGP under a variety of conditions was carried out. The experimental results showed that HOGP was more robust than other HOG-based features. Moreover, HOGP can make the computer see the image in a way that is closer to the human vision system, which is proved by comparing its visualization performance to a state-of-the-art descriptor. In other words, HOGP can also improve the correct rate of computer classification and recognition. In future work, we intend to apply HOGP to image classification and object detection after reducing the runtime of our proposed algorithm.

Acknowledgments: This project was sponsored by the National Natural Science Foundation of China (no. 61401040) and the National Key Research and Development Program (no. 2016YFB0502002).

Author Contributions: Jichao Jiao conceived and designed the experiments; Xin Wang performed the experiments; Zhongliang Deng analyzed the data; Jichao Jiao contributed reagents/materials/analysis tools; Jichao Jiao and Xin Wang wrote the paper.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Van De Sande, K.; Gevers, T.; Snoek, C. Evaluating colour descriptors for object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1582–1596.
2. Pandey, M.; Lazebnik, S. Scene recognition and weakly supervised object localization with deformable part-based models. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 1307–1314.
3. Silberman, N.; Fergus, R. Indoor scene segmentation using a structured light sensor. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 6–13 November 2011; pp. 601–608.
4. Li, L.; Sumanaphan, S. Indoor Scene Recognition. 2011. Available online: https://pdfs.semanticscholar.org/0392/ae94771ba474d75242c28688e8eeb7bef90d.pdf (accessed on 26 June 2017).
5. Chang, W.Y.; Chen, C.S.; Jian, Y.D. Visual tracking in high-dimensional state space by appearance-guided particle filtering. IEEE Trans. Image Process. 2008, 17, 1154–1167.
6. Oh, H.; Won, D.Y.; Huh, S.S.; Shim, D.H.; Tahk, M.J.; Tsourdos, A. Indoor UAV control using multi-camera visual feedback. J. Intell. Robot. Syst. Theory Appl. 2011, 61, 57–84.
7. Card, S.K.; Mackinlay, J.D.; Shneiderman, B. Readings in Information Visualization: Using Vision to Think; Morgan Kaufmann: Burlington, MA, USA, 1999.
8. Tory, M.; Moller, T. Evaluating visualizations: Do expert reviews work? IEEE Comput. Graph. Appl. 2005, 25, 8–11.
9. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
10. Vondrick, C.; Khosla, A.; Pirsiavash, H.; Malisiewicz, T.; Torralba, A. Visualizing object detection features. Int. J. Comput. Vis. 2016, 119, 145–158.
11. Pietikäinen, M. Local binary patterns. Scholarpedia 2010, 5, 9775. Available online: http://www.scholarpedia.org/article/Local_Binary_Patterns (accessed on 26 June 2017).
12. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 886–893.
13. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157.
14. Ahonen, T.; Hadid, A.; Pietikainen, M. Face description with local binary patterns: Application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 2037–2041.
15. Tan, X.; Triggs, B. Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 2010, 19, 1635–1650.
16. Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 743–761.
17. Zhang, J.; Huang, K.; Yu, Y.; Tan, T. Boosted local structured HOG-LBP for object localization. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 1393–1400.
18. Zeng, C.; Ma, H. Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting. In Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 2069–2072.
19. Ke, Y.; Sukthankar, R. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 27 June–2 July 2004; pp. 506–513.
20. Yan, J.; Zhang, X.; Lei, Z.; Liao, S.; Li, S.Z. Robust multi-resolution pedestrian detection in traffic scenes. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 3033–3040.
21. Trung, Q.N.; Phan, D.; Kim, S.H.; Na, I.S.; Yang, H.J. Recursive coarse-to-fine localization for fast object detection. Int. J. Control Autom. 2014, 7, 235–242.
22. López, M.M.; Marcenaro, L.; Regazzoni, C.S. Advantages of dynamic analysis in HOG-PCA feature space for video moving object classification. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, QLD, Australia, 19–24 April 2015; pp. 1285–1289.
23. Savakis, A.; Sharma, R.; Kumar, M. Efficient eye detection using HOG-PCA descriptor. Imaging Multimed. Anal. Web Mob. World 2014, 9027. Available online: https://www.researchgate.net/profile/Andreas_Savakis/publication/262987479_Efficient_eye_detection_using_HOG-PCA_descriptor/links/564a6d2108ae44e7a28db727/Efficient-eye-detection-using-HOG-PCA-descriptor.pdf (accessed on 26 June 2017).
24. Kim, J.Y.; Park, C.J.; Oh, S.K. Design & implementation of pedestrian detection system using HOG-PCA based pRBFNNs pattern classifier. Trans. Korean Inst. Electr. Eng. 2015, 64, 1064–1073.
25. Agrawal, A.K.; Singh, Y.N. An efficient approach for face recognition in uncontrolled environment. Multimed. Tools Appl. 2017, 76, 3751–3760.
26. Dhamsania, C.J.; Ratanpara, T.V. A survey on human action recognition from videos. In Proceedings of the 2016 Online International Conference on Green Engineering and Technologies (IC-GET), Coimbatore, India, 19 November 2016; pp. 1–5.
27. Weinzaepfel, P.; Jégou, H.; Pérez, P. Reconstructing an image from its local descriptors. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 337–344.
28. Vondrick, C.; Khosla, A.; Malisiewicz, T.; Torralba, A. HOGgles: Visualizing object detection features. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 1–8.
29. Mahendran, A.; Vedaldi, A. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5188–5196.
30. Chu, B.; Yang, D.; Tadinada, R. Visualizing residual networks. arXiv 2017, arXiv:1701.02362. Available online: https://arxiv.org/abs/1701.02362 (accessed on 26 June 2017).
31. Pandey, P.K.; Singh, Y.; Tripathi, S. Image processing using principle component analysis. Int. J. Comput. Appl. 2011, 15.
32. Bajwa, I.S.; Naweed, M.; Asif, M.N.; Hyder, S.I. Feature based image classification by using principal component analysis. ICGST Int. J. Graph. Vis. Image Process. 2009, 9, 11–17.
33. Keller, J.M.; Gray, M.R.; Givens, J.A. A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 1985, 15, 580–585.
34. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
35. Kuo, B.C.; Landgrebe, D.A. Nonparametric weighted feature extraction for classification. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1096–1105.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
