
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 11, NOVEMBER 2015

Polar Embedding for Aurora Image Retrieval

Xi Yang, Xinbo Gao, Senior Member, IEEE, and Qi Tian, Senior Member, IEEE

Abstract— Exploring multimedia techniques to assist scientists in their research is an interesting and meaningful topic. In this paper, we focus on large-scale aurora image retrieval by leveraging the bag-of-visual-words (BoVW) framework. To refine the unsuitable representation and improve the retrieval performance, the BoVW model is modified by embedding polar information. The superiority of the proposed polar embedding method lies in two aspects. On the one hand, a polar meshing scheme is conducted to determine the interest points, which is more suitable for images captured by circular fisheye lens. Especially for the aurora image, the extracted polar scale-invariant feature transform (polar-SIFT) feature can also reflect the geomagnetic longitude and latitude, and thus facilitates further data analysis. On the other hand, a binary polar deep local binary pattern (polar-DLBP) descriptor is proposed to enhance the discriminative power of visual words. Together with the 64-bit polar-SIFT code obtained via Hamming embedding, multi-feature indexing is performed to reduce the impact of false positive matches. Extensive experiments are conducted on a large-scale aurora image data set. The experimental results indicate that the proposed method improves the retrieval accuracy significantly with acceptable efficiency and memory cost. In addition, the effectiveness of the polar-SIFT scheme and the polar-DLBP integration are separately demonstrated.

Index Terms— Polar embedding, aurora image retrieval, polar-SIFT, polar-DLBP.

I. INTRODUCTION

CONTENT-BASED image retrieval (CBIR) [1]–[3] has achieved great advances in the field of multimedia, especially for natural image applications, e.g., near-duplicate image search in large-scale databases [4]–[10].

Manuscript received October 9, 2014; revised February 25, 2015 and May 25, 2015; accepted May 31, 2015. Date of publication June 9, 2015; date of current version June 23, 2015. This work was supported in part by the National Natural Science Foundation of China under Grant 61125204, Grant 61432014, Grant 61172146, and Grant 61429201, in part by the Fundamental Research Funds for the Central Universities under Grant BDZ021403 and Grant JB149901, in part by the Program for Changjiang Scholars and Innovative Research Team in University of China under Grant IRT13088, in part by the Shaanxi Innovative Research Team for Key Science and Technology under Grant 2012KCT-02, in part by the Microsoft Research Asia Project based Funding under Grant FY13-RES-OPP-034, in part by the U.S. Army Research Office under Grant W911NF-12-1-0057, and in part by the Faculty Research Awards by NEC Laboratories of America. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Aydin Alatan. X. Yang is with the State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, China (e-mail: [email protected]). X. Gao is with the State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi’an 710071, China (e-mail: [email protected]). Q. Tian is with the Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2015.2442913

Fig. 1. Related images of YRS, ASI and the captured aurora image. (a) YRS; (b) ASI; (c) Aurora image.

Besides natural image retrieval, which is utilized for commercial applications, the topic of specific image retrieval for scientific research [11] is also interesting and valuable. This paper considers the task of aurora image retrieval in a large-scale aurora image database. Aurora is a display of natural light in the sky, particularly in high-latitude (Arctic or Antarctic) regions. It is caused by the collision of solar energetic charged particles (often called the "solar wind") with atoms in the thermosphere. As a phenomenon that can be observed with the naked eye, aurora acts as a mirror reflecting solar activities and changes in the Earth's magnetosphere. Thus, scientists utilize various facilities to capture aurora observation data, including multiband imagers on board numerous satellites [12], [13], e.g., NASA's THEMIS, IMAGE, Polar, etc., and optical imaging devices installed in polar research stations. In this paper, we explore aurora images captured by the all-sky imager (ASI) installed at the Chinese Arctic Yellow River Station (YRS). Fig. 1 shows related images of YRS, ASI and the captured aurora image. By analyzing the morphological characteristics of aurora images, scientists can construct specific models to forecast solar activities, so that some disastrous space weather caused by strong disturbances in the magnetosphere (e.g., the magnetospheric substorm, which seriously interferes with communication, electricity supply, aviation and the global positioning system (GPS)) can be avoided.

However, traditional aurora data study conducted via visual inspection is limited and inefficient. On one hand, this manual approach is performed on a small database due to the tedious work burden, making the analysis results incomprehensive. On the other hand, subjective errors are easily introduced because of visual fatigue. Thus, along with the increasing amount of aurora data, how to exploit multimedia techniques for its analysis is an urgent problem.
This paper focuses on the topic of aurora image retrieval. Our goal is to retrieve similar aurora images in a large-scale database by means of the CBIR technique, where CBIR serves as a supplementary tool to help scientists with their further manual analyses. On one hand, CBIR benefits the statistical analysis on

1057-7149 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


large-scale aurora database by selecting candidate images. Specifically, when scientists want to study the occurrence rules of a specific aurora form, they can provide a query image and leverage the CBIR system to roughly select similar images as candidates. Together with the time information of these candidates, scientists can manually choose the real ones and study their occurrence rules, e.g., does this kind of form occur more often at noon or in the afternoon? On the other hand, CBIR facilitates the analysis of new data by providing reference information from similar images in the large-scale historical data. For example, when scientists are uncertain about whether new data contains a left vortex, they can find historical images with similar forms via the CBIR system and refer to their research results.

Currently, most state-of-the-art CBIR approaches are based on the Bag-of-Visual-Words (BoVW) model [14]. There are two main stages in the BoVW-based image retrieval framework, i.e., offline indexing and online retrieval. In the offline stage, local features of images in the database are extracted and quantized with a visual vocabulary, and thus each image can be represented as a "bag" of visual words. Generally, each visual word is weighted using the term frequency-inverse document frequency (tf-idf) scheme. Then, an inverted file is leveraged to index the information of the images. In the online stage, the distribution of visual words in the query image is first determined, and then images with similar distributions are regarded as the retrieval results. To improve the retrieval accuracy, some post processing techniques, such as spatial verification [4] and query expansion [15], [16], are explored. One of the vital factors determining the retrieval accuracy is the discriminative ability of the visual vocabulary. Typically, conventional methods exploit the SIFT [17] or Dense-SIFT [18] features to generate the visual vocabulary.
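The offline tf-idf weighting and inverted-file indexing described above can be sketched as follows; the toy database, visual-word ids and function names below are invented for illustration only:

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words: {image_id: list of quantized visual-word ids}.
    Returns (inverted, idf): inverted[word] is a posting list of
    (image_id, tf) pairs, mirroring the text-retrieval inverted file."""
    n_images = len(image_words)
    inverted = defaultdict(list)
    for img, words in image_words.items():
        for word, count in Counter(words).items():
            inverted[word].append((img, count / len(words)))  # term frequency
    # idf: visual words shared by fewer images are more informative
    idf = {word: math.log(n_images / len(postings))
           for word, postings in inverted.items()}
    return inverted, idf

# Toy database: three images, each represented as a "bag" of visual-word ids.
db = {"a": [1, 1, 2, 3], "b": [2, 3, 3, 4], "c": [4, 4, 5, 5]}
index, idf = build_inverted_index(db)
# Only images sharing a query word need to be visited:
print([img for img, _ in index[3]])  # word 3 occurs in "a" and "b"
```

Because the posting lists are stored per visual word, a query only touches the images that share at least one word with it, which is the source of the inverted file's efficiency.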
However, neither of them is suitable for aurora images. Different from natural images, which contain abundant interest points, the number of interest points in an aurora image is relatively small, resulting in a lack of local descriptors. Although Dense-SIFT addresses this problem by extracting SIFT descriptors on a dense rectangular grid, its meshing scheme ignores the unique characteristics of the aurora image. Specifically, the natural image is captured using a normal lens without spherical aberration, while the aurora image is obtained via a circular fisheye lens with spherical aberration. This fact implies the inappropriateness of rectangular meshing for aurora image description. Additionally, the SIFT-based BoVW model ignores other characteristics, e.g., the texture of an image. This problem, together with the information loss during quantization, inevitably leads to false positive matches and thus reduces the retrieval accuracy.

In this paper, we propose a polar embedding (PE) model for aurora image retrieval. The embedding of polar information primarily lies in two aspects. On one hand, to refine the Dense-SIFT descriptor, we conduct polar meshing instead of rectangular meshing to determine the interest points, thus forming the Polar-SIFT descriptor. The Polar-SIFT is more suitable for images captured by circular fisheye lens. Especially for the aurora image, the polar meshing also


embeds the information of geomagnetic longitude and latitude, which is in accordance with the physics behind the aurora data and helpful for its further study. On the other hand, based on the well-known texture descriptor local binary pattern (LBP) [19], we propose an improved version, i.e., deep LBP (DLBP), to enhance the discriminative power of visual words. The "deep" in DLBP implies a more in-depth representative ability, which is achieved by extending the LBP computation from the interest point to its neighbors with deeper and more detailed comparisons. Combined with the polar meshing, the Polar-DLBP is applied to support the Polar-SIFT descriptor. The DLBP is robust against illumination changes and computationally simple, which effectively complements the disadvantages of SIFT descriptors. Also, to reduce the information loss during quantization, we adopt Hamming embedding (HE) [4] to quantize the Polar-SIFT feature, which yields a 64-bit binary signature. Together with the proposed 64-bit Polar-DLBP signature, we finally describe an interest point with a 128-bit local signature. Thus, the proposed PE model is actually a combination of the Polar-SIFT, HE and Polar-DLBP schemes, i.e., Polar-SIFT+HE+Polar-DLBP. It is worth noting that the proposed PE approach can be generalized to applications of other omnidirectional images captured by circular fisheye lens, including image representation, classification and retrieval.

The main contributions of this paper are summarized below.

1) We propose a Polar-SIFT feature to represent images captured by circular fisheye lens. Compared with traditional SIFT and Dense-SIFT, the proposed Polar-SIFT utilizes polar meshing to determine the interest points, which is more in accordance with the imaging principle. Especially for the aurora image, the Polar-SIFT feature is able to reflect its physical information.

2) We fuse the Polar-SIFT and Polar-DLBP for image indexing.
To strengthen the discriminative power of the visual vocabulary and meanwhile reduce the quantization error, we present a Polar-DLBP feature which describes each interest point with a 64-bit signature, and index each image together with the 64-bit Polar-SIFT signature after HE.

3) We construct a large-scale aurora image retrieval system to support scientific research with multimedia techniques. With the proposed polar embedding system, we have accomplished aurora image retrieval on a database with 1 million images. To the best of our knowledge, this is the first work to apply BoVW-based CBIR on such a large-scale aurora database. The retrieval system will benefit scientists in their polar atmosphere research.

The remainder of this paper is organized as follows. In Section II, the background of aurora image retrieval and related works are reviewed. Section III presents the proposed polar embedding model for aurora image retrieval. The experimental results and discussion are presented in Section IV, and Section V concludes the paper.

II. BACKGROUND

This section first introduces the ASI aurora image data we explore, and then reviews related works including BoVW-based image retrieval and LBP with its variants.



A. ASI Aurora Image Data

The data we explore in this paper are captured by the ASI installed at YRS, Ny-Alesund, Svalbard. Located at geographic coordinates 78.921N, 11.931E, YRS is one of the few stations on Earth capable of performing perennial optical aurora observations on the dayside. The optical instrument ASI was put into use in 2003, and continues to generate aurora images at multiple wavelengths (427.8 nm, 557.7 nm and 630.0 nm) 24 hours every day with a temporal resolution of 10 seconds. In this paper, we concentrate on the green-wavelength (557.7 nm) aurora data from 2003 to 2008. Originally, there are 2,910,919 (about 3M) images in total. After eliminating images captured under bad weather conditions or without aurora information, 1,003,428 (about 1M) informative images are selected to test our image retrieval system.

Actually, researchers have surveyed the ASI aurora data mainly for its classification [20], [21]. The current assumption is that there are four categories: arc, radial, drapery and hotspot. To evaluate the retrieval performance quantitatively, we create a query image database with more detailed category information. There are 10 categories in our query database, and each category contains 800 similar images. The category information of each image is labelled by aurora experts. Fig. 2 gives examples of the query images. To make a detailed analysis of the retrieval performance with respect to the size of the dataset, we manually construct several datasets according to the image generation time.

ASIG8K: The above-mentioned query image dataset.

ASIG14K: This dataset includes 14,224 images which were captured during the 19 overwintering days from December 2003 to January 2004. During this time, the ASIs at the YRS had just been put into use, and this is the first overwintering observation.

ASIG100K: This dataset includes 119,860 images produced in the whole month of January 2005.

ASIG500K: This dataset includes 579,332 images which were captured in the whole year of 2006.

ASIG1M: This dataset includes 1,003,428 images and is used to test the scalability of the proposed method.

Fig. 2. Examples of the query images.

B. BoVW-Based Image Retrieval

We make a brief review of related works on BoVW-based image retrieval. Generally, there are four key components that decide the retrieval performance: local feature extraction, feature quantization, indexing strategy and post processing.

1) Local Feature Extraction: Extracting local features includes the procedures of interest point detection and feature description. The purpose of interest point detection is to find points with high repeatability over scale, rotation, and other changes. Popular detectors include the Difference of Gaussian (DoG) [17], MSER [22], and Hessian affine [23]. Meanwhile, to ensure sufficient interest points to strengthen the discriminative power, [18] applies a dense rectangular grid with fixed scale and orientation to determine the locations of interest points. Then comes the procedure of feature description, which utilizes a descriptor to represent the local region centered at the interest point. The most commonly used is the SIFT feature [17], which is invariant to scale, rotation and affine distortion. Similar choices include SURF [24] and PCA-SIFT [25]. Recently, binary features such as BRIEF [26], BRISK [27], ORB [28], Edge-SIFT [29] and COGE [30] have achieved high efficiency and attracted numerous researchers' attention. Also, feature fusion of multiple cues has proven effective in various tasks, e.g., Zhang et al. [31] performed feature fusion between the global attribute feature and the local SIFT feature via graph fusion. In particular, to better represent images captured by circular fisheye lens, Arican and Frossard [32] proposed a log-polar descriptor according to the imaging principle and successfully preserved the geometry of visual information in images. Inspired by the same idea, Song and Li [33] presented a compact descriptor based on local polar DCT features (LPDF) to improve


the robustness against image transformations. Other polar-like features, such as GLOH [34], RIFT [35], RIFF [36], MROGH [37], LIOP [38], OSID [39] and Daisy [40], have achieved competitive representative ability and are promising for processing omnidirectional images. In this paper, we introduce this polar sampling idea into the famous SIFT descriptor to combine their advantages for the representation of aurora images.

2) Feature Quantization: Local feature extraction usually produces several hundred or thousand features for one image. To save memory, this huge number of features is quantized to visual words for compact representation, and thus an image can be represented as a "bag" of visual words. The quantized visual words are the nearest centers to the feature vectors in the feature space, and are often collected to generate a visual vocabulary via flat k-means [14], hierarchical k-means (HKM) [41] or approximate k-means (AKM) [42]. With the generated visual vocabulary, feature quantization is performed to assign a visual word ID to each local feature. Usually, hard quantization such as approximate nearest neighbor (ANN) [43], k-d tree [17] or k-d forest approximation [44] is utilized. To reduce the quantization loss, soft assignment [45] has been proposed to quantize a feature vector as a weighted combination of several visual words. Recently, a vocabulary-free method called Scalar Quantization (SQ) [46] has been proposed as an alternative feature-encoding choice that requires no visual-vocabulary training.

3) Indexing Strategy: In large-scale image retrieval, offline indexing is a memory-costly procedure. Motivated by the text retrieval framework, the inverted file structure [14] is utilized and significantly promotes efficiency. Actually, the inverted file structure is a compact representation of a sparse matrix, and only those images sharing similar visual words with the query image need to be visited. Therefore, compared with the linear scan approach, the number of visited images is greatly reduced, resulting in a more efficient response. In the inverted file, each entry stores the information associated with each visual word, including the image ID, term frequency (TF) score and other clues for verification or similarity measurement. For example, Hamming embedding [4] generates a 64-bit binary signature for each local feature to verify descriptor matching. Some geometric clues, such as the location, scale and orientation of a local feature, are stored for the verification of geometric consistency. Recent indexing strategies focus on the division of the visual vocabulary. The joint inverted index [6] jointly optimizes different visual words in different vocabularies, while the multi-index [47] decomposes the SIFT feature into different parts in a multi-vocabulary. Besides, improved versions of the multi-index, such as the Bayes merging of multi-vocabulary [5], have been proposed to further refine the retrieval performance.

4) Post Processing: To improve the retrieval quality, the initial results are often re-ranked by means of post processing modules, such as spatial verification [4] and query expansion [15], [16]. The spatial verification filters false positives by checking the geometric consistency of matched features, and related works include RANSAC-based global spatial verification [42] and weak geometric consistency [4].
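The binary-signature verification used by Hamming embedding can be sketched with a toy model: project a SIFT-like vector through a random orthogonal matrix, threshold each component to get a bit string, and compare strings by Hamming distance. The zero medians below are placeholders; the actual method learns per-visual-word medians offline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random orthogonal projection from 128-D SIFT space to 64 dimensions
# (first 64 rows of an orthogonal matrix obtained via QR decomposition).
P = np.linalg.qr(rng.standard_normal((128, 128)))[0][:64]

def he_signature(descriptor, medians):
    """64-bit signature: bit i is 1 where the i-th projected component
    exceeds the corresponding median (placeholder zeros here)."""
    return (P @ descriptor > medians).astype(np.uint8)

def hamming(sig_a, sig_b):
    return int(np.sum(sig_a != sig_b))

medians = np.zeros(64)                        # toy per-word medians
d1 = rng.standard_normal(128)
d2 = d1 + 0.01 * rng.standard_normal(128)     # near-duplicate descriptor
d3 = rng.standard_normal(128)                 # unrelated descriptor
# Near-duplicates yield a small Hamming distance, unrelated ones a large one:
print(hamming(he_signature(d1, medians), he_signature(d2, medians)))
print(hamming(he_signature(d1, medians), he_signature(d3, medians)))
```

Two features stored under the same visual word are accepted as a match only when their signatures lie within a Hamming threshold, which filters most false positives left by quantization.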


Fig. 3. An example of naive LBP and CS-LBP for a neighborhood with 8 pixels (R = 1, N = 8).

The query expansion reissues the initial top-ranked results to form new queries with valuable features which are not present in the original query, and thus a high recall can be achieved.

C. LBP and Its Variants

The LBP is an efficient texture descriptor which is invariant to illumination changes and various monotonic transformations in gray scale. Due to its powerful discriminative ability for gray images, the LBP has been explored to describe the aurora image for classification or segmentation and has achieved satisfactory results [48]–[51]. The naive LBP [19] describes a local region by measuring the relative gray contrast between the central pixel and its neighboring pixels. If the gray value of a neighboring pixel is higher than that of the central pixel, the value is set to one, otherwise to zero. Then, the binary sequences are treated as the local feature, i.e.,

LBP_{R,N}(x, y) = \sum_{i=0}^{N-1} s(g_i - g_c) 2^i,
s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise}, \end{cases}    (1)

where g_c is the gray value of the central pixel at location (x, y), and g_i is the gray value of the i-th pixel in the neighborhood consisting of N equally spaced pixels on a circle of radius R. By assigning the weight 2^i to each pixel, the binary sequence can be converted to a decimal number, and there are 2^N possible values in total. Fig. 3(a) gives an example of the naive LBP for a neighborhood with 8 pixels (R = 1, N = 8).

The naive LBP can be further simplified by applying the center-symmetric local binary pattern (CS-LBP) [52]. Instead of comparing the neighboring pixels with the central pixel, the CS-LBP compares center-symmetric pairs of pixels. Also, to improve the robustness on flat image regions, a small value is


Fig. 4. The diagram of the proposed polar embedding method for aurora image retrieval. P and PE stand for Polar and Polar Embedding, respectively.

utilized to replace zero as the threshold for differences between gray values, and the CS-LBP descriptor can be defined as

LBP^{CS}_{R,N,T}(x, y) = \sum_{i=0}^{N/2-1} s(g_i - g_{i+N/2}) 2^i,
s(x) = \begin{cases} 1, & x \ge T \\ 0, & \text{otherwise}, \end{cases}    (2)

where g_i and g_{i+N/2} represent the gray values of center-symmetric pairs, and T is the threshold. Fig. 3(b) shows the CS-LBP for a neighborhood with 8 pixels (R = 1, N = 8). It can be seen that the number of binary patterns of CS-LBP for an interest region is reduced by half compared with the naive LBP, and there are 2^{N/2} possible values in total. In practice, other variants of LBP, such as the robust LBP [53] and completed LBP [54], have been proposed to achieve better robustness against rotation or gradient direction with little loss of discriminative capability.

III. THE PROPOSED METHOD

Traditional BoVW-based image retrieval is not suitable for aurora images, especially the modules of local feature extraction and indexing strategy. Taking the characteristics of the aurora image into account, we propose a polar embedding method to refine the SIFT descriptor and to combine it with a Polar-DLBP feature for indexing. This section first overviews the proposed method, and then details each module.

A. Overview of the Proposed Method

The proposed method consists of two main components (see Fig. 4): offline indexing and online retrieval. Specifically, the offline indexing includes the procedures of polar embedding

interest point detection, Polar-SIFT feature quantization, Polar-DLBP feature extraction, and indexing with multiple features.

1) Offline Indexing: Given the large-scale aurora image database, polar embedding interest point detection is conducted for each image. These interest points are selected on a dense grid via polar meshing, which conforms to the imaging principle and the geomagnetic implication of the aurora image. Subsequently, each interest point is described by the SIFT descriptor, which yields the Polar-SIFT feature. Polar-SIFT feature quantization is then applied with a visual vocabulary generated by AKM clustering. To reduce the information loss during quantization and achieve high discriminative ability, the Hamming embedding scheme is adopted to map the Polar-SIFT feature to a binary signature. Meanwhile, Polar-DLBP feature extraction is performed to generate a texture descriptor for the interest point, and the result is also represented as a binary signature. Afterwards, the information of Polar-SIFT and Polar-DLBP is fused and saved as an entry for indexing. Finally, by inserting the related entries of interest points into the list of a certain visual word, the inverted file is constructed and indexing with multiple features is completed.

2) Online Retrieval: Given a query image, we first extract the local features, including Polar-SIFT and Polar-DLBP, for all the interest points, and the Polar-SIFT is also quantized to the visual vocabulary and converted to a binary signature via HE. Then, tf-idf is computed from the relationship between visual words and interest points. Together with the Hamming distance between the binary signatures of interest points, the similarity score between the query image and an image in the offline database is determined. At last, a ranked list based on the similarity scores is exported as the retrieval result; the higher the score, the higher the rank.
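The online voting just described — idf-weighted scores accumulated only for matches that pass Hamming verification — can be sketched as follows. The 8-bit integer signatures and ids below are toy values (the proposed method uses a 128-bit fused signature):

```python
from collections import defaultdict

def search(query_feats, index, idf, ham_thresh=24):
    """query_feats: list of (word_id, signature) pairs for the query.
    index[word_id]: posting list of (image_id, signature) entries.
    Returns images ranked by idf-weighted votes from verified matches."""
    scores = defaultdict(float)
    for word, q_sig in query_feats:
        for img, db_sig in index.get(word, []):
            # Hamming verification filters false positive matches
            dist = bin(q_sig ^ db_sig).count("1")
            if dist <= ham_thresh:
                scores[img] += idf.get(word, 0.0)
    # Higher score -> higher rank
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy index: one visual word shared by two database images.
index = {7: [("img1", 0b10110100), ("img2", 0b01001011)]}
idf = {7: 1.5}
query = [(7, 0b10110101)]          # 1 bit away from img1's signature
print(search(query, index, idf, ham_thresh=2))  # → [('img1', 1.5)]
```

Only img1 passes the Hamming threshold, so img2, despite sharing the same visual word, contributes no vote — this is how the binary signatures suppress quantization-induced false positives.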


Fig. 5. Different interest point detection methods. (a) SIFT with Hessian affine detector (totally 117 interest points); (b) Dense-SIFT with rectangular meshing (L = 29, M = 29, both the line and column intervals are 18 pixels for the 512 × 512 aurora image, after discarding the irrelevant points, there are totally 600 interest points); (c) Polar-SIFT with polar meshing (E = 24, F = 25, the radial interval is 10 pixels and the angle interval is 15°, totally 600 interest points). The M.N. and M.S. refer to the magnetic north and magnetic south, respectively, and their connection line has an angle of 28.9° with the line of θ = 0.

B. Polar-SIFT Extraction and Quantization

The SIFT feature is commonly used in BoVW-based natural image retrieval. However, for an aurora image, only one hundred or even a few tens of interest points can be detected with the Hessian affine or other detectors (as shown in Fig. 5(a)). Such a small number of interest points cannot ensure the discriminative capability of local features. To address this problem, as illustrated in Fig. 5(b), Dense-SIFT applies a rectangular meshing (L × M) to the image to select interest points densely. It is worth noting that, as there is no gradient change at points located in the four corners, we discard these irrelevant points using a circular mask. However, Dense-SIFT is inappropriate for images captured by circular fisheye lens with spherical aberration. On the contrary, a polar meshing may be more suitable from the view of the imaging principle and able to ensure the significance of all the detected points. Thus, we propose a polar embedding interest point detection method by changing the rectangular meshing to a polar meshing. As shown in Fig. 5(c), polar meshing divides an image uniformly according to the polar coordinate system, and the location of an interest point (x, y) is determined by

x = C + \rho \cos(\theta), \quad y = C - \rho \sin(\theta),    (3)

where C is the radius of the image, \rho is the radial coordinate, i.e., the distance between the interest point and the image center, and \theta is the angle coordinate. The number of interest points after polar meshing (E × F) depends on the intervals of \rho and \theta.

Compared with the sparse detection explored by SIFT and the rectangular sampling applied by Dense-SIFT, the proposed polar meshing is more suitable for images captured by circular fisheye lens, especially for the aurora image. On one hand, polar meshing selects sufficient interest points in the informative region, which ensures the discriminative capability of local features and avoids unnecessary computation on the uninformative regions.
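Concretely, the polar grid of Eq. (3) can be sketched as below, using the illustrative intervals of Fig. 5(c) (radius C = 256 for a 512 × 512 image, radial step of 10 pixels, angular step of 15°); the function name and defaults are ours, not the paper's:

```python
import math

def polar_mesh(C=256, radial_step=10, angle_step_deg=15):
    """Interest-point locations on a polar grid, following Eq. (3):
    x = C + rho*cos(theta), y = C - rho*sin(theta), for an image of
    radius C (e.g. a 512x512 all-sky image)."""
    points = []
    for rho in range(radial_step, C, radial_step):        # F radial rings
        for deg in range(0, 360, angle_step_deg):         # E angular spokes
            theta = math.radians(deg)
            x = C + rho * math.cos(theta)
            y = C - rho * math.sin(theta)
            points.append((x, y))
    return points

pts = polar_mesh()
# 25 radii (10..250) x 24 angles = 600 interest points, matching Fig. 5(c)
print(len(pts))  # 600
```

Because every grid point lies inside the circular field of view, no corner-masking step is needed, unlike the rectangular meshing of Dense-SIFT.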
On the other hand, as circular fisheye lens generate images with spherical aberration, the peripheral region of the image contains more pixels per viewing angle than the central region. Such a non-uniform structure can be


described appropriately by the sampling scheme of polar meshing. Additionally, the polar meshing can also reflect the information of geomagnetic longitude and latitude, as explained in Fig. 5(c), which facilitates polar researchers in their further study.

With a certain interest point as the central pixel, the corresponding interest region of size K_R × K_R is described by a 128-dimension SIFT vector, yielding the Polar-SIFT feature. After extracting the Polar-SIFT features of all images in the database, AKM is performed to generate the visual vocabulary. Each Polar-SIFT local feature is then quantized to the nearest centroid in the trained vocabulary via the approximate nearest neighbor (ANN) algorithm. To reduce the quantization error, the binary signature of Polar-SIFT is calculated via HE. In this paper, we follow the choice of the original HE paper [4], which results in a 64-bit binary signature with satisfactory performance.

C. Polar-DLBP Extraction

Generally, multiple features are more effective and powerful than a single feature for the representation of interest regions since more information is integrated. To this end, we utilize a Polar-DLBP feature to complement the disadvantages of the Polar-SIFT feature. The Polar-DLBP feature is extracted on the same interest points determined by polar meshing, and each interest region is represented with a DLBP descriptor. Unlike the naive LBP or CS-LBP, which only measure the relationships within a neighborhood centered at the interest point, we implement a deeper and more detailed comparison. Specifically, besides the interest point, the LBP computation is also conducted for seed points located a certain distance from the interest point, resulting in a more in-depth representation. To balance the length of the binary signature and its discriminative ability, we perform CS-LBP for the interest point and the seed points, and concatenate them to form the DLBP as

DLBP_{R,N,T}(x, y) = [LBP^{CS}_{R_{11},N_{11},T_{11}}(x, y), \dots, LBP^{CS}_{R_{1j},N_{1j},T_{1j}}(x, y), LBP^{CS}_{R_{i1},N_{i1},T_{i1}}(x', y'), \dots, LBP^{CS}_{R_{ij},N_{ij},T_{ij}}(x', y')],
R = K_R/2 > R_{i,j}, \quad N = \sum_{i,j} N_{i,j}, \quad T = \max_{i,j}\{T_{i,j}\},    (4)

where the concatenation of adjacent strings LBP^{CS}_{R_1,N_1,T_1}(x, y) and LBP^{CS}_{R_2,N_2,T_2}(x, y) can be defined as

[LBP^{CS}_{R_1,N_1,T_1}(x, y), LBP^{CS}_{R_2,N_2,T_2}(x, y)]
= \sum_{i=0}^{N_1/2-1} s_1(g_i - g_{i+N_1/2}) 2^{i+N_2/2} + \sum_{j=0}^{N_2/2-1} s_2(g_j - g_{j+N_2/2}) 2^j,
s_1(x) = \begin{cases} 1, & x \ge T_1 \\ 0, & \text{otherwise}, \end{cases} \quad
s_2(x) = \begin{cases} 1, & x \ge T_2 \\ 0, & \text{otherwise}, \end{cases}    (5)

and (x', y') represents the seed points of the interest point (x, y), i.e.,

(x', y') = (x \pm r, y \pm r), \quad r = K_R/4.    (6)

The DLBP produces a K_{PL} = N/2 bit binary signature in total. A toy example of the 64-bit DLBP descriptor is shown in Fig. 6.
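A minimal sketch of the CS-LBP comparison in (2) and the DLBP concatenation in (4) follows. The circular sampling uses rounded pixel offsets, and the parameters (N = 8, a single radius per point) are small illustrative choices, not the 64-bit configuration of Fig. 6:

```python
import math

def cs_lbp(img, x, y, R=1, N=8, T=0):
    """Center-symmetric LBP at (x, y), as in Eq. (2): compare the N/2
    center-symmetric pixel pairs on a circle of radius R; bit i is 1
    when g_i - g_{i+N/2} >= T. Returns a list of N/2 bits."""
    bits = []
    for i in range(N // 2):
        a = math.radians(360.0 * i / N)
        gi = img[round(y - R * math.sin(a))][round(x + R * math.cos(a))]
        gj = img[round(y + R * math.sin(a))][round(x - R * math.cos(a))]
        bits.append(1 if gi - gj >= T else 0)
    return bits

def dlbp(img, x, y, r=4, R=1, N=8, T=0):
    """DLBP sketch per Eq. (4) and (6): concatenate the CS-LBP at the
    interest point with CS-LBPs at the four seed points (x±r, y±r)."""
    sig = cs_lbp(img, x, y, R, N, T)
    for dx in (-r, r):
        for dy in (-r, r):
            sig += cs_lbp(img, x + dx, y + dy, R, N, T)
    return sig

# Toy 16x16 gradient patch; 5 CS-LBPs x 4 bits each = 20-bit signature.
patch = [[col for col in range(16)] for _ in range(16)]
print(len(dlbp(patch, 8, 8)))  # 20
```

Note that the descriptor involves only gray-value subtractions and comparisons, which illustrates why the extra cost of DLBP over a single CS-LBP is negligible.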



Fig. 6. A toy example of the DLBP descriptor. The interest point is marked in dark purple and its neighbors in dark yellow; the seed points are marked in light purple and their neighbors in light yellow. In this 16 × 16 (K_R = 16) interest region, R = K_R/2 = 8, N = 128, T = 3, r = K_R/4 = 4, and the computed CS-LBPs are $\mathrm{LBP}^{CS}_{1,8,3}(x, y)$, $\mathrm{LBP}^{CS}_{2,8,3}(x, y)$, $\mathrm{LBP}^{CS}_{3,16,3}(x, y)$, $\mathrm{LBP}^{CS}_{1,8,3}(x \pm 4, y \pm 4)$, $\mathrm{LBP}^{CS}_{2,8,3}(x \pm 4, y \pm 4)$, and $\mathrm{LBP}^{CS}_{3,8,3}(x \pm 4, y \pm 4)$. In total, a K_PL = N/2 = 64 bit binary signature is generated from this DLBP descriptor.

As the extracted Polar-DLBP feature is intrinsically binary and shares the same interest points with Polar-SIFT, the procedures of vocabulary training and feature quantization are unnecessary. Thus, the produced Polar-DLBP feature is simply treated as another binary signature alongside the Polar-SIFT feature after HE. Also, since Polar-DLBP only conducts gray-value subtractions, the added computation is negligible compared with the improved representational ability.

D. Indexing With Multi-Feature

Original single-SIFT-feature indexing is implemented by saving the image ID and the SIFT binary signature after HE into the entries of an inverted file. Since the proposed Polar-DLBP shares the same interest points and visual words with Polar-SIFT, the inverted file structure can be modified slightly to perform indexing with multi-feature. An image database with K images can be denoted as $D = \{I_i\}_{i=1}^{K}$. For each image, there are $K_l$ interest points determined via the proposed polar embedding detection method. Thus, the Polar-SIFT features of one image can be expressed as $\{f_j^{s*}\}_{j=1}^{K_l}$, and these local features are used to generate a visual vocabulary $\{w_i\}_{i=1}^{K_w}$ of size $K_w$. The Polar-SIFT features are then converted to binary signatures $\{f_j^{s}\}_{j=1}^{K_l}$ by means of HE, and the Polar-DLBP features are denoted as $\{f_j^{l}\}_{j=1}^{K_l}$. Finally, the inverted index is constructed as $W = \{W_1, W_2, \ldots, W_{K_w}\}$, where each entry $W_i$ comprises a list of indexed keypoints (interest points or local features), in which the image ID, the Polar-SIFT binary signature after HE, and the Polar-DLBP binary signature are stored. Fig. 7 illustrates the structure of the proposed inverted indexing with multi-feature.

Fig. 7. Structure of the proposed inverted indexing with multi-feature. The Polar-SIFT and Polar-DLBP features are fused to build the inverted index.

E. Querying Scheme

Given a query image, the user wants to acquire similar images from the large-scale database. The voting scheme exploited by BoVW-based models outputs a ranked list by computing similarity scores between the database images and the query image. We adopt this querying scheme and make suitable adjustments for the proposed polar embedding method. The polar embedding interest point detection is first performed on the query image, and the corresponding Polar-SIFT features are extracted and quantized to the related visual words. With the information of an indexed keypoint and its related visual word, the corresponding entry $W_i$ stored in the inverted file can be identified, and the list of indexed keypoints (including the image IDs) saved in $W_i$ is regarded as the candidate images. This procedure can be formulated by a matching function as

$$m(a, b) = f(q(a), q(b)), \tag{7}$$

where $a$ and $b$ represent two local features for matching, and $q(\cdot)$ is the quantization function mapping a local feature to its corresponding visual word. The conventional voting scheme assumes that local features sharing the same visual word are nearest neighbors, and utilizes the Kronecker delta response $\delta$ as the matching function, i.e.,

$$m(a, b) = \delta_{q(a), q(b)} = \begin{cases} 1, & \text{if } q(a) = q(b) \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

This strategy ignores the detailed information of the local feature, which inevitably leads to quantization errors and various false positive matches. Inspired by the work of HE, we leverage the specific information of each multi-feature to filter the false positives and refine the retrieval result. After extracting the binary signatures of Polar-SIFT $f_q^s$ and Polar-DLBP $f_q^l$, we concatenate them into a multi-binary code $f_q^m = [f_q^s, f_q^l]$. The matching function is then rewritten as

$$m(f_q^m, f_j^m) = \begin{cases} \exp\!\left(\dfrac{-h^2}{\sigma^2}\right), & \text{if } q(f_q^s) = q(f_j^s),\; h < T_h \\ 0, & \text{otherwise,} \end{cases} \tag{9}$$


where $h = H(f_q^m, f_j^m)$ is the Hamming distance between the multi-binary codes of the query feature and an indexed feature, $T_h$ is a preset threshold, and $\exp(-h^2/\sigma^2)$ is the weight measuring the matching degree with a positive parameter $\sigma$. The smaller the Hamming distance, the higher the value of the matching function. Thus, taking the inverse document frequency (IDF) into consideration, the similarity score between the query image $Q$ and a database image $I$ can be measured by the sum of the matching degrees of their local multi-features, i.e.,

$$SS(Q, I) = \frac{\sum_{f_q^m \in Q,\, f_j^m \in I} m(f_q^m, f_j^m) \cdot idf^2}{\lVert I \rVert_2}, \tag{10}$$

where $idf = K/K_c$ is the IDF, $K$ and $K_c$ are the number of images in the database and the number of images containing the visual word, respectively, and $\lVert I \rVert_2 = \big(\sum_{i=1}^{K_w} t^2\big)^{1/2}$ is the $l_2$ norm of the visual word vector of image $I$, with $t$ the term frequency (TF) of a visual word in this image. By ranking the above similarity scores from high to low, the retrieval result is obtained, and the top image with the highest score is regarded as the most similar image.

IV. EXPERIMENTS AND DISCUSSION

To demonstrate the effectiveness of the proposed polar embedding method for aurora image retrieval, we conduct extensive experiments on the ASI aurora image database. All experiments are performed on a computer with a 2.60 GHz Intel Core i5 CPU and 16 GB of RAM. The retrieval accuracy is measured by the mean average precision (mAP) [42].

A. Parameter Analysis

In this section, we analyze the parameters in the proposed method and determine suitable settings for the remaining experiments. All experiments are conducted on the ASIG8K, ASIG14K and ASIG100K datasets. It is worth noting that some parameters are predefined with empirical values, such as the interest region size K_R = 16.

1) Polar Embedding Interest Point Number: The interest point number (E × F) determined by the polar meshing is based on the choices of the radial interval ρ and the angle interval θ. Too dense a gridding results in heavy memory cost and slow query speed, while too sparse a gridding decreases the discriminative power of the local features. In this experiment, we conduct image retrieval with several candidate numbers of interest points, i.e., 16 × 13 = 208, 24 × 13 = 312, 16 × 25 = 400, 32 × 13 = 416, 24 × 25 = 600, 16 × 49 = 784, 32 × 25 = 800, 24 × 49 = 1176 and 32 × 49 = 1568. The mAP results on ASIG8K, ASIG14K and ASIG100K with these settings are presented in Fig. 8.

Fig. 8. Impact of the polar embedding interest point number on the ASIG8K, ASIG14K and ASIG100K datasets.

It is worth noting that, to simplify the testing procedure and make a fair comparison, all features are clustered to generate visual vocabularies with a size of 20K. Also, only the Polar-SIFT features are quantized and converted to 64 bit binary signatures via HE; the Polar-DLBP features are not taken into consideration.
It is shown that polar embedding with 24 × 25 = 600 interest points performs favorably, while further increasing the number at the expense of memory cost does not improve the mAP significantly, and sometimes even leads to a decrease.

2) Polar-DLBP Feature Size: By concatenating different CS-LBPs of the interest point and the corresponding seed points, we can generate Polar-DLBPs of different sizes. Generally, the resulting binary signature of length K_PL is stored in the inverted file with K_PL/8 bytes. Similar to the choice of the number of interest points, we conduct image retrieval with various sizes of Polar-DLBP (20 bit, 24 bit, 44 bit, 64 bit, 80 bit and 96 bit) on the three datasets. Fig. 9 shows the corresponding mAP results, which demonstrate that the 64 bit Polar-DLBP is a good tradeoff between mAP performance and memory cost. In this experiment, the 64 bit Polar-SIFT binary signature is exploited together with the Polar-DLBP binary signature to compute the similarity scores.

Fig. 9. Impact of the Polar-DLBP feature size on the ASIG8K, ASIG14K and ASIG100K datasets.

3) Hamming Distance Parameters: There are two parameters in the Hamming distance computation: the Hamming distance threshold T_h and the weighting factor σ. Fig. 10(a) shows the mAP results under different thresholds for the ASIG8K, ASIG14K and ASIG100K datasets. We can see that, no matter whether the weight exp(−h²/σ²) is employed in the matching function or not, the mAP first rises to a peak at T_h = 44, and then gradually drops to a relatively stable state. A suitable value of the weighting factor σ can be determined from Fig. 10(b), which indicates that the best performance is achieved at σ = 26.

Fig. 10. Impact of the Hamming distance parameters on the ASIG8K, ASIG14K and ASIG100K datasets. (a) Hamming distance threshold; (b) Weighting factor.

Fig. 11. Comparison of mAPs using different methods on datasets with increasing sizes. D, P and PE stand for Dense, Polar and Polar Embedding, respectively.

B. Evaluation

1) Comparison Methods: The comparison methods are selected based on the BoVW framework with a visual vocabulary generated from SIFT feature vectors after approximate k-means clustering; thus the SIFT-like scheme is an indispensable part which determines the main body of the inverted indexing structure (i.e., to which visual word an indexed keypoint belongs). HE is a scheme that converts the SIFT-like feature vector into a binary signature, saved after the image ID to filter false positive matches, and it must be applied with a SIFT-like scheme. The DLBP feature is intrinsically binary and complements the SIFT+HE binary signature to further improve discriminative ability; it cannot be used alone because of the lack of a visual vocabulary. Therefore, we compare the proposed polar embedding method for large-scale aurora image retrieval with the following methods. 1) Baseline: the BoVW approach based on the naive SIFT feature, with interest points determined by the Hessian affine detector. 2) SIFT+DLBP: the DLBP binary feature is employed to complement the naive SIFT feature. 3) SIFT+HE: the Hamming embedding scheme is adopted for SIFT and generates 64 bit binary signatures to filter candidate features which are quantized to the same visual word but have a large Hamming distance. 4) SIFT+HE+DLBP: the DLBP is combined with the SIFT+HE binary signature. 5) Dense-SIFT: the BoVW approach based on the Dense-SIFT feature, with interest points chosen on a 29 × 29 rectangular gridding. 6) Dense-SIFT+Dense-DLBP: the Dense-DLBP binary feature is extracted to complement the Dense-SIFT feature. 7) Dense-SIFT+HE: the approach combining the Dense-SIFT feature and the HE scheme. 8) Dense-SIFT+HE+Dense-DLBP: the Dense-DLBP is combined with the Dense-SIFT+HE binary signature. 9) Polar-SIFT: the proposed Polar-SIFT feature is applied to the BoVW framework, and the interest

points are determined on a 24 × 25 polar gridding. 10) Polar-SIFT+Polar-DLBP: the Polar-DLBP binary feature is applied to complement the Polar-SIFT feature. 11) Polar-SIFT+HE: the approach exploits both the Polar-SIFT feature and the HE scheme. In essence, the proposed Polar Embedding (PE) method is equivalent to the Polar-SIFT+HE+Polar-DLBP approach. In practice, the retrieval performance depends on three aspects: accuracy, efficiency and memory cost. A favorable method for large-scale image retrieval is one that achieves high accuracy together with low query time and acceptable memory cost. Therefore, the performance of the different methods is measured by these three aspects. All datasets are tested in this experiment with the parameters determined in the previous section.

2) Accuracy: The mAP results using different methods on datasets with increasing sizes are shown in Fig. 11. Accordingly, we can draw the following conclusions. 1) The proposed Polar Embedding holds the highest retrieval accuracy on all datasets. The best mAP of 62.88% is achieved on the ASIG8K dataset, which improves the baseline by about 14%. 2) With the increasing size of the datasets, the retrieval accuracy of the proposed approach maintains a high value, which is much different from the rapid decrease of the mAPs of the other methods. Specifically, a 6%–10% decrease exists in the other methods when the size of the dataset changes from 8K to 1M, while the mAP decrease of the proposed method is only 4.89%. 3) Table I summarizes the mAP improvements of methods with the DLBP feature over methods without the DLBP feature. We can see that, compared with Polar-SIFT+HE, which only lacks the combination with Polar-DLBP, Polar Embedding achieves a significant improvement in mAP (up to +5.96%). This implies that the fusion of Polar-DLBP with Polar-SIFT contributes greatly to the final result, and that the multi-feature strategy is applicable and effective. The other comparison results conform to the same conclusion. 4) Table II gives the mAP improvements of methods with polar meshing over methods with other interest point determination schemes (rectangular meshing and the Hessian affine detector). The improvements of Polar Embedding


TABLE I: mAP Improvements of Methods With DLBP Feature Over Methods Without DLBP Feature. The Datasets Are Simplified as Their Sizes.

TABLE II: mAP Improvements of Methods With Polar Meshing Over Methods With Other Interest Point Determination Schemes.

over Dense-SIFT+HE+Dense-DLBP (up to +6.40%), Polar-SIFT+HE over Dense-SIFT+HE (up to +3.01%), Polar-SIFT+Polar-DLBP over Dense-SIFT+Dense-DLBP (up to +2.38%), and Polar-SIFT over Dense-SIFT (up to +2.01%) demonstrate that the proposed polar meshing performs better than rectangular meshing. The gaps of Polar Embedding over SIFT+HE+DLBP (up to +10.30%), Polar-SIFT+HE over SIFT+HE (up to +8.40%), Polar-SIFT+Polar-DLBP over SIFT+DLBP (up to +8.66%), and Polar-SIFT over Baseline (up to +7.86%) also verify the superiority of the proposed polar meshing over the Hessian affine detector.

TABLE III: Comparison of Average Query Times (s) Using Different Methods on All Datasets.

TABLE IV: Comparison of Memory Costs Using Different Methods per Feature (PF) (Bytes) and on All Datasets (GB).

3) Efficiency: Since the offline indexing can be performed without user interaction, users are more concerned about the efficiency of online retrieval. We compare the average query times on the five datasets using different methods, and the comparison is presented in Table III. It should be noted that the time cost of feature extraction is not included for any approach in the reported results. We can see that Dense-SIFT is the most time-consuming method, while

SIFT+HE is the most efficient method due to the filtering effect of the Hamming threshold. Although the proposed method costs more time than SIFT+DLBP, SIFT+HE, Dense-SIFT+Dense-DLBP, Dense-SIFT+HE, Polar-SIFT+Polar-DLBP and Polar-SIFT+HE because of the increased exclusive-or operations, it still meets the requirement of fast response for users. 4) Memory Cost: We discuss the memory costs of the different methods in Table IV. For each indexed feature, as illustrated in Fig. 7, 4 bytes are allocated to store the image ID. When exploiting the HE strategy, an additional 8 bytes are needed to store the 64 bit binary SIFT signature. The proposed Polar Embedding adds another 64 bits (8 bytes) to save the Polar-DLBP binary signature. Generally, fewer than two hundred naive SIFT features can be extracted from one aurora image, while the numbers for Dense-SIFT and Polar-SIFT are 600. For the 1M dataset, the proposed method consumes 11.4 GB of memory in total, which is still acceptable. In conclusion, our method achieves high retrieval accuracy with acceptable efficiency and memory cost, demonstrating its effectiveness for large-scale aurora image retrieval.
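The per-feature byte counts above can be sanity-checked with quick arithmetic (assuming 600 features per image and 10^6 images for the 1M dataset):

```python
# Bytes per indexed feature in the Polar Embedding inverted file:
image_id = 4        # image ID
sift_he = 8         # 64 bit Polar-SIFT binary signature after HE
dlbp = 8            # 64 bit Polar-DLBP binary signature
per_feature = image_id + sift_he + dlbp   # 20 bytes per indexed feature

features_per_image = 600                  # 24 x 25 polar grid
num_images = 1_000_000                    # ASIG1M dataset

total_gb = per_feature * features_per_image * num_images / 1024**3
print(f"{total_gb:.1f} GB")  # ~11.2 GB, close to the 11.4 GB reported
```

The small gap to the reported 11.4 GB presumably comes from index overhead not counted in this back-of-the-envelope estimate.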


C. Sample Results

To give a more intuitive analysis, a group of retrieval results using different methods is illustrated in Fig. 12. The query image shown in Fig. 12(a) is selected from the query set of Radial#1, and the retrieval is performed on the ASIG1M dataset. Fig. 12(b) presents the comparison of mAPs using different methods together with the returned images. Since the first ten returned images are true positives for all methods, we show the top images from rank 11 to rank 18, with the false positives marked in red. We can see that, compared with the baseline, the proposed method improves the mAP from 65.42% to 84.76%, which is a remarkable achievement. Due to the lack of the DLBP feature or the neglect of polar meshing, the returned images of the other methods are polluted by irrelevant images, while our method ranks more relevant images at the top.

Fig. 12. Sample results comparing different methods on the ASIG1M dataset. (a) Query image; (b) Comparison of mAPs using different methods with returned top images (rank 11 to rank 18). The false positives are marked with red dashed bounding boxes.

D. Discussion

From both the statistical results and the sample results, we can see that the proposed polar embedding method achieves high retrieval accuracy compared with the other methods. This improvement is due to two aspects: the usage of the polar meshing scheme and the combination with the DLBP feature. The polar meshing scheme determines the interest points based on the imaging principle of the circular fisheye lens, and thus the generated Polar-SIFT feature is more appropriate for aurora images than the traditional SIFT and Dense-SIFT features. The proposed DLBP feature is a good supplement for the representation of interest points, and the combination of Polar-SIFT and Polar-DLBP forms a multi-feature for indexing, which is more powerful than a single feature. However, the introduction of the Polar-DLBP feature inevitably increases the memory cost. How to optimize the multi-feature indexing structure to decrease the memory cost is a problem to be solved in the future.

V. CONCLUSION

This paper proposes a polar embedding method for large-scale aurora image retrieval. By detecting interest points with polar meshing, sufficient and representative Polar-SIFT features are extracted and utilized to generate the visual vocabulary. The Polar-SIFT feature is more suitable for images captured by circular fisheye lenses than the naive SIFT and Dense-SIFT features. To refine the discriminative ability of the local features and reduce the false positives in the retrieval results, a binary Polar-DLBP feature is presented and integrated with the Polar-SIFT feature for indexing and querying. Experiments are conducted on datasets of different sizes, and the results show that the proposed method maintains high retrieval accuracy compared with other methods. In the future, we will address the memory consumption problem by refining the indexing structure. Also, other effective features, such as deep learning features and semantic attribute features, will be taken into account to further improve the retrieval performance.

REFERENCES

[1] W. Bian and D. Tao, “Biased discriminant Euclidean embedding for content-based image retrieval,” IEEE Trans. Image Process., vol. 19, no. 2, pp. 545–554, Feb. 2010. [2] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Adaptive nonseparable wavelet transform via lifting and its application to content-based image retrieval,” IEEE Trans. Image Process., vol. 19, no. 1, pp. 25–35, Jan. 2010. [3] S. Zhang, Q. Tian, G. Hua, Q. Huang, and W. Gao, “Generating descriptive visual words and visual phrases for large-scale image applications,” IEEE Trans. Image Process., vol. 20, no. 9, pp. 2664–2677, Sep. 2011. [4] H. Jegou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in Proc. Eur. Conf. Comput. Vis., 2008, pp. 304–317.


[5] L. Zheng, S. Wang, W. Zhou, and Q. Tian, “Bayes merging of multiple vocabularies for scalable image retrieval,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1963–1970. [6] Y. Xia, K. He, F. Wen, and J. Sun, “Joint inverted indexing,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 3416–3423. [7] W.-L. Zhao and C.-W. Ngo, “Scale-rotation invariant pattern entropy for keypoint-based near-duplicate detection,” IEEE Trans. Image Process., vol. 18, no. 2, pp. 412–423, Feb. 2009. [8] Z. Liu, H. Li, L. Zhang, W. Zhou, and Q. Tian, “Cross-indexing of binary SIFT codes for large-scale image search,” IEEE Trans. Image Process., vol. 23, no. 5, pp. 2047–2057, May 2014. [9] L. Zhang, Y. Zhang, X. Gu, J. Tang, and Q. Tian, “Scalable similarity search with topology preserving hashing,” IEEE Trans. Image Process., vol. 23, no. 7, pp. 3025–3039, Jul. 2014. [10] L. Zheng, S. Wang, and Q. Tian, “Coupled binary embedding for largescale image retrieval,” IEEE Trans. Image Process., vol. 23, no. 8, pp. 3368–3380, Aug. 2014. [11] D. Yang, G. Liao, S. Zhu, X. Yang, and X. Zhang, “SAR imaging with undersampled data via matrix completion,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 9, pp. 1539–1543, Sep. 2014. [12] X. Yang, X. Gao, J. Li, and B. Han, “A shape-initialized and intensityadaptive level set method for auroral oval segmentation,” Inf. Sci., vol. 277, pp. 794–807, Sep. 2014. [13] X. Yang, X. Gao, D. Tao, X. Li, B. Han, and J. Li, “Shapeconstrained sparse and low-rank decomposition for auroral substorm detection,” IEEE Trans. Neural Netw. Learn. Syst., doi: 10.1109/TNNLS.2015.2411613. [14] J. Sivic and A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2003, pp. 1470–1477. [15] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, “Total recall: Automatic query expansion with a generative feature model for object retrieval,” in Proc. IEEE 11th Int. Conf. Comput. 
Vis., Oct. 2007, pp. 1–8. [16] Y.-H. Kuo, K.-T. Chen, C.-H. Chiang, and W. H. Hsu, “Query expansion for hash-based image object retrieval,” in Proc. 17th ACM Multimedia, 2009, pp. 65–74. [17] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004. [18] A. Bosch, A. Zisserman, and X. Muoz, “Image classification using random forests and ferns,” in Proc. IEEE 11th Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8. [19] T. Ojala, M. Pietikäinen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002. [20] Y. Wang, X. Gao, R. Fu, and Y. Jian, “Dayside corona aurora classification based on X-gray level aura matrices,” in Proc. ACM Int. Conf. Image Video Retr., 2010, pp. 282–287. [21] Q. Yang, J. Liang, Z. Hu, and H. Zhao, “Auroral sequence representation and classification using hidden Markov models,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 12, pp. 5049–5060, Dec. 2012. [22] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image Vis. Comput., vol. 22, no. 10, pp. 761–767, 2004. [23] K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” Int. J. Comput. Vis., vol. 60, no. 1, pp. 63–86, 2004. [24] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Proc. Eur. Conf. Comput. Vis., 2006, pp. 404–417. [25] Y. Ke and R. Sukthankar, “PCA-SIFT: A more distinctive representation for local image descriptors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun./Jul. 2004, pp. II-506–II-513. [26] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary robust independent elementary features,” in Proc. Eur. Conf. Comput. Vis., 2010, pp. 778–792. [27] S. Leutenegger, M. Chli, and R. Y. 
Siegwart, “BRISK: Binary robust invariant scalable keypoints,” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 2548–2555. [28] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 2564–2571. [29] S. Zhang, Q. Tian, K. Lu, Q. Huang, and W. Gao, “Edge-SIFT: Discriminative binary descriptor for scalable partial-duplicate mobile search,” IEEE Trans. Image Process., vol. 22, no. 7, pp. 2889–2902, Jul. 2013.


[30] Z. Mao, Y. Zhang, and Q. Tian, “COGE: A novel binary feature descriptor exploring anisotropy and non-uniformity,” in Proc. Pacific-Rim Conf. Multimedia, 2013, pp. 359–371. [31] S. Zhang, M. Yang, X. Wang, Y. Lin, and Q. Tian, “Semantic-aware co-indexing for image retrieval,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1673–1680. [32] Z. Arican and P. Frossard, “Scale-invariant features and polar descriptors in omnidirectional imaging,” IEEE Trans. Image Process., vol. 21, no. 5, pp. 2412–2423, May 2012. [33] T. Song and H. Li, “Local polar DCT features for image description,” IEEE Signal Process. Lett., vol. 20, no. 1, pp. 59–62, Jan. 2013. [34] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005. [35] S. Lazebnik, C. Schmid, and J. Ponce, “A sparse texture representation using local affine regions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1265–1278, Aug. 2005. [36] G. Takacs, V. Chandrasekhar, S. Tsai, D. Chen, R. Grzeszczuk, and B. Girod, “Unified real-time tracking and recognition with rotationinvariant fast features,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 934–941. [37] B. Fan, F. Wu, and Z. Hu, “Aggregating gradient distributions into intensity orders: A novel local image descriptor,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 2377–2384. [38] Z. Wang, B. Fan, and F. Wu, “Local intensity order pattern for feature description,” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 603–610. [39] F. Tang, S. H. Lim, N. L. Chang, and H. Tao, “A novel feature descriptor invariant to complex brightness changes,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 2631–2638. [40] E. Tola, V. Lepetit, and P. Fua, “DAISY: An efficient dense descriptor applied to wide-baseline stereo,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 5, pp. 
815–830, May 2010. [41] D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2006, pp. 2161–2168. [42] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object retrieval with large vocabularies and fast spatial matching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8. [43] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, “An optimal algorithm for approximate nearest neighbor searching fixed dimensions,” J. ACM, vol. 45, no. 6, pp. 891–923, 1998. [44] C. Silpa-Anan and R. Hartley, “Optimised KD-trees for fast image descriptor matching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8. [45] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Lost in quantization: Improving particular object retrieval in large scale image databases,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8. [46] W. Zhou, Y. Lu, H. Li, and Q. Tian, “Scalar quantization for large scale image search,” in Proc. ACM Multimedia, 2012, pp. 169–178. [47] A. Babenko and V. Lempitsky, “The inverted multi-index,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 3069–3076. [48] R. Fu, X. Gao, and Y. Jian, “Patchy aurora image segmentation based on ALBP and block threshold,” in Proc. IEEE Int. Conf. Pattern Recognit., Aug. 2010, pp. 3380–3383. [49] Q. Wang et al., “Spatial texture based automatic classification of dayside aurora in all-sky images,” J. Atmos. Solar-Terrestrial Phys., vol. 72, no. 5, pp. 498–508, 2010. [50] X. Yang, X. Gao, D. Tao, and X. Li, “Improving level set method for fast auroral oval segmentation,” IEEE Trans. Image Process., vol. 23, no. 7, pp. 2854–2865, Jul. 2014. [51] X. Yang, X. Gao, D. Tao, X. Li, and J. Li, “An efficient MRF embedded level set method for image segmentation,” IEEE Trans. Image Process., vol. 24, no. 1, pp. 9–21, Jan. 2015. [52] M. Heikkilä, M. 
Pietikäinen, and C. Schmid, “Description of interest regions with local binary patterns,” Pattern Recognit., vol. 42, no. 3, pp. 425–436, 2009. [53] M. Heikkilä, M. Pietikäinen, and J. Heikkilä, “A texture-based method for modeling the background and detecting moving objects,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 657–662, Apr. 2006. [54] Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1657–1663, Jun. 2010.


Xi Yang received the B.Eng. degree in electronic information engineering and the Ph.D. degree in pattern recognition and intelligence systems from Xidian University, Xi’an, China, in 2010 and 2015, respectively. From 2013 to 2014, she was a Visiting Ph.D. Student with the Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA. She is currently with the State Key Laboratory of Integrated Services Networks, Xidian University. Her current research interests include image/video processing, computer vision, and multimedia information retrieval.

Xinbo Gao (M’02–SM’07) received the B.Eng., M.Sc., and Ph.D. degrees in signal and information processing from Xidian University, Xi’an, China, in 1994, 1997, and 1999, respectively. From 1997 to 1998, he was a Research Fellow with the Department of Computer Science, Shizuoka University, Shizuoka, Japan. From 2000 to 2001, he was a Post-Doctoral Research Fellow with the Department of Information Engineering, Chinese University of Hong Kong, Hong Kong. Since 2001, he has been with the School of Electronic Engineering, Xidian University. He is currently a Cheung Kong Professor with the Ministry of Education, a Professor of Pattern Recognition and Intelligent System, and the Director of the State Key Laboratory of Integrated Services Networks, Xi’an. His current research interests include multimedia analysis, computer vision, pattern recognition, machine learning, and wireless communications. He has published five books and around 200 technical articles in refereed journals and proceedings, including the IEEE Transactions on Image Processing, the IEEE Transactions on Neural Networks and Learning Systems, the IEEE Transactions on Circuits and Systems for Video Technology, the IEEE Transactions on Systems, Man, and Cybernetics, the International Journal of Computer Vision, and Pattern Recognition. Prof. Gao is on the Editorial Boards of several journals, including Signal Processing (Elsevier) and Neurocomputing (Elsevier). He served as the General Chair/Co-Chair, Program Committee Chair/Co-Chair, or a PC Member for around 30 major international conferences. He is currently a Fellow of the Institution of Engineering and Technology.

Qi Tian (M’96–SM’03) received the B.E. degree in electronics engineering from Tsinghua University, China, in 1992, the M.S. degree in electrical and computer engineering from Drexel University, in 1996, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign, in 2002. He is currently a Professor with the Department of Computer Science, University of Texas at San Antonio (UTSA). He took a one-year faculty leave with Microsoft Research Asia from 2008 to 2009. He has authored over 290 refereed journal and conference papers. His research interests include multimedia information retrieval and computer vision. His research projects were funded by NSF, ARO, DHS, SALSI, CIAS, and UTSA. He received faculty research awards from Google, the NEC Laboratories of America, FXPAL, Akiira Media Systems, and HP Labs. He was the co-author of the ACM ICMCS 2012 Best Paper, the MMM 2013 Best Paper, the PCM 2013 Best Paper, the Top 10% Paper Award in MMSP 2011, the Best Student Paper in ICASSP 2006, and the Best Paper Candidate in PCM 2007. He received the 2010 ACM Service Award. Dr. Tian has served as the Program Chair, an Organization Committee Member, and on TPCs for numerous IEEE and ACM conferences, including ACM Multimedia, SIGIR, ICCV, and ICME. He is on the Editorial Boards of the IEEE Transactions on Multimedia, the IEEE Transactions on Circuits and Systems for Video Technology, Multimedia Systems Journal, the Journal of Multimedia, and Machine Vision and Applications. He is also a Guest Editor of the IEEE Transactions on Multimedia, the Journal of Computer Vision and Image Understanding, Pattern Recognition Letters, the EURASIP Journal on Advances in Signal Processing, and the Journal of Visual Communication and Image Representation.
