
IEEE TRANSACTIONS ON CYBERNETICS, VOL. 44, NO. 7, JULY 2014

Spectral Embedded Hashing for Scalable Image Retrieval

Lin Chen, Dong Xu, Senior Member, IEEE, Ivor Wai-Hung Tsang, and Xuelong Li, Fellow, IEEE

Abstract—We propose a new graph based hashing method called spectral embedded hashing (SEH) for large-scale image retrieval. We first introduce a new regularizer into the objective function of the recent work spectral hashing to control the mismatch between the resultant hamming embedding and the low-dimensional data representation, which is obtained by using a linear regression function. This linear regression function can be employed to effectively handle the out-of-sample data, and the introduction of the new regularizer makes SEH better cope with the data sampled from a nonlinear manifold. Considering that SEH cannot efficiently cope with high dimensional data, we further extend SEH to kernel SEH (KSEH) to improve the efficiency and effectiveness, in which a nonlinear regression function can also be employed to obtain the low dimensional data representation. We also develop a new method to efficiently obtain an approximate solution to the eigenvalue decomposition problem in SEH and KSEH. Moreover, we show that some existing hashing methods are special cases of our KSEH. Our comprehensive experiments on the CIFAR, Tiny-580K, NUS-WIDE, and Caltech-256 datasets clearly demonstrate the effectiveness of our methods.

Index Terms—Spectral embedded, hashing, scalable, image retrieval.

I. Introduction

CONTENT-based image retrieval, which aims to search visually similar database images for a given query image, has attracted substantial attention over the past few decades. However, with the explosive increase of web images, we need to cope with new challenges such as how to store the high dimensional image descriptors and how to calculate the similarities between the query image and database images.

Manuscript received February 23, 2013; revised June 17, 2013; accepted July 23, 2013. Date of publication November 19, 2013; date of current version June 12, 2014. This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400, in part by the National Natural Science Foundation of China under Grant 61125106 and Grant 61072093, and in part by the Shaanxi Key Innovation Team of Science and Technology under Grant 2012KCT-04. This work was also supported in part by the Singapore National Research Foundation under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Program Office, Media Development Authority. This paper was recommended by Associate Editor B. W. Schuller. L. Chen, D. Xu, and I. W.-H. Tsang are with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]; [email protected]; [email protected]). X. Li is with the Center for Optical Imagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China (e-mail: xuelong− [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCYB.2013.2281366

There is an increasing research interest in developing new hashing methods for large-scale image retrieval [1]–[16]. These hashing methods aim to map the high dimensional image descriptors into compact binary codes to efficiently calculate the hamming distances between the query image and the database images. Most of the existing hashing methods can provide constant or sublinear search time such that the efficiency of nearest neighbor search in image retrieval can be significantly improved. While the seminal work locality-sensitive hashing (LSH) [17] and its extensions [6], [9] enjoy asymptotic properties in theory, these LSH-related methods can only achieve reasonable precision and recall when using a large number of binary codes from multiple hash tables. After comparing LSH with the hash functions learnt by using boosting and restricted Boltzmann machines, Torralba et al. [18] experimentally demonstrated that the learning based approaches can achieve better image retrieval performance. Wang et al. [12] developed a semisupervised hashing method for image retrieval by effectively utilizing both labeled and unlabeled data. Norouzi and Fleet [19] developed a hashing method by minimizing a hinge loss-like function. Inspired by multiclass spectral clustering, Gong et al. [1] proposed a new method referred to as iterative quantization (ITQ) to minimize the quantization error by rotating the data points. Their algorithm can be combined with unsupervised dimension reduction methods such as principal component analysis (PCA) and supervised dimension reduction algorithms such as canonical correlation analysis (CCA). Song et al. [20] proposed a multiple feature hashing (MFH) method for near-duplicate video retrieval by preserving the local structure for each individual feature and by considering the local structures of all the features in a global fashion. Moreover, kernel based hashing methods were also developed. Joly and Buisson [8] proposed a method called random maximum margin hashing by iteratively learning a set of SVM classifiers with the corresponding positive and negative training data randomly chosen from the training dataset. He et al. [7] developed a new hashing method called optimized kernel hashing (OKH) to handle large datasets, which is applicable for general types of data with any kernel. Recently, Zhu et al. [16] proposed a sparse hashing method. In their work [16], the original data is first converted into the low-dimensional representation by using a nonnegative sparse coding method, and then a binarization method is employed to obtain the hamming embedding. Graph-based hashing methods have become a popular research direction in hashing. Motivated by spectral graph



partitioning, Weiss et al. [21] proposed spectral hashing (SH) to seek the data-dependent compact binary codes such that the neighboring data points in the original Euclidean space are still neighbors in the Hamming space. In their out-of-sample extension, they employed a separable Laplacian eigenfunction formulation to obtain the analytical solution by assuming that the data points are uniformly distributed in a high dimensional rectangle. With this strong assumption, the manifold structure of the original data points may not be well discovered, which may significantly degrade the performance in real image retrieval applications [11]. Weiss et al. [5] extended spectral hashing to multidimensional spectral hashing. Liu et al. [11] proposed a method called anchor graph hashing (AGH) to approximate the similarity between a pair of data points in the graph by leveraging a small number of cluster centers (called anchors in [11]) from k-means clustering. To further improve the retrieval performance, they also developed a hierarchical thresholding learning method to generate multiple bits for each eigenfunction.

In many existing hashing methods such as [1] and [12], without considering the bias term, the hashing function $h(x)$ is constrained as $\mathrm{sign}(W^\top x)$, where the function $\mathrm{sign}(v)$ returns a vector of the signs of the entries of the vector $v$. However, this can be over-strict for the data sampled from a nonlinear manifold. Thus, in this paper, we develop a new graph based hashing method called spectral embedded hashing (SEH) for large-scale image retrieval by introducing a new flexible regularizer $\|y - W^\top x\|^2$ into the objective function of SH. This new regularizer controls the mismatch between the resultant hamming embedding representation $y$ and the low dimensional data representation $W^\top x$ of a data point $x$, which is obtained by using a linear regression function $f(x) = W^\top x$. In contrast to [1] and [12] that enforce a hard constraint, our method can better cope with the data sampled from a nonlinear manifold by using the flexible regularizer. Moreover, this linear regression function can be readily employed to handle the unseen data points (e.g., the query images), and it can be further extended to a corresponding nonlinear regression function in our kernel extension called kernel SEH (KSEH).

We show that we need to solve an eigenvalue decomposition problem in SEH (respectively, KSEH), in which the size of the corresponding matrix for eigenvalue decomposition is $n \times n$ with $n$ being the total number of data points. To efficiently solve this eigenvalue decomposition problem in SEH and KSEH for scalable image retrieval, we additionally develop a new method to obtain the approximate solution. We conduct comprehensive experiments using four datasets, CIFAR, Tiny-580K, NUS-WIDE, and Caltech-256, and the results clearly demonstrate that our SEH and KSEH achieve promising results when compared with the state-of-the-art unsupervised hashing methods.

II. Spectral Embedded Hashing

Throughout the rest of this paper, a superscript $\top$ denotes the transpose of a vector or a matrix, and $\mathbf{0}_n, \mathbf{1}_n \in \mathbb{R}^n$ denote the zero vector and the vector of all ones, respectively. $I_n \in \mathbb{R}^{n \times n}$


is the identity matrix. Let us denote the training dataset as $\mathcal{X} = \{x_i\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ being the $i$th data point in the input space and $n$ being the total number of data points. We denote the binary codes for $x_i$ in the hamming space as $y_i \in \{-1, 1\}^r$, where $r$ is the length of the hashing codes. For better presentation, we also denote $X = [x_1, \ldots, x_n] \in \mathbb{R}^{d \times n}$ and $Y = [y_1, \ldots, y_n]^\top \in \{-1, 1\}^{n \times r}$. $A \in \mathbb{R}^{n \times n}$ is defined as the affinity matrix with $A_{ij}$ representing the similarity between $x_i$ and $x_j$. Finally, we define $f(x_i) \in \mathbb{R}^r$ as a new representation of the data $x_i$ in the low dimensional space and $F = [f(x_1), \ldots, f(x_n)]^\top \in \mathbb{R}^{n \times r}$.

A. Proposed Formulation

One of the most prominent hashing approaches is to find a hash function that maps any data point $x$ into binary codes using the following formula:

$$h(x) = \mathrm{sign}(f(x)). \qquad (1)$$
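To make the role of (1) concrete, the following sketch (an illustration added here, not part of the original formulation; the linear map W, the bias b, and all sizes are placeholder values) shows how real-valued projections are binarized into codes and how hamming distances between a query code and database codes can be computed with bitwise operations for ranking.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 16, 1000                      # feature dim, code length, database size

# Placeholder linear hash parameters (illustrative only).
W = rng.standard_normal((d, r))
b = rng.standard_normal(r)

def hash_codes(X):
    """Map rows of X (m x d) to codes in {0, 1}^(m x r) via sign(W^T x + b).
    The paper uses {-1, 1}; the 0/1 convention here is equivalent for distances."""
    return (X @ W + b > 0).astype(np.uint8)

database = rng.standard_normal((n, d))
query = rng.standard_normal(d)

db_codes = hash_codes(database)
q_code = hash_codes(query[None, :])[0]

# Hamming distance = number of differing bits, via XOR on the 0/1 codes.
hamming = np.count_nonzero(db_codes ^ q_code, axis=1)
ranking = np.argsort(hamming)               # database images ranked by hamming distance
print(ranking[:10], hamming[ranking[:10]])
```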

Let us assume that a linear regression function $f(x) = W^\top x + b$ is used, where $W \in \mathbb{R}^{d \times r}$ is the projection matrix and $b = [b_1, \cdots, b_r]^\top \in \mathbb{R}^r$ is the bias term. Then, $h(x) = \mathrm{sign}(W^\top x + b)$ maps $x$ into $r$ binary codes. The hash function $h(x)$ can be defined by using different choices of $W$ and $b$ subject to various criteria. For example, LSH [17] uses a randomly chosen $W$ and $b$.

Instead of using an explicit hash function as in (1), another popular approach is graph-based hashing. Motivated by spectral graph partitioning, Weiss et al. [21] proposed SH to directly seek the data-dependent compact binary codes for representing the data by solving the following objective function:

$$\min_{Y}\ \operatorname{tr}(Y^\top L Y), \qquad (2)$$
$$\text{s.t.}\quad Y \in \{-1, 1\}^{n \times r}, \qquad (3)$$
$$\frac{1}{n} Y^\top Y = I_r,\quad Y^\top \mathbf{1}_n = \mathbf{0}_r, \qquad (4)$$

where $L$ is either an unnormalized or normalized graph Laplacian matrix [21]. Hence, the original neighborhood of the data in the Euclidean space can be preserved in the learnt hamming space. Constraint (3) on the binary solution is relaxed in practice. However, in order to cater for out-of-sample data, their approach has to assume that the data points are uniformly distributed in a high dimensional rectangle. With such a strong assumption, the manifold structure of the original data points may not be well discovered, which may significantly degrade the retrieval performance on large-scale image datasets [11].

To combine the advantages of these two approaches, in this paper, we introduce a new regularizer $\sum_{i=1}^{n} \|y_i - f(x_i)\|^2$ into the objective function of SH in (2) to control the mismatch between the resultant hamming embedding representation $y_i$ and the low dimensional data representation $f(x_i)$ of a data point $x_i$. This mapping function $f(x)$ can also be employed to readily cope with the unseen data points (e.g., the query images).
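For readers less familiar with the graph term in (2), the short sketch below (our illustration; the small affinity matrix is synthetic) builds a normalized graph Laplacian from an affinity matrix A and evaluates tr(Y^T L Y), which is small when points with large affinity receive similar embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2

# Synthetic symmetric affinity matrix A (larger entries = more similar points).
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)

# Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

Y = rng.standard_normal((n, r))             # a candidate (relaxed) embedding
cost = np.trace(Y.T @ L @ Y)                # the objective value tr(Y^T L Y) in (2)
print(cost)
```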



When a linear regression function $f(x) = W^\top x + b$ is used, the hash function can be learned by solving the following optimization problem:

$$\min_{Y, W, b, \xi}\ \operatorname{tr}(Y^\top L Y) + \mu \Big( \sum_{i=1}^{n} \|\xi_i\|^2 + \gamma_g \operatorname{tr}(W^\top W) \Big), \qquad (5)$$
$$\text{s.t.}\quad \xi_i = y_i - (W^\top x_i + b),\quad \frac{1}{n} Y^\top Y = I_r,\quad Y^\top \mathbf{1}_n = \mathbf{0}_r,\quad Y \in \{-1, 1\}^{n \times r}. \qquad (6)$$

Here, the first term $\operatorname{tr}(Y^\top L Y)$ is the same as that in SH, which enforces that neighboring data points in the original Euclidean space have similar hamming embeddings. The second term is our new regularizer, and it is introduced to penalize the slack variables that measure the mismatch between the hamming embedding representation $Y$ and the low dimensional data representation defined as follows:

$$F = X^\top W + \mathbf{1}_n b^\top. \qquad (7)$$

The third term is the regularization term $\operatorname{tr}(W^\top W)$, which regulates the complexity of $W$ and is used to avoid overfitting. Also, $\mu$ and $\gamma_g$ are two tradeoff parameters. Note that, unlike many existing hashing methods [1], [12] that enforce a hard constraint, i.e., $\xi_i = \mathbf{0}_r$, we instead adopt a more flexible constraint by minimizing $\sum_{i=1}^{n} \|\xi_i\|^2$ such that our method SEH can better cope with the data sampled from a nonlinear manifold. The same regularizer has been used in the context of clustering [22].

Now, we show how to solve the optimization problem in (5) and (6). As in SH [21], we also relax the constraint for the binary solution as a real-valued solution. By substituting the constraint in (6) back into the objective function (5), setting the derivatives of the objective function with respect to $W$ and $b$ to zeros, and assuming the data are centered, i.e., $X \mathbf{1}_n = \mathbf{0}_d$, we have

$$b = \frac{1}{n} Y^\top \mathbf{1}_n, \qquad (8)$$
$$W = (X X^\top + \gamma_g I_d)^{-1} X Y. \qquad (9)$$
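For completeness, the following short derivation (our own sketch of the standard ridge-regression algebra, valid under the stated centering assumption $X\mathbf{1}_n = \mathbf{0}_d$) indicates how (8) and (9) are obtained from (5) and (6) with $Y$ fixed.

```latex
% Objective in W and b (Y fixed), with xi_i = y_i - (W^T x_i + b):
% J(W, b) = mu * ( ||Y - X^T W - 1_n b^T||_F^2 + gamma_g * tr(W^T W) )
\begin{aligned}
\frac{\partial J}{\partial b} &= -2\,(Y - X^{\top}W - \mathbf{1}_n b^{\top})^{\top}\mathbf{1}_n = 0
  \;\Rightarrow\; b = \tfrac{1}{n}\, Y^{\top}\mathbf{1}_n \quad (\text{since } X\mathbf{1}_n = \mathbf{0}_d),\\[2pt]
\frac{\partial J}{\partial W} &= -2\,X\,(Y - X^{\top}W - \mathbf{1}_n b^{\top}) + 2\gamma_g W = 0
  \;\Rightarrow\; W = (XX^{\top} + \gamma_g I_d)^{-1} X Y .
\end{aligned}
```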

Substituting the above equations back into (5), we arrive at the following optimization problem:

$$Y^* = \operatorname*{arg\,min}_{\frac{1}{n} Y^\top Y = I_r,\ Y^\top \mathbf{1}_n = \mathbf{0}_r}\ \operatorname{tr}(Y^\top M Y), \qquad (10)$$

where

$$M = L + \mu L_g \qquad (11)$$

with $L_g = H_n - X^\top (X X^\top + \gamma_g I_d)^{-1} X$ and $H_n = I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top$ is the centering matrix. Since $M$ in (11) considers both the manifold structure (defined by $L$) and the discriminative subspace structure (defined by $L_g$) [23], the objective in (10) would find a discriminative low dimensional data representation that captures the neighborhood information for the hash function. Moreover, (7) gives a natural embedding for unseen data; thus, we call our proposed method SEH.

The optimal solution $Y^*$ of the optimization problem (10) can be solved by using the eigenvalue decomposition on $M$, and $Y^*$ can be constructed by using the eigenvectors of $M$ corresponding to the $r$ smallest eigenvalues after discarding the trivial solution. Moreover, the optimal $W^*$ and $b^*$ can be recovered according to (9) and (8), respectively.

Lemma 1: Let $Y^* \in \mathbb{R}^{n \times r}$ be the optimal solution of (10) and $R$ be any $r \times r$ orthogonal matrix; then $Y^* R$ is also an optimal solution of the optimization problem (10).

Proof: Lemma 1 can easily be proved by realizing that

$$\operatorname{tr}\big((Y^* R)^\top M Y^* R\big) = \operatorname{tr}\big(R^\top Y^{*\top} M Y^* R\big) = \operatorname{tr}\big(Y^{*\top} M Y^* R R^\top\big) = \operatorname{tr}\big(Y^{*\top} M Y^*\big),$$
$$\frac{1}{n} (Y^* R)^\top Y^* R = R^\top \Big(\frac{1}{n} Y^{*\top} Y^*\Big) R = R^\top I_r R = I_r,$$
$$(Y^* R)^\top \mathbf{1}_n = R^\top Y^{*\top} \mathbf{1}_n = \mathbf{0}_r.$$

Based on Lemma 1, the hashing function can be represented as

$$h(x) = \mathrm{sign}\big( R^\top ( W^{*\top} x + b^* ) \big). \qquad (12)$$

In order to recover the optimal binary codes, we learn an optimal rotation matrix $R^*$ by using ITQ [1], in which the optimal $R^*$ is determined as the one that minimizes the quantization error on the training data. Namely, we solve for $R^*$ by minimizing the following objective function:

$$R^* = \operatorname*{arg\,min}_{R^\top R = I_r}\ \sum_{i=1}^{n} \big\| R^\top f^*(x_i) - \mathrm{sign}\big( R^\top f^*(x_i) \big) \big\|^2 \qquad (13)$$

with $f^*(x) = W^{*\top} x + b^*$. Intuitively, the resultant hamming embedding can better preserve the neighborhood of the original data if the quantization error is smaller.
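The rotation in (13) can be found with the alternating procedure of ITQ [1]; the sketch below (our paraphrase in NumPy, with F standing for the real-valued embeddings $f^*(x_i)$ stacked as rows, and the iteration count a placeholder) alternates between binarizing the rotated embedding and solving an orthogonal Procrustes problem. As with ITQ itself, this only reaches a local minimum of the quantization error.

```python
import numpy as np

def learn_rotation(F, n_iter=50, seed=0):
    """Approximately minimize ||sign(F R) - F R||_F^2 over orthogonal R (ITQ-style)."""
    n, r = F.shape
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((r, r)))   # random orthogonal start
    for _ in range(n_iter):
        B = np.sign(F @ R)                             # fix R, update the binary codes
        B[B == 0] = 1
        U, _, Vt = np.linalg.svd(F.T @ B)              # fix B, orthogonal Procrustes update
        R = U @ Vt
    return R

F = np.random.default_rng(2).standard_normal((500, 16))   # toy embeddings
R_star = learn_rotation(F)
codes = np.sign(F @ R_star)
```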

B. Kernel Extension

When the feature dimension $d$ is high, it is very time consuming to compute $M$ in (11), as it requires an inverse operation on the $d \times d$ matrix $X X^\top + \gamma_g I_d$. To address this computational issue, and to further improve the retrieval performance by using a nonlinear regression function to obtain the low dimensional data representation, we employ the kernel trick [24] to handle the high dimensional data. In this subsection, we will present the kernel extension of SEH (KSEH). The time complexity of the proposed KSEH is independent of the feature dimension, making it very efficient to compute the hashing codes when the feature dimension is very high.

Specifically, we first perform k-means clustering to obtain $p$ cluster centers (i.e., $\{z_1, \ldots, z_p\}$), which are used for forming the reduced set representation of $W = \big[ \sum_{i=1}^{p} \alpha_{i1} \phi(z_i), \cdots, \sum_{i=1}^{p} \alpha_{ir} \phi(z_i) \big]$, where $\phi(\cdot)$ is the kernel induced feature mapping such that $k(z_i, z_j) = \phi(z_i)^\top \phi(z_j)$ [24]. Then, the low dimensional data representation can be written as $f_j(x) = \sum_{i=1}^{p} \alpha_{ij}\, \phi(z_i)^\top \phi(x) + b_j$ for $j = 1, \ldots, r$. In matrix form, we have $F = K_{pn}^\top \alpha + \mathbf{1}_n b^\top$, where $\alpha = [\alpha_{ij}] \in \mathbb{R}^{p \times r}$ and $K_{pn} \in \mathbb{R}^{p \times n}$ is the kernel matrix defined between the $p$ cluster centers in the reduced set and the $n$ data points.
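In practice, the quantities used in this subsection can be assembled as in the sketch below (our illustration; the RBF kernel, the bandwidth value, and the random sampling of the reduced set in place of k-means centers are simplifying assumptions, and the centering of K_pn reflects the assumption K_pn 1_n = 0_p made in the derivation that follows).

```python
import numpy as np

def rbf_kernel(A, B, lam):
    """k(a, b) = exp(-lam * ||a - b||^2) for all pairs of rows of A (m x d) and B (q x d)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-lam * sq)

rng = np.random.default_rng(3)
n, d, p, r = 2000, 32, 300, 16
X = rng.standard_normal((n, d))                     # training data, one point per row
Z = X[rng.choice(n, size=p, replace=False)]         # stand-in for the p k-means cluster centers

lam = 1.0 / d                                       # placeholder bandwidth, for illustration only
K_pp = rbf_kernel(Z, Z, lam)                        # p x p kernel among the cluster centers
K_pn = rbf_kernel(Z, X, lam)                        # p x n kernel between centers and data
K_pn = K_pn - K_pn.mean(axis=1, keepdims=True)      # center so that K_pn 1_n = 0_p

# With learned alpha (p x r) and b (r,), the low dimensional representation of the data is
# F = K_pn^T alpha + 1_n b^T; alpha and b are random placeholders here.
alpha, b = rng.standard_normal((p, r)), rng.standard_normal(r)
F = K_pn.T @ alpha + b
```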

Similar to the linear case, we need to solve the following optimization problem:

$$\min_{Y, \alpha, b, \xi}\ \operatorname{tr}(Y^\top L Y) + \mu \Big( \sum_{i=1}^{n} \|\xi_i\|^2 + \gamma_g \operatorname{tr}(\alpha^\top K_{pp} \alpha) \Big), \qquad (14)$$
$$\text{s.t.}\quad \xi_i = y_i - \Big( \sum_{j=1}^{p} (\alpha_{j\cdot})^\top k(z_j, x_i) + b \Big), \qquad (15)$$
$$\frac{1}{n} Y^\top Y = I_r,\quad Y^\top \mathbf{1}_n = \mathbf{0}_r,\quad Y \in \{-1, 1\}^{n \times r},$$

where $K_{pp} \in \mathbb{R}^{p \times p}$ is the kernel matrix defined among the $p$ cluster centers in the reduced set, and $\alpha_{i\cdot}$ denotes the $i$th row of $\alpha$. Similar to SEH, we relax the constraint for the binary solution as a real-valued solution, substitute the constraint in (15) back into the objective function (14), and then set the derivatives of the objective function with respect to $\alpha$ and $b$ to zeros. We assume that $K_{pn}$ is centered, i.e., $K_{pn} \mathbf{1}_n = \mathbf{0}_p$; then, we have

$$b = \frac{1}{n} Y^\top \mathbf{1}_n, \qquad (16)$$
$$\alpha = \big( K_{pn} K_{pn}^\top + \gamma_g K_{pp} \big)^{-1} K_{pn} Y. \qquad (17)$$

Substituting $\alpha$ and $b$ back into the objective function in (14) and performing simple matrix operations, we arrive at the following problem:

$$Y^* = \operatorname*{arg\,min}_{\frac{1}{n} Y^\top Y = I_r,\ Y^\top \mathbf{1}_n = \mathbf{0}_r}\ \operatorname{tr}(Y^\top \tilde{M} Y), \qquad (18)$$

which is the same as (10), except that

$$\tilde{M} = L + \mu \tilde{L}_g \qquad (19)$$

with $\tilde{L}_g = H_n - K_{pn}^\top \big( K_{pn} K_{pn}^\top + \gamma_g K_{pp} \big)^{-1} K_{pn}$. Note that the inverse of the matrix $K_{pn} K_{pn}^\top + \gamma_g K_{pp}$ generally exists, because this matrix is positive definite when setting $\gamma_g > 0$ and employing commonly used kernels such as the RBF kernel and the Laplacian kernel. Note also that we only need to calculate the inverse of a $p \times p$ matrix to compute $\tilde{M}$, which is independent of the feature dimension.

Again, the optimal solutions $\alpha^*$ and $b^*$ can be recovered according to (17) and (16) based on the optimal solution $Y^*$ in (18). In addition, the optimal rotation matrix $R^*$ can also be computed by solving (13) with $f^*(x) = \sum_{i=1}^{p} (\alpha^*_{i\cdot})^\top k(z_i, x) + b^*$. The hashing function of KSEH can then be represented as

$$h(x) = \mathrm{sign}\Big( R^{*\top} \Big( \sum_{i=1}^{p} (\alpha^*_{i\cdot})^\top k(z_i, x) + b^* \Big) \Big), \qquad (20)$$

where $\alpha^*_{i\cdot}$ denotes the $i$th row of $\alpha^*$.
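At query time, (20) only involves kernel evaluations against the p cluster centers. The sketch below (our illustration; alpha_star, b_star, R_star, the anchor set Z, and the RBF kernel with its bandwidth are assumed to be given or are placeholders) encodes a single query.

```python
import numpy as np

def rbf(Z, x, lam):
    return np.exp(-lam * np.sum((Z - x) ** 2, axis=-1))

def kseh_hash(x, Z, alpha_star, b_star, R_star, lam):
    """Hash one query x with the KSEH function in (20): sign(R*^T (alpha*^T k(Z, x) + b*))."""
    k = rbf(Z, x, lam)                         # k(z_i, x) for the p cluster centers, shape (p,)
    f = alpha_star.T @ k + b_star              # low dimensional representation, shape (r,)
    return np.sign(R_star.T @ f).astype(np.int8)

# Placeholder learned quantities, for illustration only.
rng = np.random.default_rng(4)
p, d, r = 300, 32, 16
Z = rng.standard_normal((p, d))
alpha_star = rng.standard_normal((p, r))
b_star = rng.standard_normal(r)
R_star, _ = np.linalg.qr(rng.standard_normal((r, r)))

code = kseh_hash(rng.standard_normal(d), Z, alpha_star, b_star, R_star, lam=1.0 / d)
```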

C. Scalability and Complexity

We need to conduct an eigenvalue decomposition on the matrix $M \in \mathbb{R}^{n \times n}$ in (11) or $\tilde{M} \in \mathbb{R}^{n \times n}$ in (19), which requires $O(n^3)$ time complexity and is therefore computationally prohibitive when $n$ becomes very large. Thus, directly solving the optimization problems (10) and (18) prevents SEH and KSEH from scaling up to large datasets. We observe, however, that the ranks of $L_g$ and $\tilde{L}_g$ are intrinsically low if the feature dimension $d$ is low in the linear case and $p$ is small in the kernel case, respectively. When the rank of $L$ is also low, there exist efficient ways to perform the eigenvalue decomposition for $M$ and $\tilde{M}$. A natural way to deal with this problem is to approximate $L$ using low rank approximation methods. In the following, we describe how to deal with the linear case in detail.

Assuming that the normalized graph Laplacian matrix is used, i.e., $L = I_n - A$, we adopt the recently proposed anchor graph method in [11], which has shown promising results in both semi-supervised learning [25] and hashing [11]. The affinity matrix computed using the anchor graph is $A = Z \Lambda^{-1} Z^\top$, where $\Lambda = \operatorname{diag}(Z^\top \mathbf{1}_n)$ and $Z \in \mathbb{R}^{n \times m}$ is the similarity matrix between the $n$ data points and $m$ anchor points, which can be randomly sampled data points from the training data or cluster centers. In this paper, we directly use the $p$ cluster centers in the reduced set (i.e., $\{z_1, \ldots, z_p\}$) as anchor points, so we have $m = p$. Usually, $Z$ is a sparse matrix constructed by using $s$ nearest neighbors, namely, the $i$th row of $Z$ only contains $s$ nonzero entries corresponding to the $s$ nearest anchor points of the corresponding data point $x_i$. In practice, we usually have$^1$ $s \ll p \ll n$, and we can obtain $Z$ in $O(spn)$ time complexity [25].

On the other hand, we can easily obtain $X X^\top + \gamma_g I_d = V_d \Sigma_d V_d^\top$ by conducting an eigenvalue decomposition, which requires only $O(nd^2 + d^3)$ time complexity and is linear with respect to $n$. Thus, it can be performed very efficiently for small values of $d$. Hereafter, $X^\top (X X^\top + \gamma_g I_d)^{-1} X$ can be factorized as $\tilde{X}^\top \tilde{X}$ with $\tilde{X} = \Sigma_d^{-\frac{1}{2}} V_d^\top X$. Then, we have a simpler form of $L_g$ as follows:

$$L_g = H_n - X^\top (X X^\top + \gamma_g I_d)^{-1} X = I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top - \tilde{X}^\top \tilde{X}. \qquad (21)$$

$^1$In our experiments, we fix $s = 5$, $p = 300$, and the number of images in each dataset, $n$, is more than 30 000, so the condition $s \ll p \ll n$ is generally satisfied.
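A minimal sketch of the anchor graph construction just described is given below (our illustration; the Gaussian weighting of the s nearest anchors is a simplification of the scheme in [11] rather than an exact reproduction, anchors are random samples instead of k-means centers, and a dense array stands in for the sparse Z).

```python
import numpy as np

def anchor_graph_Z(X, anchors, s=5, lam=1.0):
    """n x m matrix Z: each row has s nonzero, row-normalized similarities to the
    s nearest anchors (a simplified version of the construction in [11])."""
    n, m = X.shape[0], anchors.shape[0]
    sq = (X ** 2).sum(1)[:, None] + (anchors ** 2).sum(1)[None, :] - 2 * X @ anchors.T
    Z = np.zeros((n, m))
    for i in range(n):
        idx = np.argpartition(sq[i], s)[:s]            # the s nearest anchors of x_i
        w = np.exp(-lam * sq[i, idx])
        Z[i, idx] = w / w.sum()                        # rows of Z sum to one
    return Z

rng = np.random.default_rng(5)
X = rng.standard_normal((1000, 16))
anchors = X[rng.choice(1000, 100, replace=False)]      # stand-in for k-means cluster centers
Z = anchor_graph_Z(X, anchors, s=5)

Lam = np.diag(Z.sum(axis=0))                           # Lambda = diag(Z^T 1_n)
A = Z @ np.linalg.inv(Lam) @ Z.T                       # low-rank anchor-graph affinity
```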

Therefore, $M$ in (10) can also be simplified accordingly. Namely, we have

$$M = I_n - Z \Lambda^{-1} Z^\top + \mu \Big( I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top - \tilde{X}^\top \tilde{X} \Big) = (\mu + 1) I_n - \hat{X}^\top \hat{X}, \qquad (22)$$

where $\hat{X} = \big[ \sqrt{\mu}\, \tilde{X}^\top,\ \sqrt{\tfrac{\mu}{n}}\, \mathbf{1}_n,\ Z \Lambda^{-\frac{1}{2}} \big]^\top \in \mathbb{R}^{(p+d+1) \times n}$. When $d$ and $p$ are small, (10) can be solved quite efficiently by solving the eigenvalue decomposition of $\hat{X} \hat{X}^\top \in \mathbb{R}^{(p+d+1) \times (p+d+1)}$, which requires only $O((p+d+1)^3)$ time complexity. The training procedure of SEH is shown in Algorithm 1.

Algorithm 1: Algorithm of SEH
1: Input: Training data $X = [x_1, \ldots, x_n] \in \mathbb{R}^{d \times n}$ and the number of hashing bits $r$;
2: Construct $p$ anchor points using k-means, and calculate $Z$ according to [11] and $\Lambda = \operatorname{diag}(Z^\top \mathbf{1}_n)$;
3: Conduct the eigenvalue decomposition $X X^\top + \gamma_g I_d = V_d \Sigma_d V_d^\top$, and set $\tilde{X} = \Sigma_d^{-\frac{1}{2}} V_d^\top X$;
4: Set $\hat{X} = \big[ \sqrt{\mu}\, \tilde{X}^\top,\ \sqrt{\tfrac{\mu}{n}}\, \mathbf{1}_n,\ Z \Lambda^{-\frac{1}{2}} \big]^\top \in \mathbb{R}^{(p+d+1) \times n}$;
5: Compute the $r$ eigenvectors $U_{\cdot i}$ of $\hat{X} \hat{X}^\top$ associated with the $r$ largest eigenvalues $\sigma_i$, $i = 1, \ldots, r$, and denote $U = [U_{\cdot 1}, \ldots, U_{\cdot r}]$ and $\Sigma = \operatorname{diag}\{\sigma_1, \ldots, \sigma_r\}$;
6: Obtain $Y^* = \sqrt{n}\, \hat{X}^\top U \Sigma^{-\frac{1}{2}}$;
7: Recover $W^*$ and $b^*$ according to (9) and (8);
8: Learn the optimal rotation $R^*$ by solving (13) with $f^*(x) = W^{*\top} x + b^*$;
9: Output: hashing function $h(x) = \mathrm{sign}\big( R^{*\top} ( W^{*\top} x + b^* ) \big)$.

For the kernel case, following a similar procedure, we can obtain $K_{pn} K_{pn}^\top + \gamma_g K_{pp} = V_p \Sigma_p V_p^\top$ with $O(np^2 + p^3)$ time complexity. Then, we can factorize $K_{pn}^\top \big( K_{pn} K_{pn}^\top + \gamma_g K_{pp} \big)^{-1} K_{pn}$ and solve the eigenvalue decomposition problem of a matrix of size $(2p+1) \times (2p+1)$ instead, which requires only $O((2p+1)^3)$ time complexity. Thus, it is very efficient when $p$ is small. The training procedure of KSEH is shown in Algorithm 2.

Algorithm 2: Algorithm of KSEH
1: Input: Training data $X = [x_1, \ldots, x_n] \in \mathbb{R}^{d \times n}$ and the number of hashing bits $r$;
2: Perform k-means to obtain the $p$ cluster centers $\{z_1, \ldots, z_p\}$, which are used as anchor points as well as for the reduced set representation;
3: Compute the kernel matrices $K_{pn}$, defined between the $p$ cluster centers and the $n$ data points, and $K_{pp}$, defined among the $p$ cluster centers; calculate $Z$ according to [11] and $\Lambda = \operatorname{diag}(Z^\top \mathbf{1}_n)$;
4: Conduct the eigenvalue decomposition $K_{pn} K_{pn}^\top + \gamma_g K_{pp} = V_p \Sigma_p V_p^\top$, and set $\tilde{K}_{pn} = \Sigma_p^{-\frac{1}{2}} V_p^\top K_{pn}$;
5: Set $\hat{K} = \big[ \sqrt{\mu}\, \tilde{K}_{pn}^\top,\ \sqrt{\tfrac{\mu}{n}}\, \mathbf{1}_n,\ Z \Lambda^{-\frac{1}{2}} \big]^\top \in \mathbb{R}^{(2p+1) \times n}$;
6: Compute the $r$ eigenvectors $U_{\cdot i}$ of $\hat{K} \hat{K}^\top$ associated with the $r$ largest eigenvalues $\sigma_i$, $i = 1, \ldots, r$, and denote $U = [U_{\cdot 1}, \ldots, U_{\cdot r}]$ and $\Sigma = \operatorname{diag}\{\sigma_1, \ldots, \sigma_r\}$;
7: Obtain $Y^* = \sqrt{n}\, \hat{K}^\top U \Sigma^{-\frac{1}{2}}$;
8: Recover $\alpha^*$ and $b^*$ according to (17) and (16);
9: Learn the optimal rotation $R^*$ by solving (13) with $f^*(x) = \sum_{i=1}^{p} (\alpha^*_{i\cdot})^\top k(z_i, x) + b^*$;
10: Output: hashing function $h(x) = \mathrm{sign}\big( R^{*\top} \big( \sum_{i=1}^{p} (\alpha^*_{i\cdot})^\top k(z_i, x) + b^* \big) \big)$.
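To make Algorithm 1 concrete, the following NumPy sketch (our illustration, written under the assumptions stated above: centered data, small d and p, a simplified anchor graph in place of the construction in [11], random samples standing in for k-means centers, and all sizes and tradeoff parameters chosen only for demonstration) carries out the linear SEH training steps; the ITQ rotation step is covered by the earlier rotation sketch and omitted here.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, p, r, s = 2000, 16, 100, 8, 5
mu, gamma_g = 1.0, 1e-3

X = rng.standard_normal((d, n))                  # data as columns, as in the paper
X = X - X.mean(axis=1, keepdims=True)            # center so that X 1_n = 0_d
anchors = X[:, rng.choice(n, p, replace=False)]  # stand-in for k-means cluster centers

# Step 2: simplified anchor graph Z (s nearest anchors, row-normalized similarities).
sq = (X ** 2).sum(0)[:, None] + (anchors ** 2).sum(0)[None, :] - 2 * X.T @ anchors
Z = np.zeros((n, p))
for i in range(n):
    idx = np.argpartition(sq[i], s)[:s]
    w = np.exp(-sq[i, idx])
    Z[i, idx] = w / w.sum()
Lam = Z.sum(axis=0)                              # diagonal of Lambda = diag(Z^T 1_n)

# Step 3: eigen-decompose X X^T + gamma_g I_d and form X_tilde = Sigma^{-1/2} V^T X.
Sig, V = np.linalg.eigh(X @ X.T + gamma_g * np.eye(d))
X_tilde = np.diag(Sig ** -0.5) @ V.T @ X

# Step 4: stack X_hat so that M = (mu + 1) I_n - X_hat^T X_hat, as in (22).
X_hat = np.vstack([np.sqrt(mu) * X_tilde,
                   np.sqrt(mu / n) * np.ones((1, n)),
                   (Z / np.sqrt(Lam)).T])        # (p + d + 1) x n

# Steps 5-6: small eigen-problem on X_hat X_hat^T; keep the r leading eigenvectors
# (the trivial constant direction, discarded in the paper, is kept here for brevity).
sig, U = np.linalg.eigh(X_hat @ X_hat.T)
order = np.argsort(sig)[::-1][:r]
U, sig = U[:, order], sig[order]
Y_star = np.sqrt(n) * X_hat.T @ U @ np.diag(sig ** -0.5)

# Step 7: recover W* and b* from (9) and (8).
b_star = Y_star.T @ np.ones(n) / n
W_star = V @ np.diag(1.0 / Sig) @ V.T @ (X @ Y_star)

codes = np.sign(X.T @ W_star + b_star)           # step 8 (rotation R*) omitted; see ITQ sketch
```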

III. Discussion With Related Work

In this section, we discuss the relationship of our methods SEH and KSEH with some popular hashing works. We will show that SH [21], AGH [11], and OKH [7] are special cases of our methods.

A. Spectral Hashing (SH)

The objective function of SH [21] is as follows:

$$\min_{Y}\ \operatorname{tr}(Y^\top L Y),$$
$$\text{s.t.}\quad Y \in \{-1, 1\}^{n \times r},\quad \frac{1}{n} Y^\top Y = I_r,\quad Y^\top \mathbf{1}_n = \mathbf{0}_r.$$

It can easily be verified that this is a special case of the objective function of KSEH in (14). Specifically, the objective function of KSEH reduces to that of SH when setting $\mu = 0$. Moreover, SH cannot effectively cope with the out-of-sample data, because it needs to assume that the data points are uniformly distributed in a high dimensional rectangle. In contrast, KSEH has a natural out-of-sample extension by using the nonlinear regression function.

B. PCA Hashing (PCAH)

Gong et al. [1] first found the embedding of the data (i.e., $Y = X^\top W$) using PCA by optimizing the following problem:

$$W^* = \operatorname*{arg\,max}_{W^\top W = I}\ \operatorname{tr}(W^\top X X^\top W). \qquad (23)$$

Then, an ITQ approach is carried out to transform the embedding into binary codes by minimizing the quantization error on the training data. Compared with SEH, PCA hashing only maximizes the variance while ignoring the similarity preserving constraint, so SEH can better cope with the data sampled from a nonlinear manifold.
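For comparison, PCAH as described above amounts to the simple pipeline sketched below (our illustration with synthetic data; note that the code stores the data with one point per row, whereas the paper stacks points as columns, and the ITQ rotation before the sign is indicated but not repeated here).

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, r = 5000, 64, 32
X = rng.standard_normal((n, d))

Xc = X - X.mean(axis=0)                       # center the data
# W* = top-r eigenvectors of X^T X, i.e., the top-r PCA directions, cf. (23).
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
W = eigvecs[:, np.argsort(eigvals)[::-1][:r]]

embedding = Xc @ W                            # corresponds to Y = X^T W in the paper's notation
codes = np.sign(embedding)                    # ITQ would additionally rotate before the sign
```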

C. Anchor Graph Hashing

Liu et al. [11] proposed AGH, which solves the following optimization problem:

$$\min_{Y}\ \operatorname{tr}(Y^\top L Y), \qquad (24)$$
$$\text{s.t.}\quad \frac{1}{n} Y^\top Y = I_r,\quad Y^\top \mathbf{1}_n = \mathbf{0}_r,\quad Y \in \{-1, 1\}^{n \times r},$$

where $L = I_n - A$ with $A = Z \Lambda^{-1} Z^\top$. They also relaxed the constraint for the binary solution. Based on the eigenvalue decomposition $\Lambda^{-\frac{1}{2}} Z^\top Z \Lambda^{-\frac{1}{2}} = V \Sigma V^\top$, the solution can be obtained as $Y = Z \Lambda^{-\frac{1}{2}} V \Sigma^{-\frac{1}{2}} = Z W$ with $W = \Lambda^{-\frac{1}{2}} V \Sigma^{-\frac{1}{2}}$, and the final hashing function is explicitly represented as

$$h(x) = \mathrm{sign}\Big( \sum_{z_i \in N_x} W_{i\cdot}\, k(z_i, x) \Big), \qquad (25)$$

where $N_x$ is the set of the $s$ nearest anchor points to $x$, and $W_{i\cdot}$ is the $i$th row of $W$.

Essentially, KSEH shares quite similar properties with AGH. Specifically, if the anchor points in KSEH are used as the reduced set representation, the final hashing function is similar to that of AGH. By looking deeper into the solutions for $Y$ of AGH and KSEH, we find that they are all adapted from the eigenvectors related to $A$. For AGH, each column of $Y$ is an eigenvector of $A$, while for KSEH, each column of $Y$ is an eigenvector of $A + \mu \big( \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^\top + K_{pn}^\top (K_{pn} K_{pn}^\top + \gamma_g K_{pp})^{-1} K_{pn} \big)$. Thus, KSEH additionally uses the discriminative subspace structure of the data to find the hash function, and the solution $Y$ of KSEH reduces to that of AGH when setting $\mu = 0$. In addition, the out-of-sample extension of AGH using the Nyström

method [26] does not have an explicit objective function, while KSEH provides an inherent out-of-sample extension to unseen data.

D. Optimized Kernel Hashing (OKH)

OKH [7] solves the following optimization problem:

$$\min_{\alpha, b}\ \operatorname{tr}(Y^\top L Y) + \gamma \operatorname{tr}(\alpha^\top K_{pp} \alpha), \qquad (26)$$
$$\text{s.t.}\quad Y = K_{pn}^\top \alpha + \mathbf{1}_n b^\top, \qquad (27)$$
$$\frac{1}{n} Y^\top Y = I_r,\quad Y^\top \mathbf{1}_n = \mathbf{0}_r,\quad Y \in \{-1, 1\}^{n \times r},$$

where $K_{pn}$ and $K_{pp}$ are the same as in Section II-B. It is easy to show that KSEH reduces to OKH when setting $\mu = \infty$ and $\mu \gamma_g = \gamma$. To accelerate OKH, He et al. suggested several ways to approximate the graph Laplacian matrix $L$. Due to the efficiency and effectiveness of the anchor graph [11], we also use the anchor graph to construct the graph Laplacian matrix, i.e., $L = I_n - A$ with $A = Z \Lambda^{-1} Z^\top$ (Section III-C). The resultant hamming embedding $Y$ of OKH is enforced to be the low dimensional data representation [see the hard constraint in (27)]. In contrast, in KSEH we introduce an additional term to control the mismatch between the hamming embedding $Y$ and the low dimensional data representation, such that KSEH can better cope with the data sampled from a nonlinear manifold.

E. Multiple Feature Hashing

Our work is also related to MFH [27]. When there is only one type of feature, MFH reduces to our SEH in (5) with only a slight difference in the constraints. However, MFH cannot efficiently handle high dimensional features, because it needs to calculate the inverse of a $d \times d$ matrix as in our SEH. In contrast, our KSEH can efficiently handle high dimensional features (see Section II-B). Moreover, the ITQ technique [1] is not employed in MFH, which may lead to inferior performance.

IV. Experiments

We evaluate our methods SEH and KSEH on four real-world datasets: CIFAR, Tiny-580K [28], NUS-WIDE [29], and Caltech-256 [30]. We compare our SEH and KSEH with the baseline algorithms SH [21], LSH [17] and its kernel extensions KLSH [9] and SKLSH [31]. We only report the results from the two-layer AGH (2-AGH) in [11], in which a hierarchical thresholding learning procedure is used to create multiple bits for each eigenfunction, because it achieves better image retrieval performance as reported in [11]. For simpler presentation, in this paper, we refer to the unsupervised hashing method in [1] as PCA hashing (PCAH), and we also report the results of KPCAH [1], which is a kernel extension of PCAH [1]. Whenever possible, we also employ the ITQ technique [1] for the other methods (e.g., OKH) by default to conduct a fair comparison. However, it is unclear how to directly combine the ITQ technique [1] with the hashing methods such as SH,


LSH, KLSH, SKLSH, and 2-AGH; we only report the results from their original algorithms. Because the ITQ technique can readily be combined with OKH, PCAH, and KPCAH as well as with our proposed methods SEH and KSEH, we also report the results without using the ITQ technique for these methods to conduct a comprehensive comparison. We denote these methods without using the ITQ technique by OKH-wo-ITQ, PCAH-wo-ITQ, KPCAH-wo-ITQ, SEH-wo-ITQ, and KSEH-wo-ITQ, respectively.

A. Evaluation Protocols

Both the query images and the database images can be represented by a small number of binary codes using the various hashing methods, and then the hamming distances between the query images and the database images can be computed very efficiently by using the bitwise XOR operation. For any query image, the database images can be ranked based on the hamming distances, and the search results are evaluated based on whether the returned images are the true neighbors of the query image. In this paper, we define the true neighbors based on two criteria.

1) The first criterion defines the true neighbors as the nearest neighbors of the query image in the Euclidean space. Specifically, the goal is to find the database images that are the nearest neighbors of the query image based on the Euclidean distance. In the experiments, we follow [31] to determine a nominal threshold such that the average number of Euclidean neighbors is 50.

2) The second criterion defines the true neighbors as the semantic neighbors of the query image, which share the same class label with the query image.

Following [1], we first evaluate our methods on the large Tiny-580K dataset by using the nearest neighbors of the query image in the Euclidean space as the true neighbors. It is also worth mentioning that, in this paper, we focus on the scalable image retrieval task, in which the end users are generally interested in retrieving the database images with the same semantic label as the query image. Therefore, we focus our evaluation on three datasets, including CIFAR (a subset of the tiny image dataset), NUS-WIDE, and Caltech-256, by using the semantic neighbors as the true neighbors.

As suggested in [11], we use the noninterpolated average precision (AP) for performance evaluation. It corresponds to the area under the recall/precision curve and incorporates the effect of recall when the AP is computed over the entire result set. The mean average precision (mAP) is the mean of the APs over all the query images.
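The evaluation just described can be summarized in a few lines; the sketch below (our illustration with random codes and labels; real experiments would use the hashing codes and the labels of the actual datasets) ranks the database by hamming distance for each query and computes the noninterpolated AP per query and the mAP over queries.

```python
import numpy as np

def average_precision(relevant_sorted):
    """Noninterpolated AP for one query; `relevant_sorted` is a 0/1 array over the ranked list."""
    hits = np.cumsum(relevant_sorted)
    precisions = hits / np.arange(1, len(relevant_sorted) + 1)
    n_rel = relevant_sorted.sum()
    return (precisions * relevant_sorted).sum() / n_rel if n_rel > 0 else 0.0

def mean_average_precision(db_codes, db_labels, q_codes, q_labels):
    """db_codes / q_codes are {0,1} arrays; labels define the semantic true neighbors."""
    aps = []
    for code, label in zip(q_codes, q_labels):
        dist = np.count_nonzero(db_codes ^ code, axis=1)     # hamming distances
        order = np.argsort(dist, kind="stable")
        aps.append(average_precision((db_labels[order] == label).astype(float)))
    return float(np.mean(aps))

rng = np.random.default_rng(8)
db_codes = rng.integers(0, 2, size=(1000, 32), dtype=np.uint8)
db_labels = rng.integers(0, 10, size=1000)
q_codes = rng.integers(0, 2, size=(50, 32), dtype=np.uint8)
q_labels = rng.integers(0, 10, size=50)
print(mean_average_precision(db_codes, db_labels, q_codes, q_labels))
```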

B. Datasets and Experimental Setup

1) Dataset Without Ground-Truth Labels: The Tiny-580K dataset is a subset of the tiny image dataset [28], and it contains 580 000 images without any ground-truth labels. Therefore, we use this dataset for performance evaluation by using the nearest neighbors in the Euclidean space as the true neighbors. We use the default GIST feature representation for each image and randomly sample 1000 images as the query set; the remaining images are employed as the training and database images.

2) Datasets With Ground-Truth Labels: We evaluate our methods on the following three datasets with ground-truth labels, in which the semantic neighbors are used as the true neighbors.

The CIFAR dataset, which is also a subset of the tiny image dataset [28], consists of 64 185 color images with a resolution of 32 × 32 pixels. Each image is manually labeled as one of the 11 classes: Airplane, Automobile, Bird, Boat, Cat, Deer, Dog, Frog, Horse, Ship, and Truck. Following [1], we extract the 320-dimensional GIST descriptor from each image using the code³ from Oliva and Torralba with the default setting [32]. In our experiment, we randomly sample 1000 images as the query set and employ the remaining images as the training and database images.

³Available at http://people.csail.mit.edu/torralba/code/spatialenvelope/gist.zip.

The NUS-WIDE dataset contains 269 648 images and their ground-truth annotations for 81 concepts. In this dataset, some images are annotated with multiple class labels. In this paper, we choose a subset with 79 216 images, in which each image is associated with only one class label. Three types of global features [i.e., grid color moment (225 dim.), wavelet texture (128 dim.), and edge direction histogram (73 dim.)] are extracted for each image. Finally, each image is represented as a single 426-D vector by concatenating the three types of global features. Each dimension is then normalized to zero mean and unit variance. In our experiment, we randomly sample 10% of the images from each class to construct the query set. The remaining images are exploited as the training and database images.

The Caltech-256 dataset contains 30 607 images from 256 categories, with each class consisting of at least 80 images. Following [33], we extract the so-called ScSPM feature from each image. Specifically, we first normalize each image by keeping the same aspect ratio and setting the maximum size after normalization to 300 × 300 pixels. We then extract dense SIFT descriptors from 16 × 16 patches over a grid with a spacing of six pixels. Then, we employ the sparse representation and maximum pooling based techniques in [33]

to extract the ScSPM feature, in which we use a codebook of size 1024 and three levels of pyramids. Finally, each image is represented as a 21 504-dimensional feature vector. As in the NUS-WIDE dataset, we randomly sample 10% of the images from each class to construct the query set. The remaining images are used as the training and database images.

In AGH [11], OKH [7], and our methods SEH and KSEH, we use $m = p = 300$ cluster centers from k-means clustering as the anchor points, and we set the number of nearest anchor points $s$ to 5. While better retrieval performance can be achieved by using larger $m$ and $p$, the computational costs also increase. In this paper, we fix $m = p = 300$, which leads to a good tradeoff between accuracy and speed, as suggested in [9] and [11]. We use the RBF kernel $k(x_i, x_j) = \exp(-\lambda \|x_i - x_j\|^2)$ on the CIFAR, Tiny-580K, and NUS-WIDE datasets and the Laplacian kernel $k(x_i, x_j) = \exp(-\sqrt{\lambda} \|x_i - x_j\|)$ on the Caltech-256 dataset, because the latter generally achieves better performance on this dataset. In this paper, we set $\lambda = n^2 / \sum_{i,j=1}^{n} \|x_i - x_j\|^2$ for all the methods for a fair comparison. We repeat the experiments five times using different random training image/query image partitions and report the average performance over the five rounds. We determine the best parameters (i.e., $\mu$ and $\gamma_g$) of our methods KSEH and SEH in the first round of experiments, in which $\mu$ and $\gamma_g$ are chosen from the set $\{10^{-9}, 10^{-6}, 10^{-4}, 10^{-3}, 10^{-2}, 1, 10^{2}, 10^{3}, 10^{4}, 10^{6}, 10^{9}\}$. The other parameters in our methods are all fixed.

C. Results Using Nearest Neighbors in Euclidean Space as True Neighbors

Considering that the images in the Tiny-580K dataset do not have ground-truth labels, we evaluate the performance of all methods on this dataset by using the nearest neighbors in the Euclidean space as the true neighbors. We note that the ITQ technique can be combined with KSEH, SEH, OKH, PCAH, and KPCAH, so we report the results of these methods before and after using the ITQ technique in Fig. 1(a) and (b), respectively, in which these methods are referred to as KSEH-wo-ITQ, SEH-wo-ITQ, OKH-wo-ITQ, PCAH-wo-ITQ, and KPCAH-wo-ITQ before using the ITQ technique.

Fig. 1. Performance comparison of different algorithms on the Tiny-580K dataset by using the nearest neighbors in the Euclidean space as the true neighbors. (a) Comparison before using the ITQ technique. (b) Comparison after using the ITQ technique. (c) Comparison of our methods KSEH and SEH with the remaining methods (i.e., LSH, KLSH, SKLSH, SH, and 2-AGH), which cannot be readily combined with the ITQ technique.



Fig. 2. Performance comparison of different algorithms on three datasets by using the semantic neighbors as the true neighbors. (a), (d), (g) Comparison before using the ITQ technique. (b), (e), (h) Comparison after using the ITQ technique. (c), (f), (i) Comparison of our methods KSEH and SEH with the remaining methods (i.e., LSH, KLSH, SKLSH, SH, and 2-AGH), which cannot be readily combined with the ITQ technique. (a)–(c) CIFAR. (d)–(f) NUS-WIDE. (g)–(i) Caltech-256.

In Fig. 1(c), we compare KSEH and SEH with the remaining methods (i.e., 2-AGH, SH, LSH, KLSH, and SKLSH), which cannot be readily combined with the ITQ technique. From Fig. 1, we have the following observations.

1) Our method KSEH-wo-ITQ is generally the best [Fig. 1(a)]. We are more interested in the results after using the ITQ technique, because most of the methods become much better after using it. Our method KSEH, which can be readily combined with the ITQ technique, generally achieves the best results [Fig. 1(b)]. While OKH outperforms PCAH, KPCAH, and our methods SEH and KSEH when using 8 and 16 bits, its best performance is much worse than that of our method KSEH.

2) The state-of-the-art methods KPCAH and PCAH also perform well on this dataset, and KPCAH becomes better than PCAH when using more bits. However, both KPCAH and PCAH only achieve performance comparable to our SEH, and they are consistently worse than our KSEH.

3) When compared with the remaining methods (i.e., 2-AGH, SH, LSH, KLSH, and SKLSH) in Fig. 1(c), our KSEH and SEH are generally much better.

4) KSEH-wo-ITQ and KSEH are consistently better than SEH-wo-ITQ and SEH, respectively, which demonstrates that it is beneficial to employ the kernel based hashing method KSEH for image retrieval.

D. Results Using the Semantic Neighbors as the True Neighbors

As aforementioned, the end users are usually more interested in retrieving the database images with the same semantic label as the query image. Therefore, we focus on the performance evaluation of the different methods on three datasets (i.e., CIFAR, NUS-WIDE, and Caltech-256) by using the semantic neighbors as the true neighbors. For the Caltech-256 dataset, the number of samples is large and the feature dimension is also very high; thus, it is very time consuming to run PCAH and SEH as well as PCAH-wo-ITQ and SEH-wo-ITQ. Therefore, we do not report the results of PCAH, SEH, PCAH-wo-ITQ, and SEH-wo-ITQ on this dataset.

The results are shown in Fig. 2. Specifically, we compare our methods KSEH-wo-ITQ and SEH-wo-ITQ with OKH-wo-ITQ, PCAH-wo-ITQ, and KPCAH-wo-ITQ in Fig. 2(a), (d), and (g), and we compare our methods KSEH and SEH with OKH, PCAH, and KPCAH in Fig. 2(b), (e), and (h). In Fig. 2(c), (f), and (i), we compare KSEH and SEH with 2-AGH, SH, LSH, KLSH, and SKLSH. From Fig. 2, we have the following observations.

1) Without using the ITQ technique, KSEH-wo-ITQ is generally the best [Fig. 2(a), (d), (g)]. From Fig. 2(b), (e), and (h), we observe that the results of all the methods generally improve after using the ITQ technique. It is also clear that our methods KSEH and SEH generally outperform the existing hashing methods PCAH, KPCAH, and OKH on the CIFAR, NUS-WIDE, and Caltech-256 datasets. Moreover, the results of KSEH and SEH generally become better when using more bits.

2) The performance of OKH on the CIFAR and NUS-WIDE datasets generally drops when using more bits. Recall that in OKH the resultant hamming embedding of the training samples is constrained to be the low dimensional data representation (27). When the number of bits increases, it becomes more difficult to enforce such a hard constraint on the resultant hamming embedding, which may degrade the retrieval results of OKH.

3) The performance of 2-AGH also decreases on the CIFAR and NUS-WIDE datasets when using more bits, and its best result is achieved when using a small number of bits. Similar results were also reported in the original work [11].

4) KSEH generally outperforms SEH on the CIFAR and NUS-WIDE datasets, which again demonstrates that it is beneficial to employ the kernel based hashing method KSEH for image retrieval.

Fig. 3 shows the top 15 returned images on the CIFAR dataset using different algorithms, in which we set the number of bits to 128 for all the hashing methods. The left thumbnail image is the query image from the class frog. It is clear that the top ranked images returned by our KSEH and SEH are generally better than those returned by the other existing hashing algorithms.

Fig. 3. Top 15 returned images on the CIFAR dataset using different hashing algorithms with 128 bits. The left thumbnail image is the query image from the class frog. Incorrect results from different classes are highlighted by red boxes. Best viewed in color.

E. Results of Training Time

We take the Tiny-580K dataset as an example to report the training CPU time of KSEH, SEH, 2-AGH, OKH, KPCAH, PCAH, and SH in Fig. 4, in which all the experiments are conducted on an IBM server with 3.06-GHz multicore processors and the number of bits is set to 32. All the methods generally scale linearly with respect to the number of training data. While our methods KSEH and SEH are slower than 2-AGH, OKH, SH, and PCAH, they are faster than KPCAH, and both KSEH and SEH can be employed for practical applications in terms of the training time. In particular, we observe that KSEH is faster than SEH, which demonstrates that it is also beneficial to utilize KSEH to improve the efficiency. If the feature dimension becomes higher, the efficiency improvement of KSEH over SEH will be more significant. While we are also aware that the random sampling based methods such as LSH and KLSH can be faster than all the other methods (we omit both algorithms in Fig. 4), their retrieval performances are usually much worse (Figs. 1 and 2).

Fig. 4. Training CPU time (in seconds) of different hashing methods with respect to the number of training data, in which we set the number of bits as 32.

F. Results of Encoding Time

For a given query image, we need to compute its hamming embedding based on a given hashing model, and we refer to the corresponding time cost as the encoding time. Fig. 5 shows the encoding time with respect to the number of bits for KSEH, SEH, 2-AGH, OKH, KPCAH, PCAH, and SH on the Tiny-580K dataset, where the encoding time for each algorithm is the average time over 1000 independent queries. From the average CPU time, we observe that SEH and PCAH have similar encoding times, because their hash functions share the same formulation as in (12). The encoding times of 2-AGH, OKH, and KSEH are also close to each other, as their hash functions share the same formulation as in (20). Note that 2-AGH, OKH, and KSEH are slower than PCAH and SEH because of the additional cost of computing the kernel between the query point and the anchor points. SH is slower than PCAH and SEH due to the additional cost of evaluating the sine function. KPCAH is generally the slowest, because it needs to compute the random Fourier features; thus, it becomes slower when the dimension of the random Fourier features becomes larger. In our experiments, we set the dimension of the random Fourier features to 3000, as suggested in [1].

Fig. 5. Encoding CPU time (in seconds) of different hashing methods with respect to the number of bits.

V. Conclusion

In this paper, we have proposed SEH and its kernel extension KSEH to learn compact hashing codes. We introduce a new regularizer into the objective function of SH, which penalizes the mismatch between the resultant hamming embedding and the low dimensional data representation obtained by using a linear or nonlinear regression function. We have also developed an efficient algorithm to solve the eigenvalue decomposition problem in SEH and KSEH and shown that several existing hashing methods are special cases of our KSEH. Our extensive experiments using four datasets clearly demonstrate that our methods SEH and KSEH generally achieve better image retrieval performance when compared with the existing hashing methods.

References

[1] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, "Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., 2012, in press.
[2] A. Gordo, F. Perronnin, Y. Gong, and S. Lazebnik, "Asymmetric distances for binary embeddings," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit., 2011, pp. 729–736.
[3] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon, "Spherical hashing," in Proc. IEEE Int. Conf. Comput. Vision, 2012, pp. 2957–2964.
[4] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang, "Supervised hashing with kernels," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit., 2012, pp. 2074–2081.
[5] Y. Weiss, R. Fergus, and A. Torralba, "Multidimensional spectral hashing," in Proc. Eur. Conf. Comput. Vision, 2012, pp. 340–353.
[6] D. Gorisse, M. Cord, and F. Precioso, "Locality-sensitive hashing scheme for chi2 distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 2, pp. 402–410, Feb. 2012.
[7] J. He, W. Liu, and S.-F. Chang, "Scalable similarity search with optimized kernel hashing," in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, 2010, pp. 1129–1138.
[8] A. Joly and O. Buisson, "Random maximum margin hashing," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit., 2011, pp. 873–880.
[9] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proc. IEEE Int. Conf. Comput. Vision, 2009, pp. 2130–2137.
[10] R.-S. Lin, D. A. Ross, and J. Yagnik, "SPEC hashing: Similarity preserving algorithm for entropy-based coding," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2010, pp. 848–854.
[11] W. Liu, J. Wang, S. Kumar, and S.-F. Chang, "Hashing with graphs," in Proc. Int. Conf. Mach. Learning, 2011, pp. 1–8.
[12] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit., 2010, pp. 3424–3431.
[13] K. Min, L. Yang, J. Wright, L. Wu, X.-S. Hua, and Y. Ma, "Compact projection: Simple and efficient near neighbor search with practical memory requirements," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2010, pp. 3477–3484.


[14] J. Wang, S. Kumar, and S.-F. Chang, "Sequential projection learning for hashing with compact codes," in Proc. Int. Conf. Mach. Learning, 2011, pp. 1127–1134.
[15] H. Xu, J. Wang, Z. Li, G. Zeng, S. Li, and N. Yu, "Complementary hashing for approximate nearest neighbor search," in Proc. IEEE Int. Conf. Comput. Vision, 2011, pp. 1631–1638.
[16] X. Zhu, Z. Huang, H. Cheng, J. Cui, and H. T. Shen, "Sparse hashing for fast multimedia search," ACM Trans. Inform. Syst., vol. 31, no. 2, pp. 9:1–9:24, 2013.
[17] P. Indyk and R. Motwani, "Approximate nearest neighbors: Toward removing the curse of dimensionality," in Proc. STOC, 1998, pp. 604–631.
[18] A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image databases for recognition," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit., 2008, pp. 1–8.
[19] M. Norouzi and D. Fleet, "Minimal loss hashing for compact binary codes," in Proc. Int. Conf. Mach. Learning, 2011, pp. 353–360.
[20] J. Song, Y. Yang, Z. Huang, H. T. Shen, and R. Hong, "Multiple feature hashing for real-time large scale near-duplicate video retrieval," in Proc. ACM Multimedia, 2011, pp. 423–432.
[21] Y. Weiss, A. Torralba, and R. Fergus, "Spectral hashing," in Proc. Advances Neural Inform. Process. Syst., vol. 21, 2008, pp. 1753–1760.
[22] F. Nie, Z. Zeng, I. W. Tsang, D. Xu, and C. Zhang, "Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering," IEEE Trans. Neural Netw., vol. 22, no. 11, pp. 1796–1808, Nov. 2011.
[23] J. Ye, Z. Zhao, and M. Wu, "Discriminative k-means for clustering," in Proc. Advances Neural Inform. Process. Syst., vol. 20, 2008, pp. 1649–1656.
[24] B. Schölkopf and A. J. Smola, Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). Cambridge, MA, USA: MIT Press, 2002.
[25] W. Liu, J. He, and S.-F. Chang, "Large graph construction for scalable semi-supervised learning," in Proc. Int. Conf. Mach. Learning, 2010, pp. 679–686.
[26] C. Williams and M. Seeger, "Using the Nyström method to speed up kernel machines," in Proc. Advances Neural Inform. Process. Syst., vol. 13, 2001, pp. 682–688.
[27] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo, "Effective multiple feature hashing for large-scale near-duplicate video retrieval," IEEE Trans. Multimedia, 2013, in press.
[28] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large dataset for non-parametric object and scene recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1958–1970, Nov. 2008.
[29] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.-T. Zheng, "NUS-WIDE: A real-world web image database from National University of Singapore," in Proc. ACM Conf. Image Video Retrieval, 2009, pp. 1–9.
[30] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," Dept. Eng. Appl. Sci., California Inst. Technol., Pasadena, CA, USA, Tech. Rep. 7694, 2007. [Online]. Available: http://authors.library.caltech.edu/7694
[31] M. Raginsky and S. Lazebnik, "Locality-sensitive binary codes from shift-invariant kernels," in Proc. Advances Neural Inform. Process. Syst., vol. 22, 2009, pp. 1509–1517.
[32] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," Int. J. Comput. Vision, vol. 42, no. 3, pp. 145–175, 2001.
[33] J. Yang, K. Yu, Y. Gong, and T. S. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit., 2009, pp. 1794–1801.


Lin Chen received the B.E. degree from the University of Science and Technology of China, Hefei, China, in 2009. He is currently pursuing the Ph.D. degree at the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include computer vision and machine learning, especially the machine learning techniques for large scale computer vision tasks, such as image/video retrieval/classification.

Dong Xu (M’07–SM’13) received the B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2001 and 2005, respectively. He was with Microsoft Research Asia, Beijing, China, and the Chinese University of Hong Kong, Shatin, Hong Kong, for more than two years. He was a Post-Doctoral Research Scientist with Columbia University, New York, NY, USA, for one year. He is currently an Associate Professor with the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include computer vision, statistical learning, and multimedia content analysis. Dr. Xu was the co-author of a paper that won the Best Student Paper Award in the prestigious IEEE International Conference on Computer Vision and Pattern Recognition in 2010.

Ivor Wai-Hung Tsang received the Ph.D. degree in computer science from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, in 2007. He is currently an Assistant Professor with the School of Computer Engineering, Nanyang Technological University (NTU), Singapore. He is the Deputy Director of the Center for Computational Intelligence, NTU. Dr. Tsang was the recipient of the Natural Science Award (Class II) in 2008, China, which recognized his contributions to kernel methods, and the prestigious IEEE Transactions on Neural Networks Outstanding 2004 Paper Award in 2006. He also received a number of Best Paper Awards and Honors from reputable international conferences, including the Best Student Paper Award at CVPR 2010, the Best Paper Award at ICTAI 2011, and the Best Poster Award Honorable Mention at ACML 2012. He was also the recipient of the Microsoft Fellowship 2005 and the ECCV 2012 Outstanding Reviewer Award.

Xuelong Li (M’02–SM’07–F’12) is currently a Full Professor with the Center for Optical Imagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, Shaanxi, China.
