
Ranking Graph Embedding for Learning to Rerank

Yanwei Pang, Senior Member, IEEE, Zhong Ji, Peiguang Jing, and Xuelong Li, Fellow, IEEE

Abstract— Dimensionality reduction is a key step to improving the generalization ability of reranking in image search. However, existing dimensionality reduction methods are typically designed for classification, clustering, and visualization rather than for the task of learning to rank. Without ranking information such as relevance degree labels, direct application of conventional dimensionality reduction methods to ranking tasks generally cannot achieve the best performance. In this paper, we show that introducing ranking information into dimensionality reduction significantly increases the performance of image search reranking. The proposed method transforms graph embedding, a general framework for dimensionality reduction, into ranking graph embedding (RANGE) by modeling the global structure and the local relationships in and between different relevance degree sets, respectively. The proposed method also defines three types of edge weight assignment between two nodes: binary, reconstructive, and global. In addition, a novel principal component analysis based similarity calculation method is presented in the stage of global graph construction. Extensive experimental results on the MSRA-MM database demonstrate the effectiveness and superiority of the proposed RANGE method and the image search reranking framework.

Index Terms— Dimensionality reduction, graph embedding, image search reranking, learning to rank.

I. INTRODUCTION

THE exponential growth of image content has led to flourishing research and commercial activities in image search. Nowadays, most popular image search engines, such as Google, Bing, and Yahoo!, still depend on text-based search techniques. The general approach is to utilize the metadata associated with media contents as features, and then employ well-known information retrieval methods, such as term frequency-inverse document frequency (TF-IDF) [7] and Okapi BM25 [29], to rank the images. However, this kind of image search approach often returns noisy results at the top of the ranking list, since the text cannot fully reflect the visual content of the images. To address this problem, a promising new direction has emerged


in recent years, namely image search reranking, which applies visual information to reorder the text-based search results to enhance search performance [6], [13], [39], [46], [50], [51].

Image search reranking is a new paradigm that follows content-based image retrieval (CBIR) [34], [35], [48] and content-based image annotation [36], [37], [43], [44] in the domain of image content analysis and retrieval. It is defined as the process of reordering the images in the initial text-based search results by incorporating visual information, aiming at refining the search results. Over the past decade, much work has been done on this topic, among which approaches employing the learning-to-rank technique are among the most promising [26], [51]. Compared to conventional classification-based approaches, learning-to-rank-based approaches are essentially more effective in mining ordinal relationships [51]. Generally speaking, this kind of approach first extracts the images' visual features from the initial search results, then builds a ranking function with the labeled training data, and finally reorders the images with the ranking function. For example, Yang et al. [51] were the first to employ the learning-to-rank technique in the search reranking task: Ranking SVM [16] and ListNet [4] (two popular learning-to-rank algorithms) are employed to build the ranking function and then rerank the initial search results.

Feature dimensionality reduction is an important step in learning-to-rank-based image search reranking and other multimedia retrieval applications, mainly due to the high dimension of the original visual features [6], [13], [26], [50]. For example, Duan et al. [6] employed 426-D low-level visual features; Yao et al. [50] used 2000-D bag-of-visual-words (BOW) features; and Liu et al. [26] used 1271-D image features, including ColorMoment, HSV color histogram, edge distribution histogram, etc. Typically, high-dimensional data cause the "curse of dimensionality" and impose a heavy burden on computation and storage. Dimensionality reduction is an effective technique to overcome or alleviate these problems. Popular methods include principal component analysis (PCA) [5], linear discriminant analysis (LDA) [5], locally linear embedding (LLE) [28], locality preserving projections (LPP) [14], etc. Yan et al. [49] proved that most of these algorithms can be unified within a graph embedding framework in which the desired statistical or geometric data properties of a given algorithm are encoded as graph relationships. Our proposed ranking graph embedding (RANGE) dimensionality reduction algorithm is developed from this framework.

Unfortunately, state-of-the-art approaches ignore the great difference between learning to rank and classification when it comes to feature dimensionality reduction. Generally, most dimensionality reduction methods are designed



Fig. 1. Scheme for RANGE-based image visual search reranking illustrated with the query “horses”.

for classification tasks, not for ranking tasks. There are only two states ("0" and "1," or "−1" and "+1") for the former, whereas there are more than two states for the latter. For example, the images are usually labeled as "very relevant," "relevant," and "irrelevant" with "2," "1," and "0," respectively. In fact, the great differences lie not only in the labels of the states but also in the relationships between the states. For instance, the labels "0" and "1" are contrary states in a classification task. However, they are similar states to some extent in a ranking task, since both contain factors that are irrelevant to the query. Further, the images labeled very relevant and relevant also share some characteristics, since both contain factors relevant to the query. The complicated relationships between different relevance degrees make ranking tasks significantly different from classification tasks. Therefore, directly applying current dimensionality reduction techniques in a learning-to-rank scheme cannot achieve the best performance.

Based on the above considerations, we propose in this paper a novel dimensionality reduction algorithm, RANGE, to improve the ranking performance in the image reranking task. The RANGE algorithm takes the relationships between different relevance degrees into account. In this paper, we specifically focus on linear dimensionality reduction, mainly due to its effectiveness, efficiency, and solid mathematical background.

The proposed RANGE-based image search reranking framework comprises four stages: 1) feature extraction; 2) relevance labeling; 3) dimensionality reduction with the proposed RANGE method; and 4) reranking with the ranking function, as illustrated in Fig. 1. When a query term such as "horses" is submitted to the web image search engine, an initial text-based search result is returned to the user (only some top images are shown for illustration). The result is unsatisfactory because some human and scene images are retrieved as top results. To rerank these images, original high-dimensional features are first extracted to represent their visual contents. Then, since there is usually no explicit training data, a manual labeling or pseudo-relevance feedback mechanism is adopted to label some data with relevance degrees. Next, both labeled and unlabeled data are exploited in the proposed RANGE algorithm to map the image features


into an intrinsically low-dimensional space. Finally, a learning-to-rank algorithm (for example, Ranking SVM [16]) uses the labeled data as training data to build a ranking function, which reranks all the images with the reduced low-dimensional features.

From the above four stages, we summarize the following contributions of this paper, which will be explained in detail in the subsequent sections.
1) To the best of our knowledge, there has been little previous work that considers ranking information in dimensionality reduction. We propose the novel RANGE algorithm by extending graph embedding with relevance degree label information. Extensive experiments show that it outperforms traditional feature dimensionality reduction algorithms for the image search reranking task.
2) A new semisupervised feature dimensionality reduction based image search reranking framework is proposed, which achieves comparable performance while preserving the advantages of low-dimensional features.
3) We develop a PCA-based adaptive edge weight assignment method for graph construction, which is effective and free of ad hoc parameters.
4) We divide the edge weight assignment into three types: binary, reconstructive, and global. Binary edge weight assignment is used to model the relationships between two very relevant nodes, two relevant nodes, two irrelevant nodes, and a very relevant node and an irrelevant node. The reconstructive assignment is used to model the relationships between a relevant node and a very relevant node, and between a relevant node and an irrelevant node. The global assignment is used to model the "intrinsic structure" over all the examples. Together, these three types of edge weight assignment fully encode the relevance degree label information.

The remainder of this paper is organized as follows. In Section II, we review related work on visual search reranking and feature dimensionality reduction. Section III gives a brief review of graph embedding. Section IV presents the proposed RANGE dimensionality reduction algorithm in detail. Experimental results on image search reranking are given in Section V. Finally, we conclude the paper in Section VI.

II. RELATED WORK

To the best of our knowledge, there has been little previous research that considers ranking information in dimensionality reduction. Recently, Tian et al. [33] proposed a sparse transfer learning dimensionality reduction method for visual reranking. Although it is effective, it does not utilize the ranking information. Therefore, in this section, we give a brief review of two related domains: visual search reranking and dimensionality reduction.

A. Visual Search Reranking

Since visual search reranking is a combination of text-based search and content-based image/video retrieval, it integrates the characteristics of real time and accuracy. Therefore, it


has great importance in establishing a practical image search system. Current approaches can be roughly categorized into three groups: unsupervised reranking [17], [32], [46], [50]; supervised reranking [24], [38], [41]; and learning-to-rank-based reranking [13], [26], [51].

Previous approaches to the visual search reranking problem predominantly employ unsupervised or supervised methods. Unsupervised reranking methods usually adopt clustering algorithms and graph-based algorithms to discover visual patterns that have high visual similarity. For instance, Wei et al. [46] and Yao et al. [50] used the normalized cuts clustering algorithm and the random walk algorithm, respectively, to refine the initial results. Supervised reranking methods first train a classifier using training data obtained directly from the initial search results, and then reorder all the documents by the relevance scores predicted by the classifier. The support vector machine (SVM) is typically employed in this kind of method [24], [38], [41]. The training data are generally obtained from human labeling [38], [41] or pseudo-relevance feedback [24], which assumes the top-ranked documents to be more relevant than those ranked lower.

In recent work, learning-to-rank-based methods have demonstrated their superiority over conventional classification-based ones [13], [26], [51]. Different from previous work that regarded "classification performance" as the optimization objective, these methods directly take "ranking performance" as the optimization objective. Therefore, the learning-to-rank technique is a powerful method widely applied in reranking applications. As an active research topic in the fields of information retrieval and machine learning, learning to rank aims to build a ranking function that precisely predicts the relevance scores of test data [25]. Yang et al. [51] exploited Ranking SVM and ListNet for reranking by learning the co-occurrence patterns between target semantics and features extracted from the initial list. Based on the observation that the ranks of different documents for a query are interdependent, Geng et al. [13] viewed ranking as a structured output problem. To this end, based on learning to rank, they proposed a ranking model with large margin structured output learning, in which both textual and visual information are simultaneously leveraged in the learning process. Liu et al. [26] discussed the differences between classification and ranking in detail, and presented two novel pairwise reranking models by formulating reranking as an optimization problem. They first converted the individual documents into "document pairs," then found the optimal document pairs and their relevance relation with the proposed pairwise reranking models, and finally adopted a specially designed round robin criterion to recover the final ranked list.

Furthermore, there has been some recent work beyond the two categories mentioned above. For instance, Zhang et al. [52] introduced the particle swarm optimization (PSO) mechanism and viewed reranking as a mapping process from the initial text search list to the objective ranked list. Result diversity is also taken into account in visual search reranking [19], [45]. Wang et al. [45] studied the joint optimization of search relevance and diversity. Specifically, they proved that relevance reranking can be regarded as the process

of optimizing the mathematical expectation of the conventional average precision measure, whereas diversity reranking can be viewed as an optimization process for a new average diversity precision measure.

B. Feature Dimensionality Reduction

Dimensionality reduction has been successfully employed in applications of pattern recognition, computer vision, and multimedia retrieval, due to the fact that high-dimensional data often intrinsically lie in a low-dimensional subspace in the real world [14], [18], [42], [47], [49]. It plays an important role in overcoming the crucial "curse of dimensionality" problem and reducing the heavy burden of storage and computation brought by the original high-dimensional data. Many dimensionality reduction methods have been proposed in recent decades. Popular approaches include subspace-based methods [3], [5], [21], manifold-based methods [1], [14], [28], [42], [49], nonnegative matrix factorization based methods [8]–[12], etc.

In image retrieval and ranking applications, labeled data are much scarcer than unlabeled data. Moreover, the significance of relevant and irrelevant examples is unequal, and their data structures are also different. For example, the irrelevant images are scattered over the whole space while the relevant images are not. Therefore, specially designed dimensionality reduction methods have been developed in recent years to address this kind of situation. For example, Bian et al. [2] proposed a biased discriminative Euclidean embedding (BDEE) method for CBIR, which parameterizes samples in the original high-dimensional space to discover the intrinsic coordinates of image low-level visual features. In [39], a local-global discriminative (LGD) dimensionality reduction algorithm was proposed for the task of image search reranking, in which a submanifold is learned by transferring the local geometry and the discriminative information from the labeled images to the whole (global) image database. Within the relevance feedback framework, He et al. [15] presented a maximum margin projection (MMP) algorithm for CBIR, which discovers the local manifold structure by maximizing the margin between positive and negative examples in each local neighborhood. These approaches have been successfully applied to several standard datasets and have generated satisfactory results.

III. REVIEW OF GRAPH EMBEDDING FRAMEWORK

Graph embedding unifies many popular dimensionality reduction algorithms (e.g., PCA, LDA, ISOMAP, LLE, and LPP) within a general framework. This framework provides significant insight into the relationships among these algorithms. Since our proposed method is based on the graph embedding framework [49], we briefly describe its main idea in this section.

Given an undirected weighted graph G = (V, E, S), each instance x_i (1 ≤ i ≤ n) represents a node v ∈ V, edges e ∈ E belong to V × V, and S is a similarity edge weight matrix assigning a value to each edge. The graph G and the matrix S can be defined to describe certain desired statistical or geometrical properties of the dataset. Graph embedding aims to determine


a low-dimensional representation Y = [y_1, ..., y_n] of the node set V, where the column vector y_i is the embedding of the vertex x_i that preserves the similarities between pairs of data points in the original high-dimensional space. The similarity is measured by the edge weight. Direct graph embedding aims to maintain the similarities among node pairs according to the graph preserving criterion [49]:

$$Y^* = \arg\min_{\operatorname{tr}(YBY^T)=c} \sum_{i \neq j} \|y_i - y_j\|^2 S_{ij} = \arg\min_{\operatorname{tr}(YBY^T)=c} \operatorname{tr}(YLY^T) \qquad (1)$$

where c is a constant, tr(·) denotes the trace of a square matrix, and B is the constraint matrix, which may simply be a diagonal matrix used for scale normalization or may express more general constraints among vertices in a penalty graph G'. The penalty graph describes similarities between nodes that are unfavorable and should be avoided. L is the Laplacian matrix of the graph, defined as

$$L = D - S \qquad (2)$$

where D is a diagonal matrix with $D_{ii} = \sum_{j \neq i} S_{ij}, \ \forall i$. Equation (1) can be solved by converting it into the following ratio formulation:

$$Y^* = \arg\min_{Y} \frac{\operatorname{tr}(YLY^T)}{\operatorname{tr}(YBY^T)}. \qquad (3)$$

If the constraint matrix B represents only scale normalization, this ratio formulation can be directly solved by eigenvalue decomposition. For a more general constraint matrix, it can be solved approximately with generalized eigenvalue decomposition by transforming the objective function into the more tractable form $\max_Y \operatorname{tr}((YLY^T)^{-1}(YBY^T))$. Extensions of the graph embedding framework, such as its linearization, kernelization, and tensorization, can be found in [49].

IV. PROPOSED RANGE ALGORITHM

In this section, we first introduce some notation, and then present the proposed RANGE algorithm, which aims to discover the intrinsic low-dimensional representations of multimedia data in ranking applications.

Let {x_i, z_i} (i = 1, ..., l) be a set of l labeled instances, where x_i ∈ R^D denotes an input data instance and z_i ∈ {0, ..., r − 1} denotes the corresponding relevance degree label. In addition to the labeled instances, let x_i (i = l + 1, ..., n) be a set of (n − l) unlabeled instances. The aim of the RANGE algorithm is to find a transformation matrix W that maps X = [x_1, ..., x_n] ∈ R^{D×n} to low-dimensional vectors Y = [y_1, ..., y_n] ∈ R^{d×n} (d ≪ D) using both the labeled and unlabeled instances, i.e., Y = W^T X. Therefore, RANGE can be viewed as a semisupervised linear dimensionality reduction algorithm. Without loss of generality, we consider only the case of r = 3, with z_i ∈ {0, 1, 2} and 0, 1, 2 standing for irrelevant, relevant, and very relevant, respectively.
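To make the notation above concrete, the following minimal Python sketch (our own illustration, not the authors' code; all array names and sizes are hypothetical) sets up a toy dataset in exactly this form, with r = 3 relevance levels and l = 3s labeled instances.

```python
import numpy as np

# Toy instantiation of the RANGE notation (illustrative sizes only).
D, n, s, r = 20, 100, 5, 3            # feature dim, total examples, labels per set, relevance levels
l = r * s                             # total number of labeled instances (l = 3s)

rng = np.random.default_rng(0)
X = rng.standard_normal((D, n))       # X = [x_1, ..., x_n] in R^{D x n}
z = np.repeat([2, 1, 0], s)           # relevance labels of the first l columns: 2, 1, 0

# Index sets of the labeled instances: S_A (very relevant), S_B (relevant), S_C (irrelevant).
S_A = np.flatnonzero(z == 2)
S_B = np.flatnonzero(z == 1)
S_C = np.flatnonzero(z == 0)
# Columns l, ..., n-1 of X are the unlabeled instances; they enter only the global term J3.
```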


Fig. 2. Data relationships in and between different relevance degrees.

Moreover, we denote the datasets with labels 2, 1, and 0 by A, B, and C, respectively, and the corresponding sets of labeled instances by S_A, S_B, and S_C, respectively. We label s instances for each set, so l = 3s.

Fig. 2 illustrates the data relationships in and between different relevance degrees. As mentioned above, without loss of generality, we consider only the three-level relevance degree labeling. It can easily be seen that there are six possible kinds of relationships between any pair of nodes: AA, AC, CC, BC, BB, and AB. For example, AA means both nodes are very relevant to the query, and AC means one node is very relevant to the query and the other is irrelevant to it.

Similar to [14], to avoid the singularity problem and reduce noise disturbance, PCA is first adopted to project X into a subspace by discarding the smallest principal components so as to retain 98% of the energy. For convenience, we still use x_i to denote the data examples in the PCA subspace in the following steps.

A. Composition of Objective Function and Diverse Edge Weights

Since the data characteristics in image search reranking differ from those in classification applications, we design a novel objective function to model the relationships among the data examples. The objective function J is composed of three parts: J_1, J_2, and J_3. Specifically, J_1 is composed of the objective functions J_AA, J_AC, J_CC, and J_BB, which model the local relationships in AA, AC, CC, and BB, respectively. The objective function J_2 models the local relationships in BC and AB. J_1 and J_2 are both designed for the labeled data, which means that they are in a supervised form. However, since in practice the labeled data are not sufficient relative to the number of dimensions, the generalization capability learned from these training data alone cannot be guaranteed. In fact, unlabeled data examples are easier to obtain and are often used to estimate the intrinsic geometric structure of the data in semisupervised methods [18], [31]. Therefore, we use J_3 to model the data structure with both the labeled and unlabeled data examples. We will show in the following that all three parts of the objective function are based on the graph framework. Because relevance degree label information is introduced, we call the proposed algorithm ranking graph embedding. It can be seen that the solution of RANGE is a multiobjective optimization problem.

In addition to reflecting different relationships, the difference between J_1, J_2, and J_3 mainly lies in their edge


weight assignments. Edge weight assignment is one of the important steps in graph embedding methods, since it characterizes the data geometry. According to the characteristics of the data relationships, we design three kinds of edge weight assignment approaches: binary assignment, reconstructive assignment, and global assignment, for J_1, J_2, and J_3, respectively. Fig. 3 illustrates the composition of the objective function and the diverse edge weight assignments. We discuss them in detail in the following subsections.

B. Objective Function of J_1 With Binary Edge Weight

The objective function J_1 models the data relationships in AA, AC, CC, and BB, and is represented by the combination of the objective functions J_AA, J_BB, J_AC, and J_CC, which will be explained sequentially. The binary edge weight approach is employed in J_1, which means the edge weight between two nodes is set to "1" if they are from the corresponding labeled sets in the corresponding objective function; otherwise it is set to "0".

The data examples in set A are all very relevant to the query, that is to say, they have high visual similarity to each other. To reflect this property, we set S_ij = 1 for the data examples in S_A, which amounts to assuming that x_i is among the k nearest neighbors of x_j, or x_j is among the k nearest neighbors of x_i. Therefore, the objective function J_AA is defined as

$$J_{AA} = \min \frac{1}{2} \sum_{x_i, x_j \in S_A} \|y_i - y_j\|^2. \qquad (4)$$

The intuition behind (4) is to make the data examples in S_A as close as possible in the transformed low-dimensional space. It can be viewed as a special case of LPP [14] in which all data examples are among the k nearest neighbors of each other. By some simple algebra, the objective function J_AA can be reduced to

$$\begin{aligned} \frac{1}{2} \sum_{x_i, x_j \in S_A} \|y_i - y_j\|^2 &= \frac{1}{2} \operatorname{tr}\Big( \sum_{x_i, x_j \in S_A} (W^T x_i - W^T x_j)(W^T x_i - W^T x_j)^T \Big) \\ &= \frac{1}{2} \operatorname{tr}\Big( \sum_{x_i, x_j} (W^T x_i - W^T x_j)(W^T x_i - W^T x_j)^T S_{ij}^{AA} \Big) \\ &= \operatorname{tr}\big( W^T X (D^{AA} - S^{AA}) X^T W \big) \\ &= \operatorname{tr}\big( W^T X L^{AA} X^T W \big) \end{aligned} \qquad (5)$$

where L^{AA} = D^{AA} − S^{AA} is a Laplacian matrix, D^{AA} is a diagonal matrix with D_ii^{AA} = Σ_j S_ij^{AA}, and S^{AA} is an edge weight matrix in binary form with S_ij^{AA} = 1 if x_i, x_j ∈ S_A and 0 otherwise.

As for the examples in set B, similar to those in set A, we also let them be close in the intrinsic low-dimensional space. Considering that they are only loosely close, we use a weight parameter α to control their closeness.

Fig. 3. Composition of the objective function and the diverse edge weight assignments.

The objective function J_BB is defined as

$$J_{BB} = \alpha \min \frac{1}{2} \sum_{y_i, y_j \in S_B} \|y_i - y_j\|^2. \qquad (6)$$

Similar to J_AA, this objective function can be reduced to

$$\frac{1}{2} \sum_{y_i, y_j \in S_B} \|y_i - y_j\|^2 = \operatorname{tr}\big( W^T X L^{BB} X^T W \big) \qquad (7)$$

where L^{BB} = D^{BB} − S^{BB} is a Laplacian matrix, D^{BB} is a diagonal matrix with D_ii^{BB} = Σ_j S_ij^{BB}, and S^{BB} is an edge weight matrix in binary form with S_ij^{BB} = 1 if x_i, x_j ∈ S_B and 0 otherwise.

On the contrary, the data examples within set C, and those between set A and set C, should be as far apart as possible in the intrinsic low-dimensional space. Therefore, the objective functions J_AC and J_CC are defined as follows:

$$\begin{aligned} J_{AC} &= \max \frac{1}{2} \sum_{\substack{y_i \in S_A, y_j \in S_C \\ \text{or } y_i \in S_C, y_j \in S_A}} \|y_i - y_j\|^2 = \max \operatorname{tr}\big( W^T X L^{AC} X^T W \big) \\ J_{CC} &= \max \frac{1}{2} \sum_{y_i, y_j \in S_C} \|y_i - y_j\|^2 = \max \operatorname{tr}\big( W^T X L^{CC} X^T W \big) \end{aligned} \qquad (8)$$

where L^{AC} = D^{AC} − S^{AC} and L^{CC} = D^{CC} − S^{CC} are Laplacian matrices, D^{AC} and D^{CC} are diagonal matrices with D_ii^{AC} = Σ_j S_ij^{AC} and D_ii^{CC} = Σ_j S_ij^{CC}, and S^{AC} and S^{CC} are edge weight matrices, both in binary form: S_ij^{AC} = S_ji^{AC} = 1 if x_i ∈ S_A and x_j ∈ S_C, and 0 otherwise; S_ij^{CC} = 1 if x_i, x_j ∈ S_C, and 0 otherwise.

Specifically, the binary edge weight form can be divided into two types: relevant edge weights and irrelevant edge weights. The former refers to the case where the edge weight is set to "1" if both nodes are relevant, and "0" otherwise, as in S^{AA} and S^{BB}. The latter refers to the case where the edge weight is set to "1" for an irrelevant pair of nodes, and "0" otherwise, as in S^{AC} and S^{CC}. Therefore, we obtain J_1 = J_AA + J_BB − J_AC − J_CC.
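As an illustration of the binary edge weight assignment used by J_1, the sketch below builds S^{AA}, S^{BB}, S^{CC}, and S^{AC} and the corresponding Laplacians for a toy index layout; it is our own sketch, not the authors' implementation, and the index sets and sizes are hypothetical.

```python
import numpy as np

n = 100                                                                # total number of examples (toy value)
S_A, S_B, S_C = np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)   # labeled index sets

def binary_weight(rows, cols, n):
    """Binary edge weights: S_ij = 1 if i is in `rows` and j is in `cols` (symmetrized), else 0."""
    S = np.zeros((n, n))
    S[np.ix_(rows, cols)] = 1.0
    S = np.maximum(S, S.T)                  # keep the graph undirected
    np.fill_diagonal(S, 0.0)                # self-loops cancel in L = D - S anyway
    return S

def laplacian(S):
    """Graph Laplacian L = D - S with D_ii = sum_j S_ij."""
    return np.diag(S.sum(axis=1)) - S

S_AA = binary_weight(S_A, S_A, n)           # pairs of very relevant nodes
S_BB = binary_weight(S_B, S_B, n)           # pairs of relevant nodes
S_CC = binary_weight(S_C, S_C, n)           # pairs of irrelevant nodes
S_AC = binary_weight(S_A, S_C, n)           # very relevant / irrelevant pairs

L_AA, L_BB, L_CC, L_AC = (laplacian(S) for S in (S_AA, S_BB, S_CC, S_AC))
# J1 in trace form is then tr(W^T X (L_AA + alpha*L_BB - L_AC - L_CC) X^T W).
```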

C. Objective Function of J_2 With Reconstructive Edge Weight

J_2 models the relationships in BC and AB. The label "relevant" for the data examples in set B reveals their intermediate state with respect to the query, which means that the


images share some properties related to the query. Meanwhile, they also have other properties beyond the query. Therefore, we unite the relationships in BC and AB into BAC, which regards the examples in B as a tradeoff between set A and set C. For simplicity, we adopt the data examples in S_A and S_C to reconstruct them with the method of LLE [28]. The implementation of J_BAC is as follows.

1) First, for each data example x_i in set S_B, we take its k nearest neighbors from S_A and from S_C, respectively, to construct the index set N(i). The distance between two examples is computed with a novel PCA-based method, which is presented in the next subsection.

2) Second, each data sample x_i in set S_B is reconstructed from the samples in N(i) in the original high-dimensional space:

$$\min_{x_i \in S_B,\, x_{il} \in N(i)} \Big\| x_i - \sum_{l=1}^{2k} c_{il} x_{il} \Big\|^2 \qquad (9)$$

where c_{il} is a reconstruction coefficient and $\sum_{l=1}^{2k} c_{il} = 1$. By some simple algebra, we can get $c_{il} = \sum_{t=1}^{2k} G_{lt}^{-1} \big/ \sum_{p=1}^{2k} \sum_{q=1}^{2k} G_{pq}^{-1}$, where $G_{lt} = (x_i - x_{il})^T (x_i - x_{it})$ is a Gram matrix and $x_{il}, x_{it} \in N(i)$.

3) We then put these reconstruction coefficients into an n × n matrix. Let T = [t_{ij}]_{n×n} be a zero matrix. If x_{il} and x_j (1 ≤ j ≤ n) are the same example, we let t_{ij} = c_{il}. Therefore, T is a sparse matrix since 2k ≪ n.

4) Finally, the idea of the LLE algorithm is used to preserve the relationship between x_i and N(i) in the transformed low-dimensional space with the objective function J_2:

$$J_2 = \min \frac{1}{2} \sum_{y_i \in S_B,\, y_j \in Y} \Big\| y_i - \sum_{j=1}^{n} t_{ij} y_j \Big\|^2. \qquad (10)$$

By some simple algebra, the objective function J_2 can be reduced to

$$\begin{aligned} \frac{1}{2} \sum_{y_i \in S_B} \Big\| y_i - \sum_{j=1}^{n} t_{ij} y_j \Big\|^2 &= \operatorname{tr}\Big( \sum_{i} \Big( y_i - \sum_{j=1}^{n} t_{ij} y_j \Big)\Big( y_i - \sum_{j=1}^{n} t_{ij} y_j \Big)^T \Big) \\ &= \operatorname{tr}\big( Y (H_B - T)^T (H_B - T) Y^T \big) \\ &= \operatorname{tr}\big( W^T X (H_B - T)^T (H_B - T) X^T W \big) \\ &= \operatorname{tr}\big( W^T X (H_B^T H_B - H_B^T T - T^T H_B + T^T T) X^T W \big) \\ &= \operatorname{tr}\big( W^T X L^{BAC} X^T W \big) \end{aligned} \qquad (11)$$

where L^{BAC} = H_B^T H_B − (H_B^T T + T^T H_B − T^T T) = D^{BAC} − S^{BAC}, D^{BAC} = H_B^T H_B is a diagonal matrix, S^{BAC} = H_B^T T + T^T H_B − T^T T is an edge weight matrix in reconstructive form, and H_B is an n × n diagonal matrix with H_ii = 1 if y_i ∈ S_B and 0 otherwise.
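The coefficients c_il in step 2) are standard LLE-style reconstruction weights. The hedged sketch below computes them from the Gram matrix of (9) for a single x_i; it is our own illustration with hypothetical inputs, and the small ridge term is a numerical safeguard that is not part of the paper.

```python
import numpy as np

def reconstruction_weights(x_i, neighbors, ridge=1e-3):
    """LLE-style coefficients c_il for reconstructing x_i from its neighbors (columns of `neighbors`).

    G_lt = (x_i - x_il)^T (x_i - x_it); c = G^{-1} 1 / (1^T G^{-1} 1), so the coefficients sum to 1.
    """
    diff = x_i[:, None] - neighbors                      # D x 2k matrix of differences (x_i - x_il)
    G = diff.T @ diff                                    # 2k x 2k Gram matrix of eq. (9)
    G = G + ridge * np.trace(G) * np.eye(G.shape[0])     # ridge term: numerical safeguard only
    c = np.linalg.solve(G, np.ones(G.shape[0]))
    return c / c.sum()

# Toy usage: reconstruct one "relevant" example from 2k neighbors drawn from S_A and S_C.
rng = np.random.default_rng(0)
D, k = 20, 3
x_i = rng.standard_normal(D)
neighbors = rng.standard_normal((D, 2 * k))              # k nearest from S_A plus k nearest from S_C
c = reconstruction_weights(x_i, neighbors)
# The entries of c are then scattered into row i of the sparse n x n matrix T (t_ij = c_il).
```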


D. Objective Function of J_3 With Global Edge Weight

The objective function J_3 is employed to estimate the intrinsic geometric structure of the data examples. It is defined as

$$J_3 = \min \frac{1}{2} \sum_{y_i, y_j \in Y} \|y_i - y_j\|^2 S_{ij}. \qquad (12)$$

It can be written in a more compact form as

$$\frac{1}{2} \sum_{y_i, y_j \in Y} \|y_i - y_j\|^2 S_{ij} = \operatorname{tr}\big( W^T X (D - S) X^T W \big) = \operatorname{tr}\big( W^T X L X^T W \big) \qquad (13)$$

where W is the transformation matrix, i.e., Y = W^T X, L and D are the corresponding Laplacian and diagonal matrices, respectively, and S is an edge weight matrix over all the examples, which means that it is in global form. The objective function J_3 ensures that if x_i and x_j are "close", then y_i and y_j are close as well, and vice versa.

The similarity calculation plays an essential role in graph embedding methods. Although the heat kernel is widely used [14], [49], it is hard to determine its scale factor, which strongly affects the performance. To address this drawback, we present a new PCA-based method to calculate the similarity between two nodes. Assume that x_i is a p-dimensional instance after PCA projection. Since the components in the PCA subspace are ordered by eigenvalue λ from the highest to the lowest, we can calculate their significance as

$$\delta_m = \lambda_m \Big/ \sum_{j=1}^{p} \lambda_j, \quad m = 1, \ldots, p. \qquad (14)$$

Then, the weight S_ij between two nodes x_i and x_j is defined as

$$S_{ij} = \begin{cases} \sum_{m=1}^{p} \delta_m \varphi_m, & i \neq j \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$

$$\varphi_m = \begin{cases} e^{-2|x_{im} - x_{jm}| / |x_{im} + x_{jm}|}, & x_{im} + x_{jm} \neq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (16)$$

where x_{im} is the mth component of x_i, and | · | denotes the absolute value. Fig. 4 shows the trend of δ_m as m increases. From Fig. 4 and (15), we can observe the following: 1) there is no parameter to be determined; 2) the smaller the distance between x_i and x_j, the larger the weight S_ij, and conversely the larger the distance, the smaller the weight, which guarantees the similarity preservation property of RANGE; and 3) due to the weighting effect of δ_m, different components of the PCA subspace contribute to the weight adaptively, so the principal components have a larger impact on the weight S_ij than the components of lower significance.
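The following sketch implements the PCA-based similarity of (14)–(16) as we read the reconstructed equations above; it is an illustration under that reading, not the authors' code, and the inputs are hypothetical.

```python
import numpy as np

def pca_similarity(X_pca, eigvals):
    """Global edge weights of (14)-(16).

    X_pca: p x n data in the PCA subspace; eigvals: the p PCA eigenvalues in descending order.
    S_ij = sum_m delta_m * phi_m with phi_m = exp(-2|x_im - x_jm| / |x_im + x_jm|)
    (phi_m = 0 when x_im + x_jm = 0), and S_ii = 0.
    """
    p, n = X_pca.shape
    delta = eigvals / eigvals.sum()                           # component significance, eq. (14)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(X_pca[:, i] - X_pca[:, j])
            den = np.abs(X_pca[:, i] + X_pca[:, j])
            phi = np.where(den > 0, np.exp(-2.0 * num / np.maximum(den, 1e-12)), 0.0)
            S[i, j] = S[j, i] = float(np.dot(delta, phi))     # eq. (15), symmetric with zero diagonal
    return S

# Toy usage: similarities of 10 examples kept at p = 4 principal components.
rng = np.random.default_rng(0)
S = pca_similarity(rng.standard_normal((4, 10)), eigvals=np.array([4.0, 2.0, 1.0, 0.5]))
```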


In addition, there are typically two kinds of graph construction approaches. One is to construct a nearest-neighbor graph as an approximation of the local manifold structure, which is widely adopted in manifold learning methods such as LPP [14] and MLE [42]. The other is to build a global graph, which is employed in PCA, LDA, and so on. Actually, the local adjacency graph can be viewed as a special case of the global adjacency graph: in the global adjacency graph, if x_i is beyond the neighborhood of x_j, its weight S_ij incurs a heavy penalty and its value can be taken as approximately zero, which mimics the local adjacency graph. Consequently, we establish a global adjacency graph in our RANGE algorithm.

Fig. 4. Impact tendency of δ_m with the increase of m.

TABLE I
MAIN PROCEDURE FOR THE RANGE ALGORITHM

Input: Training examples X = [x_1, ..., x_n] ∈ R^{D×n} and labels z_i (i = 1, ..., l); parameters: reduced dimension d, number s of labeled examples per set, weight parameter α, number k of nearest neighbors.
Output: Projection vectors W = [w_1, ..., w_d] ∈ R^{D×d}.
Step 1: Project the image set {x_i} into the PCA subspace by discarding the smallest principal components.
Step 2: Compute the similarities between examples with the proposed PCA-based similarity measure in (15) and (16).
Step 3: Compute all the diagonal matrices and edge weight matrices, such as D^{AA}, D^{BB}, ..., S^{AA}, S^{BB}, ..., and the corresponding Laplacian matrices L^{AA}, L^{BB}, ..., by L = D − S.
Step 4: Compute M = D^{AA} + αD^{BB} − D^{AC} − D^{CC} + D^{BAC} + D, Q = S^{AA} + αS^{BB} − S^{AC} − S^{CC} + S^{BAC} + S, and V = M − Q.
Step 5: Perform the generalized eigenvalue decomposition of (XMX^T)^{-1}(XQX^T) using (20), and construct the D × d transformation matrix W.

E. Eigenmap

As a result, we obtain the final objective function

$$\begin{aligned} J &= J_1 + J_2 + J_3 = J_{AA} + J_{BB} - J_{AC} - J_{CC} + J_2 + J_3 \\ &= \operatorname{tr}(W^T X L^{AA} X^T W) + \alpha \operatorname{tr}(W^T X L^{BB} X^T W) - \operatorname{tr}(W^T X L^{AC} X^T W) - \operatorname{tr}(W^T X L^{CC} X^T W) \\ &\quad + \operatorname{tr}(W^T X L^{BAC} X^T W) + \operatorname{tr}(W^T X L X^T W) \\ &= \operatorname{tr}\big( W^T X (L^{AA} + \alpha L^{BB} - L^{AC} - L^{CC} + L^{BAC} + L) X^T W \big) \\ &= \operatorname{tr}\big( W^T X V X^T W \big) \end{aligned} \qquad (17)$$

where V = L^{AA} + αL^{BB} − L^{AC} − L^{CC} + L^{BAC} + L. Therefore, the solution of RANGE can be represented as the following constrained optimization problem:

$$\arg\min_{W} \operatorname{tr}\big( W^T X V X^T W \big) \quad \text{s.t.} \quad W^T X M X^T W = E \qquad (18)$$

where M = D^{AA} + αD^{BB} − D^{AC} − D^{CC} + D^{BAC} + D and E is an identity matrix. By denoting Q = S^{AA} + αS^{BB} − S^{AC} − S^{CC} + S^{BAC} + S, we have V = M − Q. It is easy to see that the above optimization

problem has the following equivalent variation:

$$\arg\max_{W} \operatorname{tr}\big( W^T X Q X^T W \big) \quad \text{s.t.} \quad W^T X M X^T W = E. \qquad (19)$$

By introducing a Lagrangian multiplier, (19) can be solved by the following generalized eigendecomposition (the proof is provided in the Appendix):

$$X Q X^T w_i = \lambda_i X M X^T w_i \qquad (20)$$

where w_i is the generalized eigenvector of (XMX^T)^{-1}(XQX^T) and λ_i is the corresponding eigenvalue. To guarantee the nonsingularity of the matrix XMX^T, we apply regularization by adding constant values to its diagonal elements, i.e., XMX^T + αE for any α > 0. Let the column vectors w_1, ..., w_d be the solutions of (20), ordered according to the first d largest eigenvalues. Thus, the embedding can be expressed as

$$x_i \rightarrow y_i = W^T x_i, \quad W = [w_1, \ldots, w_d] \qquad (21)$$

where y_i is a d-dimensional vector and W is a D × d transformation matrix.

As can be observed from the above, the RANGE algorithm belongs to the graph embedding framework. However, it distinguishes itself from previous algorithms in the construction of the adjacency graph (the definition of the similarity matrix) and in the use of the relevance degree constraints. The main steps of the RANGE algorithm are summarized in Table I.
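Steps 4 and 5 of Table I reduce to one generalized eigendecomposition. The sketch below shows one way to carry it out with SciPy; it is our own sketch, it assumes the regularized X M X^T is positive definite (a larger regularizer may be needed in practice), and Q is symmetrized because it is only approximately symmetric (see the Appendix).

```python
import numpy as np
from scipy.linalg import eigh

def range_projection(X, Q, M, d, reg=1e-3):
    """Solve X Q X^T w = lambda X M X^T w (eq. (20)) and keep the d leading eigenvectors.

    X is the D x n data matrix; Q and M are the n x n matrices of Step 4; returns W (D x d).
    """
    A = X @ Q @ X.T
    B = X @ M @ X.T
    A = 0.5 * (A + A.T)                              # symmetrize: Q is only approximately symmetric
    B = 0.5 * (B + B.T) + reg * np.eye(X.shape[0])   # regularize X M X^T as discussed after (20)
    eigvals, eigvecs = eigh(A, B)                    # generalized symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]                # keep the d largest eigenvalues
    return eigvecs[:, order[:d]]

# Usage: Y = W.T @ X gives the d-dimensional embedding of eq. (21).
```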


V. EXPERIMENTAL RESULTS

In this section, the performance of the proposed RANGE algorithm is evaluated on the image visual search reranking task. We first introduce the dataset and methodologies, and then demonstrate the effectiveness of the RANGE algorithm from two aspects. On the one hand, we manually label some images with relevance degrees. With these labeled data and the unlabeled data, we show that the proposed RANGE algorithm is superior to many other feature dimensionality reduction methods for the image search reranking task, such as PCA [5], PCA-L1 [21], LDA [5], LSDA [3], SELF [31], LPP [14], MFA [49], and MMP [15]. On the other hand, we employ a pseudo-relevance feedback method to label the images automatically, which aims to show that the RANGE-based visual search reranking scheme is comparable to some state-of-the-art methods with no human labeling effort. Note that, for all the experiments, we adopt ranking SVM as the ranking algorithm, repeat each experiment three times, and report the average results. The penalization parameter C is set to 1 in the ranking SVM model.

A. Dataset and Methodologies

We validate our RANGE scheme on the popular and publicly available MSRA-MM image dataset [40]. The dataset consists of 68 popular queries collected from the Microsoft Bing image search engine. These queries cover a wide variety of categories, including objects, people, events, entertainment, and locations. For each query, the top 1000 images along with their surrounding texts are collected, and the rank order of these images is taken as the initial ranked list. In the dataset, each image is manually labeled with respect to the corresponding query at three levels: 0 "irrelevant," 1 "relevant," and 2 "very relevant".

We employ the normalized discounted cumulative gain (NDCG) at depths {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} to evaluate the ranking performance, since NDCG is widely used in information retrieval tasks, especially when there are more than two relevance levels [20]. Given a query q, the NDCG score at depth d in the ranked documents is defined by

$$\mathrm{NDCG@}d = Z_d \sum_{j=1}^{d} \frac{2^{r_j} - 1}{\log(1 + j)} \qquad (22)$$

where r_j is the rating of the jth document and Z_d is a normalization constant chosen so that the NDCG@d value of a perfect ranking is 1. We obtain the final performance by averaging NDCG over the 68 queries.

We adopt the features provided by [40] to make the results reproducible and comparable. There are 899 features in total, including: 1) 225 block-wise color moments; 2) 64 HSV color histograms; 3) 256 RGB color histograms; 4) 144 color correlograms; 5) 75 edge distribution histograms; 6) 128 wavelet textures; and 7) 7 face features. In addition, we used the top 500 images of the initial search results for reranking in our experiments, since it is typical that there are very few relevant images beyond the top 500 search results [50].

In RANGE, the parameters are set as follows: d = 150, s = 5, α = 0.2, and k = 3. That is, we reduce the original feature dimension to 150 and randomly select five labeled images from each relevance degree group. The impact of the important parameters is discussed in the following section. As for the parameters used in the other algorithms for comparison, we adopt the default settings in the corresponding original papers.
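For reference, the following is a small sketch of the NDCG@d measure in (22), with Z_d computed from the ideal ordering so that a perfect ranking scores 1; this is our own illustration (the logarithm base is taken as written in (22)), not code from the paper.

```python
import numpy as np

def ndcg_at_depth(ratings, d):
    """NDCG@d as in (22): Z_d * sum_{j=1}^{d} (2^{r_j} - 1) / log(1 + j).

    `ratings` are the relevance levels (0, 1, 2) of the returned images in ranked order;
    Z_d normalizes by the ideal ranking so that a perfect ranking scores 1.
    """
    ratings = np.asarray(ratings, dtype=float)
    depth = min(d, len(ratings))
    discounts = np.log(1.0 + np.arange(1, depth + 1))
    dcg = np.sum((2.0 ** ratings[:depth] - 1.0) / discounts)
    ideal = np.sort(ratings)[::-1]
    idcg = np.sum((2.0 ** ideal[:depth] - 1.0) / discounts)
    return dcg / idcg if idcg > 0 else 0.0

# Example: a ranked list whose top results are mostly "very relevant" scores close to 1.
print(ndcg_at_depth([2, 2, 1, 0, 2, 1, 0, 0, 1, 2], 10))
```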

Fig. 5. Performance comparisons of different dimensionality reduction approaches with manually labeled data.

B. Experiments With Manually Labeled Data

In this section, we demonstrate the performance of RANGE with manually labeled data on the image visual search reranking task. Fig. 5 illustrates the NDCG results within our scheme using different feature dimensionality reduction algorithms, including: 1) unsupervised algorithms: PCA [5], PCA-L1 [19], [21], and LPP [14]; 2) supervised algorithms: LDA [5], LSDA [3], and MFA [49]; and 3) semisupervised algorithms: SELF [31] and MMP [15]. In Fig. 5, TEXT refers to the performance of the initial text-based search, BASELINE refers to the performance with the original 899-D features, and RANGE-J3 refers to the algorithm that adopts only the J_3 part of RANGE. Note that all supervised and semisupervised methods use the same labeled data. The reduced dimension is 150 for all the algorithms except LDA: since the rank of the between-class scatter matrix in LDA is bounded by c − 1 (c is the number of classes) and we regard each relevance set as a class, the reduced dimension of LDA is 2. Moreover, the performance gains against TEXT and BASELINE are shown in Tables II and III, respectively, taking the depths 10, 50, and 100 as examples.

We apply statistical significance tests [30], namely the Z-test and the Wilcoxon rank-sum test at the 5% significance level, to the performance results. Both statistical significance tests lead to similar conclusions. 1) All the dimensionality reduction algorithms obtain significant performance improvements against TEXT at all depths. 2) As for the comparison against BASELINE, the performances of the MMP, LPP, RANGE-J3, and RANGE algorithms show significant improvements at all depths, the SELF algorithm shows a significant improvement only at NDCG@10, and the others show no significant difference from BASELINE. 3) The proposed RANGE-J3 and RANGE algorithms show more significant improvements at lower depths, as shown in Table IV.


TABLE II
PERFORMANCE GAINS (%) AGAINST "TEXT" AT THE DEPTHS OF 10, 50, AND 100

Methods     NDCG@10   NDCG@50   NDCG@100
PCA           10.92     14.91      17.76
PCA-L1        10.34     14.98      17.70
LDA           14.26     15.52      18.06
LSDA          14.50     15.74      18.24
MFA           14.55     15.80      18.26
SELF          19.37     16.72      19.73
MMP           38.10     21.10      21.74
LPP           37.86     21.82      21.98
RANGE-J3      43.04     22.31      22.36
RANGE         48.49     23.29      22.39

TABLE III
PERFORMANCE GAINS (%) AGAINST "BASELINE" AT THE DEPTHS OF 10, 50, AND 100

Methods     NDCG@10   NDCG@50   NDCG@100
PCA           −3.46     −0.84      −0.49
PCA-L1        −3.97     −0.78      −0.53
LDA           −0.55     −0.31      −0.24
LSDA          −0.34     −0.20      −0.08
MFA           −0.30     −0.08      −0.06
SELF           3.89      0.72       1.18
MMP           20.20      4.50       2.88
LPP           19.99      5.12       3.08
RANGE-J3      24.50      5.54       3.40
RANGE         29.24      6.39       3.43

From Fig. 5 and Tables II–IV, we can observe the following.
1) Image search reranking with visual features greatly helps to refine the initial search results, and the performance gains obtained range from about 10% to 50%. Specifically, the proposed RANGE method achieves a 48.49% gain against TEXT at NDCG@10.
2) The performances of PCA, PCA-L1, LDA, LSDA, and MFA are inferior or similar to that of BASELINE, which demonstrates that not all dimensionality reduction methods are effective.
3) The superiority of SELF, LPP, MMP, RANGE-J3, and RANGE over BASELINE shows that some dimensionality reduction methods are able to improve the reranking performance.
4) The performance improves distinctly, especially when the depth is lower than 50. This matches our requirement, since the search results at the top are more important for users.
5) The proposed RANGE algorithm outperforms the others at depths from 10 to 100, which shows that the introduction of ranking information is effective.
6) The significant superiority of RANGE over PCA shows that the excellent performance of RANGE is not caused by PCA.
7) Taking NDCG@10 as an example, the performance improvement of RANGE-J3 over LPP is 3.6%, which demonstrates that the objective function J_3 is effective in RANGE. Furthermore, the superior performance of RANGE over RANGE-J3 (4.5% at NDCG@10) shows that the combination of J_1, J_2, and J_3 is necessary.

We then evaluate the impact of the parameters d and s. Fig. 6 depicts the performance of the RANGE-based reranking method with d varying from 50 to 300. From the figure, we find that the best dimensions are 150, 200, and 250, and that too small or too large a dimension deteriorates the performance. It is therefore reasonable to take 150 as the reduced dimension. Further, the influence of the number of labeled examples s is investigated in Fig. 7. We can see that the performance steadily improves as s increases, which demonstrates that more labels bring larger improvements in performance. As for the computation time, our algorithm needs less than 1 s per query on a computer with a 3.2 GHz CPU.

TABLE IV
SIGNIFICANCE TEST RESULTS OF THE Z-TEST AND THE WILCOXON RANK-SUM TEST AT THE 5% SIGNIFICANCE LEVEL

              NDCG@10        NDCG@50        NDCG@100
             R-J3    R      R-J3    R      R-J3    R
TEXT           1     1        1     1        1     1
BASELINE       1     1        1     1        1     1
PCA            1     1        1     1        1     1
PCA-L1         1     1        1     1        1     1
LDA            1     1        1     1        1     1
LSDA           1     1        1     1        1     1
MFA            1     1        1     1        1     1
SELF           1     1        1     1        0     0
MMP            1     1        0     0        0     0
LPP            1     1        0     0        0     0

"R" and "R-J3" denote the proposed RANGE and RANGE-J3 algorithms; "1" indicates a significant improvement, and "0" indicates no significant difference.

Fig. 6. Performance comparisons in different reduced dimensions d.

C. Experiments With Pseudo-Relevance Feedback

In this section, we demonstrate the effectiveness of RANGE with automatically acquired labeled data. The labels are obtained with the mechanism of pseudo-relevance feedback (PRF), which has been shown to be effective in improving initial text search results in both text and video retrieval [24]. We modify the idea of typicality developed in [24] to carry out PRF. More specifically, the initial search results are first


Fig. 7. Performance comparisons of different labeled numbers s.

Fig. 8. Performance comparisons with state-of-the-art algorithms.

clustered into a set of clusters with the algorithm proposed by Kontschieder et al. [22]. Then, pseudo-relevance scores for each sample are obtained by combining the cluster typicality and local typicality measures [24], which are determined by visual similarities and the initial order information. Finally, the samples are ordered by descending pseudo-relevance scores, and the labeled data are selected from the top, middle, and bottom of this list, respectively.

Four state-of-the-art reranking algorithms are used for performance comparison: 1) context reranking (Context-V) [17], a typical graph-based reranking method that uses a random walk with visual features to perform reranking; 2) co-reranking (Co-reranking-TV) [50], a classic graph-based reranking method that mutually reinforces textual and visual information by coupling two random walk graphs; and 3) harvesting reranking (Harvesting-TV) [32], a typical supervised reranking method that uses both textual and visual features and also utilizes the PRF approach to obtain the training data. For all the approaches, the parameters are selected to achieve the best performance. It should be mentioned that the visual features employed in the first three methods are mainly based on the BOW feature [23].

From Fig. 8, we can observe that the proposed RANGE-based reranking method (referred to as RANGE-V to emphasize that it uses only visual information) achieves comparable performance to the other methods. 1) With low-level and low-dimensional features, the proposed RANGE-based image search reranking scheme is similar to or better than the Context-V and Harvesting-TV algorithms, which indicates the effectiveness of the RANGE feature dimensionality reduction. 2) Co-reranking-TV is slightly better than ours because it employs a novel mutual reinforcement mechanism with both textual and visual features; in terms of visual features alone, the BOW feature it uses is up to 2000-D, whereas ours is only 899-D and consists of low-level features. As for the computation time, the proposed algorithm needs less than 1.5 s per query on a machine with a 3.2 GHz CPU.

VI. CONCLUSION

In this paper, we presented a semisupervised manifold learning algorithm called RANGE. Based on RANGE, a novel image search reranking framework was developed. A comprehensive set of experiments was conducted to evaluate the performance of RANGE in the application of image search reranking. These comparative studies clearly demonstrated that: 1) RANGE outperforms traditional classification-oriented dimensionality reduction algorithms, which proves the effectiveness and importance of introducing ranking information into graph embedding and of the PCA-based edge weight assignment approach; and 2) the RANGE-based image search reranking framework achieves better or at least comparable performance against state-of-the-art methods.

The proposed RANGE algorithm is general enough to be applied to other multimedia ranking domains, such as personalized recommendation and tag ranking. Moreover, we plan to incorporate ranking information into other popular dimensionality reduction algorithms, such as LDA and kernel-based methods.

APPENDIX

In this appendix, we provide a proof of (20). Equation (19) can be written as

$$\begin{cases} \arg\max_{w_i} \sum_{i=1}^{D} w_i^T X Q X^T w_i \\ \text{s.t. } w_1^T X M X^T w_1 = w_2^T X M X^T w_2 = \cdots = w_D^T X M X^T w_D = 1. \end{cases} \qquad (23)$$

By introducing Lagrangian multipliers, we get

$$J(w_i, \lambda_i) = \sum_{i=1}^{D} w_i^T X Q X^T w_i - \lambda_1 \big( w_1^T X M X^T w_1 - 1 \big) - \lambda_2 \big( w_2^T X M X^T w_2 - 1 \big) - \cdots - \lambda_D \big( w_D^T X M X^T w_D - 1 \big). \qquad (24)$$

Then

$$\begin{aligned} \frac{\partial J(w_i, \lambda_i)}{\partial w_i} &= X Q X^T w_i + \big( X Q X^T \big)^T w_i - \lambda_i X M X^T w_i - \lambda_i \big( X M X^T \big)^T w_i \\ &= X Q X^T w_i + X Q^T X^T w_i - \lambda_i X M X^T w_i - \lambda_i X M^T X^T w_i \\ &= X \big( Q + Q^T \big) X^T w_i - \lambda_i X \big( M + M^T \big) X^T w_i = 0. \end{aligned} \qquad (25)$$

Therefore,

$$X \big( Q + Q^T \big) X^T w_i = \lambda_i X \big( M + M^T \big) X^T w_i. \qquad (26)$$

Because M = M^T and Q ≈ Q^T, we obtain (20).


REFERENCES

[1] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Advances in Neural Information Processing Systems, vol. 14. Cambridge, MA, USA: MIT Press, 2001, pp. 585–591.
[2] W. Bian and D. Tao, "Biased discriminant Euclidean embedding for content-based image retrieval," IEEE Trans. Image Process., vol. 19, no. 2, pp. 545–554, Feb. 2010.
[3] D. Cai, X. He, K. Zhou, J. Han, and H. Bao, "Locality sensitive discriminant analysis," in Proc. Int. Joint Conf. Artif. Intell., 2007, pp. 708–713.
[4] Z. Cao, T. Qin, T. Liu, M. Tsai, and H. Li, "Learning to rank: From pairwise approach to listwise approach," in Proc. Int. Conf. Mach. Learn., 2007, pp. 129–136.
[5] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, NY, USA: Wiley, 2001.
[6] L. Duan, W. Li, I. Tsang, and D. Xu, "Improving web image search by bag-based re-ranking," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3280–3290, Nov. 2011.
[7] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2011, ch. 21.
[8] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Manifold regularized discriminative non-negative matrix factorization with fast gradient descent," IEEE Trans. Image Process., vol. 20, no. 7, pp. 2030–2048, Jul. 2011.
[9] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Non-negative patch alignment framework," IEEE Trans. Neural Netw., vol. 22, no. 8, pp. 1218–1230, Aug. 2011.
[10] N. Guan, D. Tao, Z. Luo, and B. Yuan, "NeNMF: An optimal gradient method for non-negative matrix factorization," IEEE Trans. Signal Process., vol. 60, no. 6, pp. 2882–2898, Jun. 2012.
[11] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Online non-negative matrix factorization with robust stochastic approximation," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1087–1099, Jul. 2012.
[12] N. Guan, D. Tao, Z. Luo, and J. Shawe-Taylor, "MahNMF: Manhattan non-negative matrix factorization," J. Mach. Learn. Res., pp. 1–43, 2012.
[13] B. Geng, L. Yang, C. Xu, and X. Hua, "Content-aware ranking for visual search," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 3400–3407.
[14] X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang, "Face recognition using Laplacianfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, Mar. 2005.
[15] X. He, D. Cai, and J. Han, "Learning a maximum margin subspace for image retrieval," IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 189–201, Feb. 2008.
[16] R. Herbrich, T. Graepel, and K. Obermayer, "Large margin rank boundaries for ordinal regression," in Advances in Large Margin Classifiers. Cambridge, MA, USA: MIT Press, 2000, pp. 115–132.
[17] W. Hsu, L. Kennedy, and S. Chang, "Video search reranking through random walk over document-level context graph," in Proc. ACM Int. Conf. Multimedia, 2007, pp. 971–980.
[18] Y. Huang, D. Xu, and F. Nie, "Semi-supervised dimension reduction using trace ratio criterion," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 3, pp. 519–526, Mar. 2012.
[19] Z. Ji, Y. Su, X. Qu, and Y. Pang, "Diversifying the image relevance reranking with absorbing random walk," in Proc. Int. Conf. Image Graph., 2011, pp. 981–986.
[20] K. Järvelin and J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," in Proc. Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2000, pp. 41–48.
[21] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, pp. 1672–1677, Sep. 2008.
[22] P. Kontschieder, M. Donoser, and H. Bischof, "Beyond pairwise shape similarity analysis," in Proc. Asian Conf. Comput. Vis., 2009, pp. 655–666.
[23] F. Li and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 524–531.
[24] Y. Liu, T. Mei, X. Hua, J. Tang, X. Wu, and S. Li, "Learning to video search rerank via pseudo preference feedback," in Proc. IEEE Int. Conf. Multimedia Expo, Jun. 2008, pp. 207–210.
[25] T. Liu, Learning to Rank for Information Retrieval. Berlin, Germany: Springer-Verlag, 2011.
[26] Y. Liu and T. Mei, "Optimizing visual search reranking via pairwise learning," IEEE Trans. Multimedia, vol. 13, no. 2, pp. 280–291, Apr. 2011.
[27] Y. Pang, L. Zhang, and Z. Liu, "Neighborhood preserving projections (NPP): A novel linear dimension reduction method," in Proc. Int. Conf. Adv. Intell. Comput., 2005, pp. 117–125.
[28] S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 22, pp. 2323–2326, 2000.
[29] S. Robertson, "Overview of the Okapi projects," J. Document., vol. 53, no. 1, pp. 3–7, 1997.
[30] R. Sprinthall, Basic Statistical Analysis, 9th ed. London, U.K.: Pearson, 2011.
[31] M. Sugiyama, T. Ide, S. Nakajima, and J. Sese, "Semi-supervised local Fisher discriminant analysis for dimensionality reduction," Mach. Learn., vol. 78, nos. 1–2, pp. 35–61, 2010.
[32] F. Schroff, A. Criminisi, and A. Zisserman, "Harvesting image databases from the web," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[33] X. Tian, D. Tao, and Y. Rui, "Sparse transfer learning for interactive video search reranking," ACM Trans. Multimedia Comput., Commun. Appl., vol. 8, no. 3, p. 26, 2012.
[34] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088–1099, Jun. 2006.
[35] D. Tao, X. Li, and S. J. Maybank, "Negative samples analysis in relevance feedback," IEEE Trans. Knowl. Data Eng., vol. 19, no. 4, pp. 568–580, Apr. 2007.
[36] J. Tang, S. Yan, R. Hong, G. Qi, and T. Chua, "Inferring semantic concepts from community-contributed images and noisy tags," in Proc. 17th ACM Int. Conf. Multimedia, 2009, pp. 223–232.
[37] J. Tang, Z. Zha, D. Tao, and T. Chua, "Semantic-gap oriented active learning for multi-label image annotation," IEEE Trans. Image Process., vol. 21, no. 4, pp. 2354–2360, Apr. 2012.
[38] J. Tešic, A. Natsev, L. Xie, and J. R. Smith, "Data modeling strategies for imbalanced learning in visual search," in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2007, pp. 1990–1993.
[39] X. Tian, D. Tao, X. Hua, and X. Wu, "Active reranking for web image search," IEEE Trans. Image Process., vol. 19, no. 3, pp. 805–820, Mar. 2010.
[40] M. Wang, L. Yang, and X. Hua, "MSRA-MM: Bridging research and industrial societies for multimedia information retrieval," Microsoft Research Asia, Beijing, China, Tech. Rep. MSR-TR-2009-30, 2009.
[41] X. Wang, K. Liu, and X. Tang, "Query-specific visual semantic spaces for web image re-ranking," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 857–864.
[42] R. Wang, S. Shan, X. Chen, J. Chen, and W. Gao, "Maximal linear embedding for dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 9, pp. 1776–1792, Sep. 2011.
[43] M. Wang, X. Hua, J. Tang, and R. Hong, "Beyond distance measurement: Constructing neighborhood similarity for video annotation," IEEE Trans. Multimedia, vol. 11, no. 3, pp. 465–476, Apr. 2009.
[44] M. Wang, X. Hua, T. Mei, R. Hong, G. Qi, Y. Song, and L. Dai, "Semisupervised kernel density estimation for video annotation," Comput. Vis. Image Understand., vol. 113, no. 3, pp. 384–396, 2009.
[45] M. Wang, K. Yang, X. Hua, and H. Zhang, "Toward a relevant and diverse search of social images," IEEE Trans. Multimedia, vol. 12, no. 8, pp. 829–842, Dec. 2010.
[46] S. Wei, Y. Zhao, Z. Zhu, and N. Liu, "Multimodal fusion for video search reranking," IEEE Trans. Knowl. Data Eng., vol. 22, no. 8, pp. 1191–1199, Aug. 2010.
[47] B. Xie, M. Wang, and D. Tao, "Toward the optimization of normalized graph Laplacian," IEEE Trans. Neural Netw. Learn. Syst., vol. 22, no. 4, pp. 660–666, Apr. 2011.
[48] T. Xia, D. Tao, T. Mei, and Y. Zhang, "Multiview spectral embedding," IEEE Trans. Syst., Man, Cybern. B, vol. 40, no. 6, pp. 1438–1446, Dec. 2010.
[49] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.
[50] T. Yao, T. Mei, and C. Ngo, "Co-reranking by mutual reinforcement for image search," in Proc. ACM Int. Conf. Image Video Retr., 2010, pp. 34–41.
[51] Y. Yang, W. Hsu, and H. Chen, "Online reranking via ordinal informative concepts for context fusion in concept detection and video search," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 12, pp. 1880–1890, Dec. 2009.
[52] L. Zhang, T. Mei, Y. Liu, D. Tao, and H. Zhou, "Visual search reranking via adaptive particle swarm optimization," Pattern Recognit., vol. 44, no. 8, pp. 1811–1820, 2011.

Yanwei Pang (M'07–SM'09) received the Ph.D. degree in electronic engineering from the University of Science and Technology of China, Hefei, China, in 2004. He is currently a Professor with the School of Electronic Information Engineering, Tianjin University, Tianjin, China. He has authored more than 80 scientific papers, including 14 IEEE Transactions papers, in his areas of expertise. His current research interests include subspace learning, and object detection and recognition.

Zhong Ji received the Ph.D. degree in signal and information processing from Tianjin University, Tianjin, China, in 2008. He is currently an Associate Professor with the School of Electronic Information Engineering, Tianjin University, Tianjin. He has authored more than 30 scientific papers in his areas of expertise. His current research interests include multimedia content analysis and ranking, and dimensionality reduction. Dr. Ji has been an editor and reviewer for more than 15 international journals and conferences.


Peiguang Jing received the M.S. degree in signal and information processing from Tianjin University, Tianjin, China, in 2012, where he is currently pursuing the Ph.D. degree with the School of Electronic Information Engineering. His current research interests include pattern recognition and image retrieval.

Xuelong Li (M'02–SM'07–F'12) is a full Professor with the Center for OPTical IMagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China.
