Intrinsic graph structure estimation using graph Laplacian.

LETTER

Communicated by Shohei Shimizu

Intrinsic Graph Structure Estimation Using Graph Laplacian Atsushi Noda [email protected] School of Science and Engineering, Waseda University, Shinjuku, Tokyo 169-8555, Japan

Hideitsu Hino [email protected] Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki 305-8573, Japan

Masami Tatsuno [email protected] Department of Neuroscience, Canadian Centre for Behavioural Neuroscience, University of Lethbridge, Lethbridge AB T1K 3M4, Canada

Shotaro Akaho [email protected] Mathematica Neuroinformatics Group, National Institute of Advanced Industrial Science and Technology, Tsukuba, Ibaraki 305-8586, Japan

Noboru Murata [email protected] School of Science and Engineering, Waseda University, Shinjuku, Tokyo 169-8555, Japan

A graph is a mathematical representation of a set of variables where some pairs of the variables are connected by edges. Common examples of graphs are railroads, the Internet, and neural networks. It is both theoretically and practically important to estimate the intensity of direct connections between variables. In this study, a problem of estimating the intrinsic graph structure from observed data is considered. The observed data in this study are a matrix with elements representing dependency between nodes in the graph. The dependency represents more than direct connections because it includes influences of various paths. For example, each element of the observed matrix represents a co-occurrence of events at two nodes or a correlation of variables corresponding to two nodes. In this setting, spurious correlations make the estimation of direct connection difficult. To alleviate this difficulty, a digraph Laplacian is used for characterizing a graph. A generative model of this observed matrix Neural Computation 26, 1455–1483 (2014) doi:10.1162/NECO_a_00603

c 2014 Massachusetts Institute of Technology

1456

A. Noda et al.

is proposed, and a parameter estimation algorithm for the model is also introduced. The notable advantage of the proposed method is its ability to deal with directed graphs, while conventional graph structure estimation methods such as covariance selections are applicable only to undirected graphs. The algorithm is experimentally shown to be able to identify the intrinsic graph structure.

1 Introduction A graph is a mathematical structure that represents the relationship of data. Typical examples of graph structures are railroads, the Internet, transmission networks, and neural networks. In recent years, pattern recognition and data mining based on graph structure have been studied. For example, the graph structure was used by Bunke and Riesen (2011) for document analysis, Conte, Foggia, Sansone, and Vento (2004) for biometric identification, and Washio and Motoda (2003) for data mining. A graph is composed of nodes (vertices) and edges (arcs), where a connection between a pair of nodes is expressed by an edge. Consider a case of stock prices; nodes correspond to companies, and edges correspond to direct connectivity between companies. Influences of stock prices can transfer through several companies. As another example, in the case of neural networks, nodes correspond to nerve cells (neurons), and edges correspond to direct connectivity between neurons. If coincident multineuronal firings are observed, correlations between these observed neurons are stronger than others. However, these neurons may not be connected directly, and the observed coincident firings may be a result of influences from other neurons. This effect is called a pseudo-correlation, and it is problematic in data analysis. There are two major graph estimation problems. One is a problem of estimating the importance of nodes when connectivities of whole edges are known—for example, web page ranking problems such as PageRank (Page, Brin, Motowani, & Winograd, 1997) and HITS (Kleinberg, 1999). The other problem is estimating edges based on certain statistics, which represents dependency between nodes—for example, cell-signaling data analysis (Friedman, Hastie, & Tibshirani, 2007). We deal with the latter problem in this letter. There are many different kinds of statistics for representing the relationship between nodes; therefore, we collectively refer to these statistics as dependency. In a graph structure estimation, it is typically assumed that a set of observed data is generated from a multivariate gaussian distribution. Following this, the inverse covariance matrix is regarded as the intensity of direct connections between nodes. In many cases, it is important to estimate a few intrinsic connections between nodes of the graph because the inverse covariance matrix is assumed to be sparse. This formulation is referred to

Intrinsic Graph Structure Estimation Using Graph Laplacian

1457

as the SICS (sparse inverse covariance selection; Dempster, 1972). In the framework of SICS, given the empirical covariance matrix T and a regularization parameter ρ > 0, the objective is to minimize the 1 regularized negative log likelihood of a gaussian distribution: minimize{− log det + tr(T) + ρ||||1 }, 0

(1.1)

where is the inverse covariance matrix, 0 denotes positive definite ness of the matrix, , and ||||1 = i j |i j | is the element-wise 1 -norm of . Typical algorithms are glasso (graphical lasso; Friedman et al., 2007), COVSEL (Banerjee, Ghaoui, & d’Aspremont, 2008), SINCO (Scheinberg & Rish, 2009), LOREC (Luo, 2011), and QUIC (Hsieh, Sustil, Dhillon, & Ravikumar, 2011). These algorithms are computationally efficient and scalable and can deal with large-scale graphs. However, they cannot be used for estimating asymmetric graph structures because they assume a multivariate gaussian distribution as a data generative model. Another method for graph structure estimation is based on the information-theoretic approach (Schneidman, Berry, Segev, & Bialek, 2006; Tang et al., 2008; Tatsuno, Fellous, & Amari, 2009). Schneidman et al. (2006) and Tang et al. (2008) considered how to measure the information content of the observed spike train data. They proposed using the information entropy of the firing probability of neurons for measuring the information content of the data. Using real spike train data, they showed that the pairwise interaction between neurons has dominant information. Tatsuno et al. (2009) considered the generative model of the spike train data and modeled the probability of coincident multineuronal firings px x by a second-order i j log-linear model, log px x = θi xi + θ j x j + θi j xi x j − ψ, i j

(1.2)

where xi is the binary variable representing the state of the firing of neuron i; θi , θi j are the canonical parameters of the exponential model; and ψ is a normalization factor. They estimated parameters θi and θi j to identify the neural network structure from the observed spike trains. Similar to the SICS case, these models assume a symmetric graph structure, up to second-order statistics. Moreover, there are hyperparameters to be determined (Tatsuno et al., 2009). In this study, based on the random walk model on graphs, we propose a method that can extract only important connections in graphs without any tuning parameters. A random walk model is often used in graph analyses. For example, Ribeiro and Towsley (2010) proposed using random walk models for sampling nodes and edges from a given graph to calculate network characteristics such as degree distribution. Sarma, Gollapudi, and Panigrahy (2011) proposed to approximate the importance index of the node

1458

A. Noda et al.

in this PageRank (Page et al., 1997) scheme by using random walk modeling for a given graph. Although a random walk model is used for sampling from a fixed graph structure, to the best of our knowledge, there has been no attempt to use a random walk model on a graph to estimate intrinsic graph structure. Although the proposed framework of graph structure estimation is different from SICS, we compare the ability of our method to several SICS methods, which are actively studied and considered to offer state-of-the-art accuracy. The remainder of this letter is organized as follows. In section 2, we define the notion of graph structures and represent dependency as the total intensity of directed and undirected connections between nodes. Section 3 describes the digraph Laplacian that is a simple matrix function characterizing a graph and describes information transitions on a graph. In section 4, an algorithm for estimating graph structure is proposed. Section 5 shows the comparison of graph structure estimation with our method and existing methods by using both synthetic and real-world data sets. The last section, a conclusion, provides examples of future research on this field of study. 2 Relevant Background and Problem Setting In this section, we explain the notion of graph structures (Chung, 1997) and introduce the framework for estimating graph structure. Let V be a set of nodes {1, . . . , n} and E be a set of edges (i, j), i, j = 1, . . . , n that connects nodes i and j. A graph is characterized by a pair (V, E). There are two types of graphs: undirected and directed. When we are interested in only connectivity between nodes, we use undirected graphs, and when we are interested in the directions of edges, we use directed graphs. Our aim is in estimating the intrinsic graph connectivity when a certain kind of dependency between nodes is observed with noise. The relationship of intrinsic graph connectivity and dependency between nodes is modeled by a function, f : Rn×n → Rn×n → f () = .

(2.1)

The formal definitions of and are given later. By modeling information transition on a graph by a function f, we propose a framework of estimating the intrinsic graph structure by an observation .1 As a concrete model of f, we introduce the random walk on graphs, which is characterized by the digraph Laplacian introduced in section 3. We show a conceptual diagram of graph estimation in Figure 1. 1 Later in this study, we introduce the random walk model. In the model, a walker is regarded as an information carrier on the transit network represented by a graph.


1459

Figure 1: Conceptual diagram of a graph estimation problem. We focus on a problem to estimate edges based on data observed at each node.

Figure 2: Relationship between ξi j and θi j . The superficial dependency ξi j is composed of the intrinsic connectivity between various nodes.

Let be a connectivity matrix with []i j = θi j , i = j, which represents the connection intensity from nodes i to j, and θi j can be zero if there is no edge between i and j. For the example of stock prices, represents direct connectivity between companies. We can deal with both undirected and directed graphs, but in this study, we mainly consider the directed graph, that is, generally θi j = θ ji . In our representation, diagonal elements []ii are always set to 0, that is, we assume does not include self-loops. We emphasize that the connectivity in this study indicates the direction and intensity of information transition via the edge, unlike graphical models (Pearl, 2000) in which edges represent causality. Since we cannot observe directly, we introduce another parameter, which represents the dependent relationship between nodes. Let be a dependency matrix with []i j = ξi j , i = j, which can be represented as ξi j = ci + ci j θi j +

k∈V

ckij θik θk j +

ckl i j θik θkl θl j + · · · , for i = j,

(2.2)

k,l∈V

where ci , ci j , ckij , ckl i j . . . are attenuation coefficients. We show the relationship between parameters ξi j and θi j in Figure 2. The first term of equation 2.2 represents external inputs. The second term represents direct information propagation between nodes i and j, while the third term represents information propagation via one intermediate node k, and so on. A concrete example of equation 2.2 is shown in equation 3.11. For stock prices, is

1460

A. Noda et al.

Algorithm 1: General Procedure for Estimating Θ. input Ξ(0) ∈ Rn×n , [Ξ(0) ]ii = 0, i = 1, · · · , n. A positive constant r > 0. Maximum number of iterations k . Mapping f and its inverse f −1 , and projections Π and π initialize Ξ(1) = Ξ(0) + rIn ,

In : Unit matrix in Rn×n , k = 0

repeat (A)

Θ(k) = f −1 (Ξ(k) )

Calculate a mapping from Ξ(k) to Θ(k)

(B)

Θ(k) = Π(Θ(k) )

Impose certain constraints on Θ(k) and obtain Θ(k)

(C)

Ξ(k) = f (Θ(k) )

Calculate a mapping from Θ(k) to Ξ(k)

(D)

Ξ(k+1) = π(Ξ(k) )

Update diagonal elements of Ξ(0) using Ξ(k)

k ←k+1 until certain stopping criterion is satisfied return Θ(k) = Π(f −1 (Ξ(k) )))

supposed to be a certain quantity that represents how one company’s stock price affects another company’s stock price. As shown in equation 2.2, ξi j includes the information propagating through various paths, and direct relationships between nodes are not clear. Therefore, we propose a framework for estimating the connectivity matrix from . As described before, represents the essential structure of the graph. In principle, we could estimate from by taking the inverse of f. Therefore, we have to choose f, which has the inverse f −1 . For computational and theoretical simplicity, we later propose using the matrix exponential map as the function f. In algorithm 1, we show a general procedure for estimating when f and off-diagonal elements of are given. Algorithm 1 is the general framework of proposed graph structure estimation; an algorithm with a concrete model for f is given in algorithm 2. Since autodependency ξii is not well defined in terms of equation 2.2, we consider ξii as a latent variable and replace it with an appropriate initial value γ > 0. That is, the diagonal elements of (0) are complemented by 0 and replace the diagonal elements [(0) ]ii with r > 0 so that all of the eigenvalues of (1) become positive.2 Then we obtain (1) = f −1 ((1) ). We impose certain constraints on (1) . One of the constraints is that the diagonal elements [(1) ]ii should be 0. In section 3, as an additional constraint, we restrict the maximum number of nonzero elements of . Let T ⊆ Rn×n be a subspace of the connectivity , which is composed of elements that satisfy these constraints. We denote the

2 Technically, this operation is to ensure that we can calculate the inverse of the matrix log function in algorithm 2, and we set r = 1 when performing experiments.


1461

Figure 3: Conceptual diagram of the general procedure for estimating .

projection of to the constrained space T by

: Rn×n → T (k) → ((k) ) = (k) .

(2.3)

Thus, we obtain (1) = f ( (1) ). By using the diagonal elements of (1) , we update (2) = (0) + diag( (1) ), where diag(M) is a matrix-valued function for a square matrix M that replaces off-diagonal elements of M by 0. We denote this updating as a map from (k) to (k+1) : π : f (T ) → X (k) → π ( (k) ) = (k+1) ,

(2.4)

where X denotes a subspace of Rn×n composed of elements that the offdiagonal elements are equal to those of . We can estimate from offdiagonal elements of by repeating the above procedure. A conceptual diagram of the general procedure for estimating from is shown in Figure 3. 3 Model of Graph Structure We propose using the exponential map as the function f in equation 2.1 for theoretical and computational simplicity. Then we can simply obtain the inverse of f as the matrix logarithm. In this section, we explain digraph Laplacian (Li & Zhang, 2010) for considering the exponential map of directed graphs. We explain the notion of a random walk on a graph characterized by the digraph Laplacian and represent information propagation on a graph by a random walk.

1462

A. Noda et al.

3.1 Digraph Laplacian. The digraph Laplacian (Li & Zhang, 2010) is a function of a graph (V, E). First, we consider a directed graph and assume the connectivity matrix as []i j = θi j =

1

there is an edge from i to j and i = j,

0

otherwise,

(3.1)

where is known as an adjacency matrix in the literature of the graph theory. This assumption is reasonable when the connectivity θi j takes two distinct values that represents the states “connected” and “unconnected.” We expect that the estimation of becomes more robust to observation noises by this assumption. Next, we define a digraph Laplacian L as a function of as

[L()]i j =

⎧ ⎪ θik ⎨

i = j, (3.2)

k∈V

⎪ ⎩ −θ

ij

i = j.

From equation 3.2 all row-wise sums of L() are 0s, for example, L()1 = 0, where 1 and 0 are n-dimensional vectors whose elements are all 1s and 0s, respectively. The graph Laplacian is defined as a special case of the digraph Laplacian with a symmetric connectivity matrix : θi j =

1

i and j are connected,

0

otherwise.

(3.3)

The definition of the Laplacian matrix L is the same as equation 3.2, and L also becomes symmetric. By definition, all row-wise and columnwise sums of L() are 0s—for example, L()1 = 0 and 1T L() = 0T . The graph Laplacian plays a central role in spectral graph theory (Chung, 1997) and is utilized for various purposes of data analysis such as clustering (Shi & Malik, 2000; Belkin & Niyogi, 2001), manifold learning (Sahbi, Etyngier, Audibert, & Keriven, 2008), and web page ranking (Page et al., 1997; Kleinberg, 1999). 3.2 Random Walk on Graph. In this section, we consider a random walk on a graph characterized by (Kondor & Lafferty, 2002). Let β ∈ R be the transition probability between connected nodes in a unit time. We focus on the transition probability in a short interval 1/τ . The transition


1463

Figure 4: Transition probability on a graph with the connectivity .

probability matrix for this interval is written as ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ β In − L() = ⎜ ⎜ τ ⎜ ⎜ ⎜ ⎜ ⎝

1−

β θ1k τ k∈V β θ τ 21 .. . β θ τ n1

β θ τ 12 β 1− θ2k τ

···

β θ τ 1n

..

.. .

k∈V

···

. 1−

β θnk τ

⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟, ⎟ ⎟ ⎟ ⎟ ⎟ ⎠

k∈V

(3.4) where In is the n × n unit matrix. Note that (In − βL()/τ )1 = 1 holds. The element In − βL()/τ i j is the transition probability from node i to j in the interval 1/τ , and In − βL()/τ ii is the probability that a random walker remains in node i. We show a conceptual diagram of the transition probability that is represented by the matrix In − βL()/τ in Figure 4. Let ei ∈ Rn be an n-dimensional unit vector defined as [ei ]k =

1

k = i,

0

k = i.

(3.5)

The jth element of eTi (In − βL()/τ ) gives the probability of a one-step transition from node i to j. In the same way, eTi (In − βL()/τ )τ , (i.e., the ith column of (In − βL()/τ )τ ,) gives the τ -step transitions probability vector starting from the node i. By considering the continuous time limit of the random walk, we define the transition probability in a unit time as lim

τ →∞

βL() In − τ

τ

= e−βL() .

(3.6)

1464

A. Noda et al.

Figure 5: Relationship between variables. The dependency matrix is obtained by mapping the connectivity matrix . The quantity T = (ti j ) is the observed dependency = (ξi j ) with noise ε. We assume that is a {0,1}-valued matrix and approximate T using the random walk αe−βL() .

The digraph Laplacian L() characterizes a random walk on a directed graph, and the i j element of the matrix e−βL() is regarded as the probability that the walker starting from node i stays at node j after a unit time. When L() is the graph Laplacian, the right-hand side of equation 3.6 is called the diffusion kernel (Kondor & Lafferty, 2002). 3.3 Observed Data Model. In this section, we introduce a generative model of the observed data using the exponential map defined in equation 3.6. We show the relationship between variables in Figure 5. In our model, []i j = ξi j is composed of intrinsic connectivity θi j , i, j = 1, . . . , n modeled by a function f in equation 2.1. Here we assume that the dependency is observed with an additive noise as ti j = ξi j + ε,

(3.7)

where ε represents a noise term. Specifically, we consider ti j ∈ R as an estimate of ξi j , i = j and T as an n × n matrix with the i j element tij . That is, the quantity tij is characterized by the dependency ξi j with a certain probability model and T is observed instead of . We can assume an arbitrary distribution for tij , but for simplicity, we assume tij are samples from gaussian distributions with means ξi j . The probability density function of observed data tij is written as

1 2 exp − 2 (ti j − ξi j ) , p(ti j ; ξi j ) = √ 2σ 2πσ 2 1

(3.8)


1465

where σ 2 is a common variance. The joint distribution of the observed data T can be expressed as p(T; ) =

p(ti j ; ξi j ).

(3.9)

i= j

We observe only the dependency with noise, though we can estimate connectivity by the algorithm proposed in the next section. Figure 5 shows that the intrinsic connectivity is simplified, and then the digraph Laplacian L() is calculated. The intrinsic connectivity leads to dependency via a function f. We approximate using the random walk, and we assume a matrix T is observed as affected by additive noises. With a multiplicative factor α ∈ R+ for e−βL() in equation 3.6, the random walk on a graph is equivalent to the polynomial transformation model in equation 2.2, and we obtain β 2 L()2 β 3 L()3 αe−βL() = α · In − βL() + − + ··· 2! 3!

(3.10)

by a Taylor series expansion. The off-diagonal element (i, j ∈ V, i = j) of αe−βL() is written as −βL() β2 αe = α· βθi j + θik θk j − θi j θik − θi j θ jk + · · · ij 2! k∈V k∈V k∈V β (θik + θ jk ) + · · · θi j =α β − 2! +α

k∈V

β2 + ··· θik θk j + · · · . 2!

(3.11)

k∈V

The last expression of equation 3.11 corresponds to equation 2.2, = f (|α, β ) = αe−βL() ,

(3.12)

where we extended the function f to have additional parameters in the random walk in equation 3.10. We can see the relationship between equations 2.2 and 3.11, that is, α(β − 2!β k∈V (θik + θ jk ) + · · ·)θi j corresponds 2 k to ci j θi j , and α( β2! + · · ·) k∈V θik θk j corresponds to k∈V ci j θik θk j . Since −βL() ]i j by equation 3.12, ξi j = [αe ti j = ξi j + ε = αe−βL() i j + ε,

ε ∼ N (0, σ 2 ),

(3.13)

1466

A. Noda et al.

Algorithm 2: Proposed Algorithm for Graph Structure Estimation. input observed data T ∈ Rn×n , [T ]ii = 0, i = 1, · · · , n, r > 0, maximum number of iterations k initialize Ξ = T + rIn for m = 1 to n(n − 1) do for k = 1 to k do

⎧ ⎨ 1 ˆ = θˆ = Π(θîj ) = [Θ] ij ij ⎩ 0

| [log Ξ]ij | ≥ ζm , andi = j otherwise. Estimate Θm from [log Ξ]ij

(αm , β m ) = arg min J (α, β, Θm ) α,β m

Ξ = T + diag α · e−β

Estimate αm and β m

m L(Θm )

Update diagonal elements of Ξ

end for end for m ˆ =

arg min

m∈{1,2,··· ,n(n−1)} m ˆ

J (αm , β m , Θm )

Select m ˆ which minimizes the objective J

return Θ

where σ 2 is the variance of the gaussian noise for the observation. We have to calculate the inverse function of f in equation 2.1 to estimate a graph structure . Since f is modeled by the exponential map αe−βL() , we only have to calculate log by using fixed α and β. 4 Algorithm for Estimating Graph Structure In this section, we propose an iterative algorithm for estimating parameters α, β, and for the proposed model in equation 3.13 from an observed matrix T. This procedure is summarized in algorithm 2. Since we assume observation noises are gaussian, we consider the objective function as the sum of squared residuals, J (α, β, ) =

2 ti j − αe−βL() i j ,

(4.1)

i, j∈V,i= j

and estimate α, β, and that minimize equation 4.1. If = αe−βL() , the i j element of the logarithm of is [log ]i j = [log(αI) − βL()]i j ⎧ ⎨ log α − β θik i = j, = k∈V ⎩ βθi j i= j.

(4.2)


1467

We can estimate as ˆ i j = θî j = []

[log ]i j β

, for i = j.

(4.3)

We note that diagonal elements of are not defined, and we update []ii while updating parameters α, β, and using the following iterative algorithm. First, we initialize as = T + rIn ,

r > 0,

(4.4)

where r is chosen so as to det = 0. Next, we calculate the logarithm of . For fixed α and β, we calculate an approximation of the matrix logarithm ˆ which corusing the Jordan normal form (Higham, 2008) of to obtain , responds to procedure A in algorithm 1. The number of maximum possible edges m is incremented by a value of 1 starting from m = 1 in the iteration of the algorithm. Let ζm be the mth largest value of |[log ]i j | for i = j. Then we set ˆ i j = θî j = (θî j ) = []

1

| log i j | ≥ ζm ,

0

otherwise.

and i = j

(4.5)

By this thresholding operation, we choose m most likely connected edges. We estimate α and β that minimize the objective function in equation 4.1 ˆ , with ˆ ), (α, ˆ βˆ ) = arg min J(α, β, α,β

(4.6)

and obtain ˆ |α, ˆ βˆ ). = f (

(4.7)

We used the Nelder-Mead method (Lagarias, Reeds, Wright, & Wright, 1998) for optimizing α and β. The procedures shown in equations 4.6 and 4.7 correspond to procedure B in algorithm 1. We update the diagonal elements of by ˆ ˆ = T + diag αe ˆ −βL( ) ,

(4.8)

which corresponds to procedure D in algorithm 1. For a fixed m, we repeat these procedures k times. The stopping criterion of the inner loop can

1468

A. Noda et al.

be defined by the gap between the previous and current update of the estimate, but for simplicity, we set the maximum number of iterations k . In our preliminary experiments, we observed that the estimate converged at most two iterations; hence, we set k = 2 in our experiments. We denote the estimates obtained by these procedures as α m , β m , and m . We increased m to m + 1 and repeated algorithm 2. After we obtain a set ˆ as of estimates, S = {(α m , β m , m )}n(n−1) m=1 , we select m ˆ = m

arg min

J α m , β m , m ,

(4.9)

m∈{1,2,···,n(n−1)}

and obtain the estimate mˆ , which leads to the estimate of the intrinsic ˆ We note that when we deal with undirected graphs, θi j = θ ji , connectivity . the size of S is reduced to n(n − 1)/2. 5 Experiments In this section, we show experimental results of graph structure estimation. In section 5.1, we focus on an estimate of undirected graphs. We compare results of the proposed method with results of SICS methods. Since generative models of the proposed framework and SICS framework are different, we estimate graph structures from observed data according to each generative model. Observed data are generated from the random walk in section 5.1.1. This condition is faithful to the proposed framework. Observed data are generated by the multidimensional gaussian distribution in section 5.1.2. This condition is faithful to the SICS framework. In section 5.2, we estimate directed graph structures using the proposed method only because SICS methods cannot be applied to directed graph structure estimation. In section 5.3, we apply the proposed method and SICS methods to real-world financial data sets and compare the results of these methods. As representative SICS methods, we adopt glasso (Friedman et al., 2007), QUIC (Hsieh et al., 2011), and LOREC (Luo, 2011). We use StARS (stability approach to regularization selection; Liu, Roeder, & Wasserman, 2010) to find an appropriate regularization parameter ρ in SICS methods. 5.1 Artificial Data of Undirected Graphs. First, we estimate symmetric graph structures using both the proposed method and SICS methods and compare their performances. We set the number of nodes as n = 20. 5.1.1 Data in Accordance with the Random Walk. We compare the accuracy of different methods on the data in accordance with equation 3.13, the random walk on the graph. To see the relationship between the accuracy and the density of the ground-truth graph ∗ , we first define the density of ∗ as the probability P([∗ ]i j = 1), for i = j. A density is a value from 0 to 1.


1469

We use ∗ with different densities (0.2, 0.4, and 0.6) as case studies. The definition of accuracy is provided later. Next, we generate observed data T by ∗ ti j = αe−βL( ) i j + ε,

ε ∼ N (0, σ 2 ),

(5.1)

where α and β are randomly generated from a uniform distribution over [0, 1]. In SICS methods, diagonal elements of T are set to a common value r in the same way as equation 4.4. In all our experiments, r is fixed at 1, and it always made the eigenvalues of ∗ positive. We regard T as the covariance matrix of a gaussian distribution and solve the 1 regularized gaussian MLE problem: ˆ = arg min{− log det + tr(T) + ρ||||1 }.

(5.2)

0

Then, to obtain {0, 1}-valued connectivity, we transform the estimated θî j by the following thresholding operation: θî j =

0

θî j = 0,

1

otherwise.

(5.3)

We use StARS to find the regularization parameter ρ. StARS finds an appropriate value of ρ based on stability estimated by random subsampling. In our study, the observed data T are generated 100 times according to the assumed generative model for calculating stabilities in StARS. The proposed method and SICS methods return the estimated graph ˆ from observed data T. In Figure 6, we show the estimation structure results where the vertical axis and horizontal axis indicate the accuracy and size of the observed noise, respectively. We define the accuracy of estimation as accuracy (%) = 1 −

i, j∈V, i= j

|θi∗j − θî j |

n(n − 1)

× 100,

(5.4)

where θî j is the i j element of estimated connectivity matrix. We also define the size of observed noise as Size of noise =

Standard deviation of noise ε 1

Sample mean of observed data

n(n−1)

i, j∈V, i= j ti j

,

(5.5)

1470

A. Noda et al.

Figure 6: Accuracies of undirected graph structure estimations with different methods. Densities of the ground-truth graphs are (a) 0.2, (b) 0.4, and (c) 0.6. Error bars are standard deviations.

that is, the size of noise is the coefficient of variation. Figures 6a to 6c are accuracies obtained by different methods when the densities of are 0.2, 0.4, and 0.6. In Figure 6, we can see that the accuracies of SICS methods decrease as the densities increase. The proposed method is shown to obtain high accuracies in most cases. In Figure 7, to see the effect of the regularization parameters in SICS methods, we show the average estimation accuracies by the proposed method and the glasso method with various regularization parameters. The vertical axis and horizontal axis of the graphs indicate accuracy and regularization

1471

60 40

proposed method glasso

0

20

accuracy

80

100


0.000

0.005

0.010

0.015

0.020

rho

60 40

proposed method glasso

0

20

accuracy

80

100

(a) size of noise is 0.006

0.000

0.005

0.010

0.015

0.020

rho

(b) size of noise is 0.06 Figure 7: Averaged accuracies with 1 standard deviation error bars of undirected graph structure estimation. Dashed lines show the results of glasso using various regularization parameters, and solid lines show those of the proposed method with different noise levels: (a) 0.006 and (b) 0.06. The density of the ground-truth graph is fixed to 0.2. The observed matrix is generated 100 times in each setting.

1472

A. Noda et al.

parameters ρ, respectively. In this study, we fixed the density of ∗ to 0.2, but we observed a similar tendency for other densities. We prepared two levels of noise: 0.006 and 0.06. The proposed method has no tuning parameter in contrast to SICS methods. Figure 7 shows that the misspecification of the regularization parameter of glasso leads to inaccurate estimation results. In this study, the proposed method achieved almost the same accuracy as the optimal case of glasso. 5.1.2 Data Generated from the Multidimensional Gaussian Distribution. In this section, we compare the accuracy of different methods on the data generated from multidimensional gaussian distributions. We make the precision matrix from the ground-truth graph ∗ as ∗ = ∗ + In . We sample {si }i=1,···,N , si ∈ R20 from the multidimensional gaussian distribution N (0, (∗ )−1 ), where N is the number of samples. We consider the empirical covariance matrix of s1 , . . . , sN as an observation T=

N 1 (si − μ)(s ¯ i − μ) ¯ T, N i=1

where

μ ¯ =

N 1 si . N

(5.6)

i=1

In Figure 8a, we show the relationship between the accuracy and the number of samples N when the density of ∗ is fixed to 0.2 to investigate how the performance is influenced by the number of samples. In Figure 8b, we show the accuracy when N = 200 to investigate how the performance is influenced by the connection density. From Figure 8a, we see that when the data are generated from a multivariate gaussian with the inverse covariance matrix ∗ , the proposed method accurately estimates the ground-truth graph. From Figure 8b, we can see that the accuracies of every method decrease as the densities increase. However, the accuracy decrease of the proposed method is slower than others and retains more than 60% accuracy even when the density reached 0.6. There are two possible reasons for the better accuracies achieved by the proposed method shown in Figures 6 and 8. The first is the essential difference of the underlying models of the proposed and SICS frameworks. While the SICS framework considers only the second-order statistics, the proposed framework takes higher-order statistics into account, as shown in equations 3.10 and 3.11. The second reason is the difference of the regularization methods. We adopt StARS for defining the regularization parameter in SICS methods, while algorithm 2 has a mechanism to automatically determine an appropriate sparsity based on the fitness of the model. 5.2 Artificial Data of Directed Graphs. In this section, we estimate directed graphs represented by asymmetric adjacency matrices . Since SICS methods cannot be used for estimating directed graphs, we use only

200

1000

(a)

number of observed data

500

10000

LOREC

QUIC

glasso

accuracy

0.2

(b)

density

0.4

0.6

LOREC

QUIC

glasso

proposed

Figure 8: Accuracies of undirected graph structure estimations with different methods. (a) Density of the ground-truth graph is set to 0.2. The numbers of observed data are 200, 500, 1000, and 10,000. (b) Number of observed data is set to 200. Densities of the ground-truth graphs are 0.2, 0.4, and 0.6. Error bars are standard deviations.

accuracy

100

80

60

40

20

0

100 80 60 40 20 0

proposed

Intrinsic Graph Structure Estimation Using Graph Laplacian 1473

A. Noda et al.

60 40

accuracy

80

100

1474

20

density=0.2 density=0.4

0

density=0.6

0.006

0.017

0.028

0.04

0.051

size of noise Figure 9: Averages of accuracies of directed graph structure with respect to different densities. The densities of the ground-truth graphs are 0.2, 0.4, and 0.6. Error bars are standard deviations. The observed matrix is generated 100 times in each setting.

the proposed method. We prepare several with densities 0.2, 0.4, 0.6, and randomly generate α, β from a uniform distribution over [0, 1]. We set the number of nodes to n = 20. We generate observed data T according to equation 3.13 in the same manner as the experiments of the undirected graph. We show the relationship between accuracy, defined by equation 5.4, and the size of observed noise defined, by equation 5.5, in Figure 9. As shown in Figures 6 and 9, the proposed method is able to estimate the directed graph structure the same as the undirected case. The accuracy of estimation deteriorates with the increase of the density of the ground-truth graph and the size of noise. This tendency is observed in the experiments with both directed and undirected graphs. Sections 5.1 and 5.2 are summarized as follows. We performed experiments with artificial data and showed that the proposed method is able to estimate the graph structures more accurately than the conventional SICS methods in most cases. The proposed method, which has no


1475

tuning parameter, achieved comparable or superior accuracies to the finely tuned glasso. Also, the proposed method is applicable to directed graphs, while the conventional SICS methods are applicable only to undirected graphs. 5.3 Real-World Data. In this section, we apply the proposed method and SICS methods to analyze a set of time-series data of stock prices downloaded from Yahoo! finance.3 First, we consider an undirected graph with nodes corresponding to companies to compare the results of the proposed method with the results of SICS methods. Next, we consider a directed graph to analyze the relationship of the companies in more detail by the proposed method. The data set is composed of the daily closing prices of 10 companies from November 11, 2010, to July 11, 2011, which is equivalent to 158 business days. We show the data in Figure 10. We take an interest in how the relationship of companies changed around the day of the Great East Japan Earthquake on March 11, 2011. 5.3.1 Undirected Graphs. The edges of the graph represent a certain kind of relationship of stock prices between companies. For example, if stock prices of companies i and j behave similarly, it is natural to consider that there is an edge between these companies. We estimate the undirected connectivity matrix from an observation. Let Si (t) ∈ R be a stock price of company i at time t. Since we focus on the existence of connections and do not consider whether connections are positive or negative, we consider an absolute value of relative change rate Ri (t) = |(Si (t + 1) − Si (t))|/Si (t). We assume that virtual random walkers in the stock market have uptick and downturn factors. They transit between nodes according to the rate determined by the intensity of the connection, and we assume the rise and decline of stock prices of companies can be explained by the transition of these random walkers. Given the stock values of each company at each time tick, we first translate the stock values to the difference of them at times t and t + 1 to obtain Ri (t), i = 1, . . . , n and t = 1, . . . , N. The degree of synchrony of companies i and j is measured by using Kendall’s τ correlation (Kendall & Gibbons, 1990): ti j =

1

Deformed graph laplacian for semisupervised learning.

Subvoxel accurate graph search using non-Euclidean graph space.

Graph Estimation From Multi-Attribute Data.

A significance test for graph-constrained estimation.

Parameter estimation in tree graph metabolic networks.

Graph Estimation with Joint Additive Models.

Studying cerebral vasculature using structure proximity and graph kernels.

String graph construction using incremental hashing.

Immunoglobulin Classification Using the Colored Antibody Graph.

A Local Structural Descriptor for Image Matching via Normalized Graph Laplacian Embedding.

A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

2-Norm Constraint and Graph-Laplacian PCA Method for Feature Extraction.

On the least signless Laplacian eigenvalue of a non-bipartite connected graph with fixed maximum degree.

Progressive image denoising through hybrid graph Laplacian regularization: a unified framework.

Graph ensemble boosting for imbalanced noisy graph stream classification.

Interval Graph Limits.

Positive Semidefinite Rank-based Correlation Matrix Estimation with Application to Semiparametric Graph Estimation.

Environmental evolutionary graph theory.

Classifying multigraph models of secondary RNA structure using graph-theoretic descriptors.

Occlusal caries detection using random walker algorithm: a graph approach.

Prostate MRI segmentation using learned semantic knowledge and graph cuts.

SEGMENTATION OF MYOCARDIUM USING DEFORMABLE REGIONS AND GRAPH CUTS.

Boosting for multi-graph classification.

Backtrackless walks on a graph.