Journal of Bioinformatics and Computational Biology Vol. 12, No. 5 (2014) 1450025 (14 pages) © Imperial College Press DOI: 10.1142/S0219720014500255

J. Bioinform. Comput. Biol. 2014.12. Downloaded from www.worldscientific.com by UNIVERSITY OF AUCKLAND LIBRARY - SERIALS UNIT on 02/09/15. For personal use only.

Evaluating predictive performance of network biomarkers with network structures

Shang Gao*,†,††, Ibrahim Karakira†, Salim Afra†, Ghada Naji‡, Reda Alhajj†,§,¶, Jia Zeng|| and Douglas Demetrick**

*College of Computer Science and Technology, Jilin University, Changchun, China

†Department of Computer Science, University of Calgary, 2500 University Drive N. W., Calgary, Alberta, Canada

‡Department of Biology, Lebanese University, Tripoli, Lebanon

§Department of Computer Science, Global University, Beirut, Lebanon

¶Institute of Informatics, Wroclaw University of Technology, Wroclaw, Poland

||Institute for Personalized Cancer Therapy, MD Anderson Cancer Center, The University of Texas, 1515 Holcombe Blvd, Houston, Texas, USA

**Department of Pathology, Oncology and Biochemistry and Molecular Biology, University of Calgary, 3330 Hospital Drive N. W., Calgary, Alberta, Canada

††[email protected]

Received 10 April 2014
Revised 26 June 2014
Accepted 6 August 2014
Published 15 September 2014

A network is a powerful structure that reveals valuable characteristics of the underlying data. However, previous work on evaluating the predictive performance of network-based biomarkers does not take nodal connectedness into account. We argue that it is necessary to maximize the benefit from the network structure by employing appropriate techniques. To address this, we aim to learn a weight coefficient for each node in the network from a quantitative measure such as gene expression data. The weight coefficients are computed from an optimization problem that minimizes the total weighted difference between nodes in a network structure; this can be expressed in terms of the graph Laplacian. After obtaining the coefficient vector for the network markers, we can then compute the corresponding network predictor. We demonstrate the effectiveness of the proposed method by conducting experiments using published breast cancer biomarkers with three patient cohorts. Network markers are first grouped based on GO terms related to cancer hallmarks. We compare the predictive performance of each network marker group across gene expression datasets. We also evaluate the network predictor against the average method for feature aggregation. The reported results show that the predictive performance of network markers is generally not consistent across patient cohorts.

Keywords: Network model; optimization; graph Laplacian; network markers; breast cancer.

††Corresponding author.

S. Gao et al.


1. Introduction

Networks provide unique advantages in modeling multiplex relationships between genes, proteins, and diseases. In recent years, network-based approaches have become promising for the disease biomarker detection problem. The notion of "network medicine" has received considerable attention, centered on the principle that genes and gene products act highly interactively to cause complex diseases.1–3 For example, in cancers, identifying pairwise disordered relationships between genes and proteins naturally fits the goals of network modeling, i.e. to study relationships and find connection patterns between nodes and modules in a global map.4,5 To this end, many integrative methods using network data have been proposed to track differential regions of the network (i.e. subnetworks) predictive of disease phenotype.4 In this direction, a fundamental assumption is that networks accurately reflect the connectedness of the data. This has become a critical concern in protein interaction networks due to differing experimental protocols and the presence of noise.6 For gene co-expression networks, where two genes are connected if correlated, link inaccuracies mostly arise from data heterogeneity. Many methods have been proposed to quantify the interconnectedness and topological overlap of networks7–9 and to study dynamical network biomarkers.10–12 Here, our aim is to use nodal connectedness to evaluate network-based markers (by aggregating genes into network predictors) against an outcome such as a clinical variable. Indeed, network modules (a.k.a. subnetworks) have been deemed a fundamental medium for understanding and naturally representing biological pathways and cellular processes.3,13,14 For this reason, many believe that network modules could provide useful directions for finding key components attributable to disease phenotypes.
Although obtaining network markers has been the major focus since van't Veer's pioneering work in network-based thinking15 (for instance, Chuang et al. derived biomarkers to predict breast cancer metastasis3), little work has been done to evaluate the predictive power of the derived network markers, that is, to evaluate the predictability of a gene set against some phenotype given the connectedness of its constituent genes or gene products. In this paper, we introduce a lightweight, parameter-free method for evaluating network-based markers, called Interconnectedness Network Score (INS), using clinical outcome and gene expression data.

There is a pressing need for an effective method to gauge the predictive power of network biomarkers. After extensive efforts to find targeted disease biomarkers, for example, from our previous results,16,17 one needs to check retrospectively how predictive the derived network markers are against clinical outcome, and especially to compare the predictive results with singleton markers (i.e. individual genes that are known to be related to the disease). The usual approach to aggregating network connectedness is to average the gene expression of the constituent genes in a network,3,18–20 and then to use Receiver Operating Characteristic (ROC) analysis to measure the performance of the network-based markers. This way each network module is essentially transformed into a pseudo-feature. The upside of such aggregation is that we can utilize standard ROC curves to interpret the predictive level (in this context, predictability refers to the performance of classifiers); the downside is that, when averaging, the connectedness information of the network modules is lost. For example, consider two network markers with different nodal connectedness, as shown in Fig. 1. With simple average aggregation, the derived features cannot distinguish between the graph structures of the two network modules (because both derived features equal the average gene expression over nodes A, B, C, and D). Therefore, the question we ask here is: how can we derive effective features that better describe network markers/modules given their graph structures?

Fig. 1. The overall workflow of INS for evaluating the predictive level of network-based markers/modules. We first collect network biomarkers from previous studies and extract the relevant constituent genes from expression data after preprocessing (Step 1); then we learn weights for each node based on nodal connectedness (Step 2); finally, we derive network predictors by re-weighting the constituent genes for performance evaluation (Step 3).

Here, we design a method to derive module-based network features (Fig. 1) and use it to evaluate the performance of network biomarkers. The main idea is to learn a weight coefficient for each node in a network module from a quantitative measure such as gene expression data. The weight coefficients are computed from an optimization problem.21,22 Since each pair of nodes connected by an edge in a network module has a different strength of association (computed as an edge weight), we seek a coefficient vector that preserves network connectedness. This is obtained by minimizing the total weighted difference between the coefficients associated with nodes (Fig. 1, Step 2), which can be written in terms of the graph Laplacian (see Sec. 2). After obtaining the coefficient vector for the network marker, we can then compute the corresponding network predictor (Fig. 1, Step 3). The method effectively takes network proximity into consideration; therefore, the derived network predictors are more reliable for plotting ROC curves. To demonstrate the method, we evaluate published breast cancer biomarkers with three patient cohorts. Network markers are first grouped based on GO terms related to cancer hallmarks. We compare the predictive performance of each network marker group across gene expression datasets. We also evaluate the network predictor against the average method for the aforementioned feature aggregation.

2. Material and Methods

2.1. Learning network coefficients

Input: Suppose we have a collection of $m$ networks that are indicative of a certain cellular phenotype, $\mathcal{A} = \{A_1, A_2, \ldots, A_m\}$, where $A_p$, $1 \le p \le m$, is the adjacency matrix of network $p$ in $\mathcal{A}$, with $A_p := [a_{ij}]$, $a_{ij} = 1$ if nodes $i$ and $j$ are connected and 0 otherwise. We have a gene expression dataset $R := [r_{gs}]$ on which we want to evaluate the predictive power of the network markers in $\mathcal{A}$, where $r_{gs}$ is the expression value of gene $g$ in sample $s$, and we denote the clinical variable (e.g. metastasis outcome) as $o = (o_1, o_2, \ldots, o_{|s|})$, where $|s|$ denotes the number of samples in $R$.

Output: Let the coefficient vector for network $A_p$ with $k$ nodes be $c = (c_1, c_2, \ldots, c_k)$. Our goal is to derive $c$ for $A_p$ whose pairwise magnitudes preserve the neighborhood connectivity of $A_p$. To preserve the local connectivity by the coefficient vector $c$, the problem reduces to minimizing

$$\sum_{ij} (c_i - c_j)^2 w_{ij}, \tag{1}$$

where $w_{ij}$ is the weight between nodes $i$ and $j$ in $A_p$. $W := [w_{ij}]$ refers to the weighted adjacency matrix for $A_p$;21 $w_{ij}$ represents the weight (a similarity measure) between genes $i$ and $j$ if they are connected. We used the heat kernel to compute $w_{ij}$.21 Putting Eq. (1) in matrix form (see Ref. 21 for details), we get

$$\sum_{ij} (c_i - c_j)^2 w_{ij} = 2c^T (D - W) c = 2c^T L c, \tag{2}$$

where $D$ is the diagonal matrix $D := [d_{ii}]$ with $d_{ii} = \sum_j w_{ij}$, and $L$ is the graph Laplacian, $L := D - W$. The problem is then reduced to finding

$$\operatorname*{argmin}_{c^T D c = 1} c^T L c. \tag{3}$$

The constraint $c^T D c = 1$ removes the arbitrary scaling factor from the solution, which is given by the eigenvector associated with the second smallest eigenvalue of the generalized eigenvalue problem $Lc = \lambda Dc$ (the smallest eigenvalue, 0, corresponds to the trivial constant vector).
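As an illustration of Eqs. (1)–(3), the following is a minimal Python sketch, not the authors' implementation. It assumes that the heat kernel is applied to the expression profiles of connected genes with a kernel width `t` (a hypothetical parameter, since no value is stated here), and it solves the generalized eigenvalue problem through the equivalent symmetric form $D^{-1/2} L D^{-1/2}$. The function names and the two toy topologies on genes A, B, C, D are likewise illustrative assumptions.

```python
import numpy as np


def heat_kernel_weights(adj, expr, t=1.0):
    """Edge weights w_ij = exp(-||x_i - x_j||^2 / t) on existing edges only.

    adj  : (k, k) symmetric 0/1 adjacency matrix of one network module
    expr : (k, n_samples) expression matrix, one row per gene in the module
    t    : kernel width (assumed here; not specified in the text)
    """
    sq_dist = ((expr[:, None, :] - expr[None, :, :]) ** 2).sum(axis=2)
    return adj * np.exp(-sq_dist / t)


def ins_coefficients(weights):
    """Eigenvector of the second smallest eigenvalue of L c = lambda D c."""
    degree = weights.sum(axis=1)
    if np.any(degree <= 0):
        raise ValueError("every node needs at least one weighted edge")
    laplacian = np.diag(degree) - weights
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric form: D^{-1/2} L D^{-1/2} y = lambda y, with c = D^{-1/2} y.
    sym = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(sym)          # eigenvalues in ascending order
    c = d_inv_sqrt * eigvecs[:, 1]
    return c / np.sqrt(c @ (degree * c))      # enforce c^T D c = 1


def network_predictor(c, expr):
    """Re-weight the constituent genes: one predictor value per sample."""
    return c @ expr


# Toy data (made up): genes A, B, C, D measured in 6 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 6))
# Two modules on the same genes: a path A-B-C-D, and a triangle A-B-C with pendant D.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
paw = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)

c_path = ins_coefficients(heat_kernel_weights(path, expr, t=10.0))
c_paw = ins_coefficients(heat_kernel_weights(paw, expr, t=10.0))
average = expr.mean(axis=0)                   # the same for both topologies
print(network_predictor(c_path, expr))
print(network_predictor(c_paw, expr))
```

The simple average depends only on which genes are in the module, whereas the two coefficient vectors differ because the edge sets differ. Note that the sign of an eigenvector is arbitrary, so a practical implementation would need a sign convention before comparing predictors.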

Table 1. GO terms searched and number of network markers obtained.

GO term                         GO ID        Number of network modules (gene groups) (p < 0.001)
Apoptosis                       GO:0006915   9
Cell adhesion                   GO:0007155   4
Cell cycle                      GO:0007049   28
Immune response                 GO:0006955   3
Phosphorylation                 GO:0016310   8
Response to external stimulus   GO:0009605   7
Cell growth                     GO:0016049   3

The coefficient vector $c$ represents the relative importance of nodes due to the network topology measured by $w_{ij}$, i.e. if two nodes are far apart in the network, $w_{ij}$ incurs a heavy penalty from Eq. (1). After solving for $c$, we obtain the coefficient vector, which is subsequently used to weight the gene expression levels of the constituent genes. After re-weighting (Step 3 of Fig. 1), we obtain the corresponding network predictors.

2.2. Breast cancer biomarkers

Breast cancer biomarkers were retrieved from the Cell Circuits database (http://www.cellcircuits.org).23 We searched Gene Ontology (GO) terms (p-value < 0.001) related to cancer hallmarks from Chuang et al.'s work,3,14 and collected network markers for each GO group.24 In total, we obtained 62 network-based biomarkers from seven GO groups (Table 1). Gene symbols were mapped using UniProt ID Mapping (http://www.uniprot.org/) (The UniProt Consortium) and DAVID (http://david.abcc.ncifcrf.gov/).25

2.3. Gene expression data preprocessing and normalization

The gene expression datasets were retrieved from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo) with the accession IDs GSE2034 (n = 286),26 GSE1456 (n = 159),27 and GSE6532 (n = 327).28 All three datasets use the HG-U133A platform; we chose them in order to avoid bias in cross-platform validation. Gene expression data were processed with the MAS5.0 algorithm, and subsequently log2 transformed and median-centered across samples.

3. Results

3.1. Nodal connectedness affects predictive performance

We used the DREAM5 (Dialogue for Reverse Engineering Assessments and Methods) gene expression datasets, described in detail in Ref. 29. The input data include a compendium of 805 microarray experiments for E. coli, covering 4,511 genes (including 214 decoy genes). To see whether nodal connectedness affects the predictive performance against genetic perturbations, we used the gold-standard benchmark provided by the DREAM5 challenge. The benchmark data include 2,066 experimentally validated transcriptional interactions retrieved from RegulonDB. We created two sets of network modules. The first set includes network modules with at least one hub gene and its immediate neighbors (hub set). A hub gene is identified as a node whose degree is greater than the average node degree of the benchmark network plus two standard deviations of the node degree distribution. The immediate neighbors of a hub node $i$ are defined as the nodes with direct interactions with $i$ in the gold-standard benchmark dataset. The size of a network module is controlled by recursively following the direct neighbors of a node until the size of the subnetwork meets the predefined size (on the closest iteration), which is 6 in this experiment. The second set includes randomly selected network modules without any hub genes (random set). We collected 26 network modules for the first set, with average network size 6, and we randomly selected the same number of nodes to form the second set. If a network module from the random set has a size less than 6, we randomly add neighbors of one of its constituent genes. We compare the ROC curves for these two sets of network modules (Table 2). Hub set modules have a higher average area under the curve (AUC) than random set modules, which indicates that the network topology affects the predictive performance. It is worth mentioning that further investigation could provide alternatives (with details where possible) to the most widely used simple averaging method.

Table 2. Predictive performance of two different sets of network modules.

Set             Average AUC   95% confidence interval
Hub Set         0.67          0.614–0.701
Random Set 1    0.43          0.393–0.472
Random Set 2    0.51          0.481–0.545
Random Set 3    0.41          0.382–0.455
Random Set 4    0.54          0.481–0.575
Random Set 5    0.53          0.480–0.561
Random Set 6    0.47          0.422–0.511
Random Set 7    0.39          0.347–0.426
Random Set 8    0.43          0.393–0.472
Random Set 9    0.49          0.455–0.531
Random Set 10   0.38          0.362–0.445

3.2. Retrospective validation using Wang's data

We use the retrieved network markers to predict metastatic and nonmetastatic samples in Wang's cohort, where the network markers were derived from the work described in Ref. 3. We tested the predictive performance over the entire range of sensitivity and specificity values of the network markers against Wang's dataset, and we compared the AUC with that of the average aggregation.
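The hub criterion and module construction described in Sec. 3.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the population standard deviation is used (the text does not say which estimator was applied), the module is grown breadth-first and cut exactly at the target size (one possible reading of "on the closest iteration"), and the function names and toy edge list are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_hubs(edges):
    """Nodes whose degree exceeds the mean degree plus two standard deviations."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    threshold = mean(values) + 2 * pstdev(values)
    return {node for node, d in degree.items() if d > threshold}


def grow_module(seed, edges, size=6):
    """Collect `seed` and its neighbors breadth-first until `size` nodes are reached."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    module, frontier = [seed], [seed]
    while frontier and len(module) < size:
        node = frontier.pop(0)
        for nb in sorted(neighbors[node]):
            if nb not in module and len(module) < size:
                module.append(nb)
                frontier.append(nb)
    return module


# Toy interaction list (made up): gene "h" regulates ten genes, plus two extra edges.
edges = [("h", f"g{i}") for i in range(10)] + [("g0", "g1"), ("g2", "g3")]
hubs = find_hubs(edges)
module = grow_module("h", edges, size=6)
print(hubs, module)
```

With this toy list only `h` clears the threshold, and the module consists of `h` plus five of its direct neighbors. Neighbors are visited in sorted order only to make the sketch deterministic.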
In the apoptosis (Fig. 2), cell growth, immune response, and response to external stimulus groups, our method reports better performance than the average method (6 out of 9, 2 out of 3, 3 out of 3, and 7 out of 7 network markers with higher AUC, respectively). The other GO groups (cell adhesion, phosphorylation) show similar performance for both methods. This suggests that by taking the network connectivity into account, predictive performance can be improved in classifying breast cancer metastasis. Interestingly, in the cell cycle group only 1 out of 28 markers shows a higher AUC with our method. The implication is that edge connectedness in the network markers is not predictive of metastasis in general for this group of markers, because totally ignoring it (using average aggregation) leads to better classification performance. For the apoptosis markers shown in Fig. 2, the 95% CIs show moderate overlap between INS (average lower and upper bounds are 0.463 and 0.561, respectively, binomial exact test) and the simple average method (average lower and upper bounds are 0.423 and 0.488, respectively, binomial exact test) for the network-based markers that perform better using INS. Similar effects are observed in other GO groups. From the above study with E. coli data, the INS method shows consistently better performance. It is worth noting that our aim is not to propose a method that produces better predictive performance using existing network-based markers: as the way of identifying network-based markers differs, the predictive results are expected to differ (this is similarly true when evaluating gene signatures: there is a big pool of gene signatures, but very few of them produce consistent predictive performance24). For example, using Chuang's data we observed that 6 out of 9 markers show better performance for the apoptosis group; this is likely due to the inconsistent predictive performance of individual network-based markers from Chuang's data. In fact, in Chuang's method, the network-based markers are derived by a greedy approach, which does not find the optimal solution in general.

Fig. 2. ROC curves for the apoptosis GO term against Wang's data. Six out of nine markers show better predictive performance with our method (in grey); other groups show a similar trend, except the cell cycle group.

Fig. 3. Area under the curve (AUC) for 62 network markers in three different patient cohorts.


3.3. Cross validation with other gene expression datasets

To cross-check the predictive performance of the network biomarkers, we compare the ROC curves for each GO group with Loi's and Pawitan's cohorts27,28 using network predictors. From Fig. 3, we did not observe a single consistent trend across GO groups. For example, in the apoptosis group 6 out of 9 markers are more predictive in Wang's cohort, whereas in the cell cycle group, the markers are not as predictive in Wang's cohort as in the other two cohorts (6 out of 28 markers have a higher AUC compared with Loi's and Pawitan's cohorts). The results show that network biomarkers are not consistently predictive across patient cohorts, facing the same dilemma as the gene set approach, where most gene sets are not robust in predicting cellular phenotypes.24
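The cross-cohort comparison above amounts to computing one AUC per network predictor and cohort and then checking where each marker performs best. The following Python sketch shows one way to do this, assuming the AUC is computed from ranks (the probability that a positive sample scores above a negative one, with ties counted as one half). The cohort names, marker names, scores, and labels are made up for illustration and are not results from this study.

```python
def auc(scores, labels):
    """Probability that a positive sample outranks a negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_table(predictors, outcomes):
    """predictors: {cohort: {marker: scores}}; outcomes: {cohort: labels}."""
    table = {}
    for cohort, markers in predictors.items():
        for marker, scores in markers.items():
            table.setdefault(marker, {})[cohort] = auc(scores, outcomes[cohort])
    return table


def best_cohort(table):
    """For each marker, the cohort in which it reaches its highest AUC."""
    return {m: max(by_cohort, key=by_cohort.get) for m, by_cohort in table.items()}


# Hypothetical toy input: 1 = metastasis, 0 = no metastasis.
outcomes = {"cohort_1": [1, 1, 0, 0], "cohort_2": [1, 0, 1, 0]}
predictors = {
    "cohort_1": {"marker_a": [0.9, 0.8, 0.2, 0.1], "marker_b": [0.4, 0.6, 0.5, 0.7]},
    "cohort_2": {"marker_a": [0.3, 0.6, 0.5, 0.4], "marker_b": [0.8, 0.1, 0.9, 0.2]},
}
table = auc_table(predictors, outcomes)
print(table)
print(best_cohort(table))
```

In this toy input each marker separates the classes perfectly in one cohort and poorly in the other, which is the kind of cohort-dependent behavior that the comparison is meant to expose.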

4. Discussion and Conclusion

The notion of network-based biomarkers differs from traditional biomarker approaches (i.e. gene signatures) in that a group of connected genes or proteins, such as a pathway in a reference network (e.g. a signal transduction network or protein interaction network), is collectively identified as a predictive marker. The underlying reason for identifying these network-based markers (small network modules) is that biological functions and cellular processes are often carried out by various genetic and biochemical interactions in a concerted or disordered manner. Many works have focused on detecting or inferring network modules that are predictive of disease phenotypes, or on prioritizing genes in background networks. However, these efforts lack an evaluation method to gauge the predictive power of the identified network modules when taking the connection patterns of genes (i.e. network topology) into consideration. A simple method such as average aggregation essentially overlooks the network-based principle: while interactions are sought to predict disease phenotypes, the connectedness of genes is not accounted for when measuring the predictive performance. For example, BRCA-1, an established familial breast tumor suppressor gene, interacts with many proteins in the interactome, making it a network "hub." When evaluating network modules that include BRCA-1, we need to consider its importance, reflected by its relatively high node degree. In this context, nodal connectedness is important when evaluating network modules. Although network approaches have become promising in predicting disease phenotypes such as breast cancer recurrence, the way of making them predictive is problematic.
On the one hand, most work has focused on obtaining the network modules and on the argument that they are more predictive than the gene set approach; on the other hand, when evaluating the predictive performance of network markers, simple aggregation is employed and therefore network connectedness is totally ignored. Here, we offer a simple approach to taking network connectedness into account when evaluating network biomarkers against clinical variables. Using this method, we showed that network markers are not consistently predictive when compared with the simple aggregation approach. The crucial problem is that methods that identify network markers generally do not include network connectedness as a factor when "scoring" subnetworks. Similar to most gene signatures, network markers do not show robust predictive performance across gene expression profiles in different GO groups, making them nonrobust when predicting breast cancer metastasis. Our method depends on connectedness aggregation, which echoes the goal of designing an evaluation method that incorporates nodal connectedness into aggregation. Clearly, more complex aggregation schemas could be pursued, such as hierarchical aggregation based on GO terms and nonlinear forms.30 Here, we assumed a linear relationship between a node and its neighbors in passing information31,32 and derived the linear re-weighting.

The conclusion that network-based markers are not consistently predictive across patient cohorts is not surprising. Given that most gene signatures are not robust in the presence of cellular complexity, experimental noise, and incomplete data collection, the added dimension, i.e. moving from single genes that form gene signatures to interacting genes that form much more complex network modules, further confounds predictive signals. For this reason, network-based personalized medicine is not clinically deployed, although network-based thinking is pervasive. Our approach to evaluating network-based markers is an added line of evidence. In light of the above discussion, the conclusion is not solely due to the absence of nodal connectedness in evaluating network modules, but rather to the difficulty of identifying truly predictive network-based markers. Our focus in this paper is to evaluate network-based markers while considering the network topology. Our observation that network-based markers are not consistently predictive across patient cohorts reflects the fact that network-based markers are far from being predictive in clinical studies.

References

1. Barabasi AL, Gulbahce N, Loscalzo J, Network medicine: A network-based approach to human disease, Nat Rev Genet 12(1):56–68, 2011.
2. Ideker T, Sharan R, Protein networks in disease, Genome Res 18(4):644–652, 2008.
3. Chuang HY, Lee E, Liu YT, Lee D, Ideker T, Network-based classification of breast cancer metastasis, Mol Syst Biol 3:140, 2007.
4. Liu X, Liu ZP, Zhao XM, Chen L, Identifying disease genes and module biomarkers by differential interactions, J Am Med Inform Assoc 19(2):241–248, 2012.
5. Garcia-Alonso L, Alonso R, Vidal E, Amadoz A, de Maria A, Minguez P, Medina I, Dopazo J, Discovering the hidden sub-network component in a ranked list of genes or proteins derived from genomic experiments, Nucleic Acids Res 40(20):e158, 2012.
6. Das J, Mohammed J, Yu H, Genome-scale analysis of interaction dynamics reveals organization of biological networks, Bioinformatics 28(14):1873–1878, 2012.
7. Ostlund G, Lindskog M, Sonnhammer EL, Network-based identification of novel cancer genes, Mol Cell Proteomics 9(4):648–655, 2010.
8. Leordeanu M, Hebert M, A spectral technique for correspondence problems using pairwise constraints, Int Conf Computer Vision (ICCV), pp. 1482–1489, 2005.
9. Zaslavskiy M, Bach F, Vert JP, Global alignment of protein-protein interaction networks by graph matching methods, Bioinformatics 25(12):i259–i267, 2009.
10. Chen L, Liu R, Liu ZP, Li M, Aihara K, Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers, Sci Rep 2:342, 2012.
11. Liu R, Wang X, Aihara K, Chen L, Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers, Med Res Rev 34(3):455–478, 2014.
12. Liu R, Yu X, Liu X, Xu D, Aihara K, Chen L, Identifying critical transitions of complex diseases based on a single sample, Bioinformatics 30(11):1579–1586, 2014.
13. Haibe-Kains B, Olsen C, Djebbari A, Bontempi G, Correll M, Bouton C, Quackenbush J, Predictive networks: A flexible, open source, web application for integration and analysis of human gene networks, Nucleic Acids Res 40(Database issue):D866–D875, 2012.
14. Hanahan D, Weinberg RA, The hallmarks of cancer, Cell 100(1):57–70, 2000.
15. van't Veer LJ, Dai H, van de Vijver MJ et al., Gene expression profiling predicts clinical outcome of breast cancer, Nature 415(6871):530–536, 2002.
16. Gao S, Chen A, Rahmani A, Jarada T, Alhajj R, Demetrick D, Zeng J, MCF, A tool to find multi-scale community profiles in biological networks, Comput Methods Programs Biomed 112(3):665–672, 2013.
17. Gao S, Zeng J, ElSheikh AM, Naji G, Alhajj R, Rokne J, Demetrick D, A closer look at "social" boundary genes reveals knowledge to gene expression profiles, Curr Protein Pept Sci 12(7):602–613, 2011.
18. Cerami E, Gao J, Dogrusoz U et al., The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data, Cancer Discov 2(5):401–404, 2012.
19. Estrada E, Rodríguez-Velazquez JA, Subgraph centrality in complex networks, Phys Rev E 71(5):056103, 2005.
20. Guimera R, Nunes Amaral LA, Functional cartography of complex metabolic networks, Nature 433(7028):895–900, 2005.
21. Belkin M, Niyogi P, Laplacian eigenmaps and spectral techniques for embedding and clustering, Advances in Neural Information Processing Systems, Vol. 14, MIT Press, pp. 586–691, 2001.
22. Roweis ST, Saul LK, Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500):2323–2326, 2000.
23. Mak HC, Daly M, Gruebel B, Ideker T, CellCircuits: A database of protein network models, Nucleic Acids Res 35(Database issue):D538–D545, 2007.
24. Li J, Lenferink AE, Deng Y, Collins C, Cui Q, Purisima EO, O'Connor-McCourt MD, Wang E, Identification of high-quality cancer prognostic markers and metastasis network modules, Nat Commun 1:34, 2010.
25. Huang da W, Sherman BT, Lempicki RA, Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res 37(1):1–13, 2009.
26. Wang Y, Klijn JG, Zhang Y et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer, Lancet 365(9460):671–679, 2005.
27. Pawitan Y, Bjohle J, Amler L et al., Gene expression profiling spares early breast cancer patients from adjuvant therapy: Derived and validated in two population-based cohorts, Breast Cancer Res 7(6):R953–R964, 2005.
28. Loi S, Haibe-Kains B, Desmedt C et al., Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade, J Clin Oncol 25(10):1239–1246, 2007.
29. Newman M, Networks: An Introduction, Oxford University Press, 2010.
30. Yu X, Li G, Chen L, Prediction and early diagnosis of complex diseases by edge-network, Bioinformatics 30(6):852–859, 2014.
31. Belkin M, Niyogi P, Laplacian Eigenmaps for dimensionality reduction and data representation, Neural Comput 15(6):1373–1396, 2003.
32. Kuchaiev O, Rasajski M, Higham DJ, Przulj N, Geometric de-noising of protein-protein interaction networks, PLoS Comput Biol 5(8):e1000454, 2009.

Shang Gao is an Associate Professor of Computer Science at Jilin University, China. He received his B.Math degree with distinction from the University of Waterloo in 2006, and M.Sc. and Ph.D. in Computer Science from the University of Calgary in 2009 and 2014, respectively. He was a Vanier scholar of Canada. His research interests include complex networks, graph mining methods, and computational biology.

Ibrahim Karakira received his B.Sc. in Computer Science from Global University in Beirut, Lebanon in 2012. He is currently an M.Sc. student in Computer Science at the University of Calgary. His research interests include database systems, data mining, and bioinformatics.

Salim Afra received his B.Sc. in Computer Science from Global University in Beirut, Lebanon in 2012. He is currently an M.Sc. student in Computer Science at the University of Calgary. His research interests include database systems, data mining, and bioinformatics.

Ghada Naji attended first-year medicine in 1974 and then, in 1975, completed DEUGB in Biology at the University of Bordeaux II, France. Ghada continued her studies at Bordeaux II and received an M.Sc. in Genetics in 1977, an M.Sc. in Biology in 1978, and a Ph.D. in Biology in 1981. She specialized in "plant propagation in vitro" and "biology and pathology of crops". Since 1983, she has been a faculty member in the Department of Biology at the Lebanese University, Tripoli, Lebanon. She has contributed to a number of papers published in international venues, including journals, edited books, and refereed conference proceedings. She is a cochair of the International Symposium on Network Enabled Health Informatics, Biomedicine, and Bioinformatics (HI-BI-BI 2012, 2013, and 2014). Her research interests include gene expression data analysis, protein drug interaction, gene regulatory networks, and disease biomarker detection.


Reda Alhajj received his B.Sc. degree in Computer Engineering in 1988 from the Middle East Technical University (METU), Ankara, Turkey. After completing his B.Sc. with distinction at METU, he was offered a full scholarship to join the graduate program in Computer Engineering and Information Sciences at Bilkent University in Ankara, where he received his M.Sc. and Ph.D. degrees in 1990 and 1993, respectively. Currently, he is a Professor in the Department of Computer Science at the University of Calgary, Alberta, Canada. He is also affiliated with Global University in Beirut, Lebanon. He has published over 375 papers in international journals and fully refereed conferences. He has served on the program committees of several international conferences, including IEEE ICDE, IEEE ICDM, IEEE IAT, and SIAM DM. He has also served as guest editor of several special issues and as the Program Chair of IEEE IRI, ADMA, CaSoN, ASONAM, and OSINT-WM. He is editor in chief of the Social Network Analysis and Mining journal, Lecture Notes on Social Networks, and the Encyclopaedia of Social Network Analysis and Mining; he also serves on the editorial boards of several journals. He frequently gives invited talks in North America, Europe, and the Middle East. Alhajj leads a very productive research group of Ph.D. and M.Sc. students working primarily in the areas of biocomputing and biodata analysis, data mining, multiagent systems, schema integration and re-engineering, social network analysis, and XML. Most of his students hold competitive awards, including NSERC Vanier, NSERC CGS, PGS, and Alberta Innovates. He received the Outstanding Achievements in Supervision Award from the Faculty of Graduate Studies at the University of Calgary.

Jia Zeng received her Doctoral degree from the University of Calgary, Canada. Her Ph.D. dissertation was on the development and application of novel machine learning approaches and multi-agent systems to facilitate genome annotation. After her graduation, she joined Baylor College of Medicine in Houston, TX, USA as a CPRIT (Cancer Prevention and Research Institute of Texas) fellow and pursued interdisciplinary research in the area of cancer epigenetics. Zeng is currently a Senior Computational Scientist at the Institute for Personalized Cancer Therapy at MD Anderson Cancer Center. Her work focuses on developing information retrieval, data mining, and machine learning systems to facilitate the routine delivery of precision medicine in oncology.

Douglas Demetrick is a Professor in the Departments of Pathology and Laboratory Medicine; Oncology; Biochemistry and Molecular Biology; and Medical Genetics at the University of Calgary. Demetrick completed his Ph.D. in Immunochemistry at the University of British Columbia and his medical degree at the University of Calgary. Following his residency in Anatomical Pathology at the University of Calgary, he was a visiting scientist at the Cold Spring Harbor Laboratory in New York. As a US and Canadian Board-certified Anatomical Pathologist, he is currently Director of the Molecular Pathology Laboratory at Calgary Laboratory Services, as well as Program Leader for Research and Development. As a full member of the Southern Alberta Cancer Research Institute, his research concerns the identification and characterization of novel biomarkers predictive of anti-cancer treatment response.
