
Cancer Biomarkers 13 (2013) 201–213 DOI 10.3233/CBM-130363 IOS Press

Review

Systems biology of cancer biomarker detection

Sanga Mitra(a), Smarajit Das(b,∗) and Jayprokas Chakrabarti(a,c)

(a) Indian Association for the Cultivation of Science, Calcutta, India
(b) Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
(c) Gyanxet, BF 286 Salt Lake, Calcutta, India

Abstract. Cancer systems biology is an ever-growing area of research owing to the explosion of data; the problem is how to mine these data and extract useful information. To gain insight into carcinogenesis, one needs to systematically mine several resources, such as databases, microarray data and next-generation sequencing data. This review covers the management and analysis of cancer data, database construction and data deposition, whole-transcriptome and whole-genome comparison, the analysis of results from high-throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq have turned the acquisition of epigenetic information into a high-throughput endeavour, in which systems biology and bioinformatics are making significant inroads. The data from the ENCODE and GENCODE projects, available through the UCSC genome browser, can be considered a benchmark for comparison and meta-analysis. A pipeline for integrating next-generation sequencing data and microarray data with existing databases is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To show how the available resources can be used, we have chosen oral cancer to illustrate what kinds of analysis can be done. This review is a computational genomics primer that provides a bird's eye view of the computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.

Keywords: Cancer, database, microarray, next generation sequencing, systems biology

∗ Corresponding author: Smarajit Das, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, University of Gothenburg, Gothenburg, SE-405 30, Sweden. E-mail: smarajit. [email protected].

© 2013 – IOS Press and the authors. All rights reserved. ISSN 1574-0153/13/$27.50

1. Introduction

Cancer systems biology (CSB) aims to discover the molecular mechanisms fundamental to cancer progression by viewing cancer as a complex biological system that is driven, in part, by impaired differentiation. The overarching goal of this review is to pave the way for better use and understanding of the available cancer resources, which will help us to recognize molecular therapeutic targets and strategies to eradicate this disease, or at least to sustain it in a nonlethal state. We discuss the use of databases, microarray data and next-generation sequencing data in cancer systems biology research, with an emphasis on network biology and pathway analysis, and give a snapshot of the available tools and software. Broadly, 19 cancer types have been studied worldwide, and separate bioinformatics studies have been carried out for each of them [1–5]. In this review we focus on bioinformatics studies of oral cancer only. Oral cancer, one of the most prevalent cancers in Asian countries, was chosen to illustrate why existing databases should be consulted and what can be gained from them, how to handle microarray and NGS data statistically, and, finally, why a systems biology approach is needed to work with the huge datasets produced by recent high-throughput methods.

Oral cancer is a subtype of head and neck cancer. Its causes are multifactorial, although tobacco use is suspected to be the primary cause in about 90% of cases [6–10]. High alcohol intake and infection by EBV, HIV and HPV are also suggested as important predisposing factors [11,12]. Three precancerous lesions are most commonly seen in the mouth: leukoplakia, erythroplakia and oral submucous fibrosis [13,14]. The stage after the precancerous lesion is the cancerous lesion [15], which normally develops from one of these precancerous lesions. Histological changes, i.e. changes within epithelial cells, are observed on the verge of frank cancer (carcinoma in situ). The lesion can be either exophytic or endophytic. The most common form of oral cancer is oral squamous cell carcinoma (OSCC) [6]. Based on the site of occurrence, OSCC is divided into several types, namely carcinoma of the lip, tongue, buccal mucosa, gingiva, palate, floor of the mouth, alveolus/jaw bones and maxillary antrum (sinus). When oral cancer becomes metastatic, cancer cells spread through lymphatic and blood vessels to distant organs and tissues, affecting the salivary glands and organs such as the lungs and brain [9]. A wider understanding of oral cancer emerges from expression profiles, proteomic information and, more recently, from ChIP-seq (chromatin immunoprecipitation followed by high-throughput DNA sequencing), whole-genome sequencing and RNA-seq data. Because GENCODE aims to identify all gene features in the human genome using multiple approaches, it should become easier to detect further biomarkers for oral cancer in the near future.
The annotation of long non-coding RNAs (lncRNAs) by the GENCODE [Encyclopædia of genes and gene variants] consortium (http://www.gencodegenes.org/), within the framework of the ENCODE [Encyclopedia of DNA Elements] project (http://genome.ucsc.edu/ENCODE/), has opened a new avenue for studying their role in oral cancer. Databases have enriched this field by making large-scale data available for investigation as and when required. The advent of high-throughput (HTP) technologies that probe global DNA, RNA and protein expression has launched a "systems biology" approach to the study of cancer, which combines experimental and computational methods to propose and validate biological hypotheses.

1.1. Data availability and analysis

As biology has increasingly become a data-rich science, the need to store and communicate large datasets has grown. Obvious examples are nucleotide and protein sequences and structural data, because of their size and ubiquitous use in biology. Other types of data available in databases include metabolic pathways, gene expression data (microarrays), next-generation sequencing (NGS) data, and other data relating to genetic and epigenetic alterations, biological functions and processes.

2. Database

Over the years, a large number of cancer-specific databases (e.g. Human Lung Cancer Database, Prostate Gene Database, Oral Cancer Gene Database and Breast Cancer Gene Database) have been published. To date, 19 cancer types have mainly been covered by Oncomine [16] and TCGA [The Cancer Genome Atlas] (http://cancergenome.nih.gov/), the well-known cancer genome project. We have summarized the databases for these 19 cancer types in Table 1, which gives a snapshot of the available data for each cancer type and where they can be retrieved. We have also listed databases that are not cancer-type specific, such as COSMIC, CanProVar and several others. Table 1 also shows that, to date, no database has been constructed for lymphoma or myeloma, which leaves room for new database development.

We now turn to analysis and to deriving biological insight from a database. To illustrate the kinds of analysis that can be done, we concentrate on oral cancer databases. Information on relevant gene and miRNA alterations in oral cancer is available in three databases:
i. HNOCDB [17]
ii. Oral Cancer Gene Database [18]
iii. HeNeCan miRs [8]

HNOCDB serves as a 'one-stop shop' database, in which information about a particular gene/miRNA has been extracted from other databases and enriched with results from several additional analyses not available in other repositories. The Oral Cancer Gene Database provides information about genes related to oral cancer, along with references. For almost all genes it also provides aliases, function, chromosomal location, mutations and SNPs, mRNA expression, protein information, pathways involved and interacting proteins, expression in different tissues, and clinical correlates. Details of the miRNAs involved in oral cancer can be retrieved from HeNeCan miRs.

From these databases we can deduce chromosomal hotspots. Some loci are known to be altered in oral cancer, and databases give an overall view: mapping the genes onto individual chromosomes yields their chromosomal coordinates, and chromosomal hotspots for miRNAs can be determined in the same way. From the alterations and the known functions of the genes, we can mine the biochemical pathways that are altered. To gain deeper knowledge of the genetic integrity of oral cancer [19], we can study gene sets instead of individual genes. A gene set is a predefined set of genes that share a location or function, or that form a particular pathway. Genomic alterations may add prognostic information and indicate biological aggressiveness, which emphasizes the need for genome-wide profiling of oral cancers. The literature reports that oral cancer tumors show deletions involving 3p and 9p21 and gains involving 3q, 5q, 7p, 8q, 11q and 20q. The detection of 3p and 9p loss in post-treatment lesions could serve as a simple and direct test for assessing the risk of developing a second oral malignancy (SOM) [17]. Several enrichment analysis tools are available for studying gene sets; the most commonly used web tool is GSEA (Gene Set Enrichment Analysis) from the Broad Institute [20–23]. Oral cancer genes can be given as input to GSEA and its companion gene-set collection, the Molecular Signatures Database (MSigDB), which classify the genes according to function. Enrichment scores can then be calculated to reveal the pathways most probably associated with the cancer phenotype. We performed the pathway enrichment analysis for oral cancer with KEGG Mapper; the results are presented in Supplementary Table 1.
In this analysis we took oral cancer genes from our own database, HNOCDB, as input and used KEGG Mapper to classify them into pathways, listing only the pathways most enriched for oral cancer. About 60 of these genes are also involved in several other forms of cancer. When gene sets are defined by biological pathways, the terms gene set enrichment analysis and pathway enrichment analysis are interchangeable. Thus, bioinformatics methods that use the biological knowledge accumulated in databases make it possible to systematically dissect large gene lists and reveal the relevant biology behind oral cancer.
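Pathway (or gene set) over-representation of the kind obtained with KEGG Mapper is commonly assessed with a one-sided hypergeometric test. The Python sketch below is an illustration under stated assumptions, not the procedure implemented by KEGG Mapper or GSEA (GSEA uses a rank-based running-sum enrichment score instead): the gene and pathway names are hypothetical placeholders, and the background universe must be chosen by the analyst. Because a chromosome arm is just another gene set, the same function can be used to ask whether altered genes cluster on an arm such as 3p or 9p, i.e. to flag candidate chromosomal hotspots.

```python
from math import comb


def hypergeometric_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background universe
    K: background genes annotated to the gene set (pathway or chromosome arm)
    n: number of genes in the query list (e.g. oral cancer genes)
    k: number of query genes that fall in the gene set
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def overrepresentation(query, gene_sets, universe):
    """Test every gene set for over-representation of the query genes.

    Returns (set_name, overlap, set_size, p_value) tuples sorted by p-value.
    Gene sets with no overlap are skipped.
    """
    universe = set(universe)
    query = set(query) & universe  # only genes present in the background count
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        if overlap == 0:
            continue
        p = hypergeometric_upper_tail(overlap, len(query), len(members), len(universe))
        results.append((name, overlap, len(members), p))
    return sorted(results, key=lambda row: row[3])


if __name__ == "__main__":
    # Hypothetical toy data: 100 background genes and two made-up gene sets.
    universe = [f"G{i}" for i in range(100)]
    gene_sets = {
        "pathway_A": [f"G{i}" for i in range(10)],      # G0..G9
        "pathway_B": [f"G{i}" for i in range(50, 80)],  # G50..G79
    }
    query = [f"G{i}" for i in range(5)] + [f"G{i}" for i in range(50, 55)]
    for row in overrepresentation(query, gene_sets, universe):
        print(row)
```

In the toy example both gene sets contain five query genes, but the small set (pathway_A) ranks first because five hits out of ten is far more than the one hit expected by chance. In practice the resulting p-values should also be corrected for the number of gene sets tested.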


However, to unravel the biology further, one needs to explore expression profile data (microarrays) and, for deeper insight, data generated by high-throughput sequencing.

3. Microarray

Microarrays give a snapshot of the expression of thousands of genes in a single experiment and can be used as a diagnostic profile for any disease. They can show how RNA levels change during development, after exposure to a stimulus, during the cell cycle, and so on, making it possible to look at the entire transcriptome at once. Various statistical analyses are then used to derive hypotheses or conclusions from microarray data [24]. Microarrays have tremendous applications in cancer biology: in general, they can be used to identify genetic differences between cancer patients and healthy individuals, to classify tumours and to check the effect of treatment [25]. In oral cancer, various experiments have detected genes that can serve as prognostic and predictive tools for cancer detection, and metastasis- and apoptosis-related genes have also been explored. Microarrays have also been used to examine the effects of betel quid chewing and alcohol consumption and to compare differently treated cancer cells, and potential pathways involved in the transition from oral leukoplakia to oral squamous cell carcinoma have been revealed. To gain further insight, meta-analysis of several microarray datasets is required [26]. A single microarray dataset can only point to a few sets of genes and pathways, whereas analyzing several datasets adds confidence to the results. Divergence and similarity within groups (differences among oral cancer subtypes) and between groups (oral cancer subtypes compared with other head and neck subtypes) can be corroborated by analyzing multiple microarray datasets. Moreover, we can look for the same biological pathways being altered irrespective of the data source. Meta-analysis at the pathway level adds a further layer of complexity to data analysis [27]. Oral cancer microarray data can be obtained from the following portals:
– Oncomine [https://www.oncomine.org/resource/login.html]
– GEO (Gene Expression Omnibus) [http://www.ncbi.nlm.nih.gov/geo/]
– TCGA (The Cancer Genome Atlas) [https://tcgadata.nci.nih.gov/tcga/]
– ArrayExpress [http://www.ebi.ac.uk/arrayexpress/]
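One simple way to carry out the gene-level meta-analysis of several microarray datasets described above is to combine, for each gene, the p-values obtained in the individual studies. The sketch below uses Fisher's method; it is an illustration, not the approach of the cited studies, and it assumes that the studies are independent and that each supplied p-value already reflects the same direction of change. Alternatives such as Stouffer's weighted Z-method or rank-product approaches may be preferable when direction or study size matters. The gene names and p-values in the usage lines are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the joint null; for even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        series += term
    return min(1.0, math.exp(-half_x) * series)


def combine_across_datasets(per_dataset):
    """per_dataset: list of {gene: p_value} dicts, one per microarray study.

    Only genes measured in every dataset are combined.
    """
    if not per_dataset:
        raise ValueError("need at least one dataset")
    shared = set.intersection(*(set(d) for d in per_dataset))
    return {g: fisher_combined_pvalue([d[g] for d in per_dataset]) for g in shared}


if __name__ == "__main__":
    # Hypothetical per-gene p-values from two independent studies.
    study_1 = {"GENE_X": 0.04, "GENE_Y": 0.6}
    study_2 = {"GENE_X": 0.03, "GENE_Y": 0.5, "GENE_Z": 0.01}
    print(combine_across_datasets([study_1, study_2]))
```

GENE_Z is dropped because it was measured in only one study; GENE_X, modestly significant in each study, becomes clearly significant when the evidence is pooled. The combined p-values still need a multiple-testing correction across genes.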

Table 1
List of cancer databases for 19 cancer types

Srl. No. | Cancer name | Database name | Reference | URL
Cancer type specific databases
1 | Bladder | Bladder Cancer database | J Formos Med Assoc. 2002 Feb;101(2):104-9. PMID: 12099200 | http://bladder.nhri.org.tw/
2 | Brain and CNS | MNI BITE database | Med Phys. 2012 Jun;39(6):3253-61. PMID: 22755708 | http://packages.bic.mni.mcgill.ca./
3 | Breast | BCDB – A database for breast cancer research and information | Bioinformation. 2010 Jun 15;5(1):1-3. PMID: 21346869 | http://122.165.25.137/bioinfo/breastcancerdb/
 | | BCGD – The Breast Cancer Gene database | Oncogene. 1999 Dec 23;18(56):7958-65. PMID: 10637506 | http://mbcr.bcm.tmc.edu/ermb/bcgd/bcgd.html
4 | Cervical | CCDB – Cervical Cancer Gene database | Nucleic Acids Res. 2011 Jan;39(Database issue):D975-9. PMID: 21045064 | http://crdd.osdd.net/raghava/ccdb/faq.php
5 | Colorectal | dbCPCO – a database of genetic markers tested for their predictive and prognostic value in colorectal cancer | Hum Mutat. 2010 Aug;31(8):901-7. PMID: 20506273 | http://www.med.mun.ca/cpco
6 | Esophageal | DDEC – Dragon database of genes implicated in esophageal cancer | BMC Cancer. 2009 Jul 6;9:219. PMID: 19580656 | http://apps.sanbi.ac.za/ddec/
7 | Gastric | GCDB – Gastric Cancer Database | Int. J. Bioautomation, 2012, 16(2), 129-134 | http://www.gastric-cancer.site40.net/index.php
8 | Head and neck | HNOCDB – Head, neck and oral cancer database | Oral Oncol. 2012 Feb;48(2):117-9. PMID: 22024348 | http://gyanxet.com/hno.html
 | | Oral Cancer Gene Database | Bioinformation. 2011 May 7;6(4):169-70. PMID: 21572887 | http://www.actrec.gov.in/OCDB/index.htm
 | | OrCGDB – a database of genes involved in oral cancer | Nucleic Acids Res. 2001 Jan 1;29(1):300-2. PMID: 11125119 | http://www.tumor-gene.org/Oral/oral.html
 | | HeNeCan miRs – A database of integrative information on Head and Neck Cancer associated microRNAs | Biochim Biophys Acta. 2011 Aug;1816(1):67-72. PMID: 21549178 | http://tarmir.rgcb.res.in/henecan/
9 | Kidney | RCDB – Renal Cancer Gene Database | BMC Res Notes. 2012 May 18;5:246. PMID: 22608002 | http://www.juit.ac.in/attachments/jsr/rcdb/homenew.html
10 | Leukemia | LGA – Leukemia Gene Atlas | PLoS One. 2012;7(6):e39148. PMID: 22720055 | http://www.leukemia-gene-atlas.org/LGAtlas/
 | | LeGenD – Leukemia Gene Database | – | http://www.bioinformatics.org/legend/legend.htm
11 | Liver | Liverome – Liver cancer-related gene signatures | BMC Genomics. 2011 Nov 30;12. PMID: 22369201 | http://liverome.kobic.re.kr./
 | | OncoDB.HCC – Oncogenic database of hepatocellular carcinoma | Nucleic Acids Res. 2007 Jan;35(Database issue):D727-31. PMID: 17098932 | http://oncodb.hcc.ibms.sinica.edu.tw/index.htm
 | | dbHCCvar – Hepatocellular Carcinoma Genetic Variation Database | Hum Mutat. 2011 Dec;32(12):E2308-16. PMID: 21936021 | http://genetmed.fudan.edu.cn/dbHCCvar/
 | | HCCNet – An integrated network database of hepatocellular carcinoma | Cell Res. 2010 Jun;20(6):732-4. PMID: 20479783 | http://www.megabionet.org/hcc/
12 | Lung | HLungDB – an integrated database of human lung cancer research | Nucleic Acids Res. 2010 Jan;38(Database issue):D665-9. PMID: 19900972 | http://www.megabionet.org/bio/hlung/
 | | LCD – Lung Cancer Database | – | http://lungcancer.ibibiosolutions.com/
 | | LuGenD – Lung Cancer Gene Database | – | http://www.bioinformatics.org/LuGenD/
13 | Lymphoma | – | – | –

Table 1, continued

Srl. No. | Cancer name | Database name | Reference | URL
14 | Melanoma | eMelanoBase – an online locus-specific variant database for familial melanoma | Hum Mutat. 2003 Jan;21(1):2-7. PMID: 12497626 | www.wmi.usyd.edu.au:8080/melanoma.html
15 | Myeloma | – | – | –
16 | Ovarian | DDOC – Dragon Database for Exploration of Ovarian Cancer Genes | – | http://apps.sanbi.ac.za/ddoc/index.php
17 | Pancreatic | Pancreatic Cancer Database | – | http://pancreaticcancerdatabase.org/index.php
 | | Pancreatic Expression database | BMC Genomics. 2007 Nov 28;8:439. PMID: 18045474 | http://www.pancreasexpression.org/
 | | PC-GDB – Pancreatic Cancer Gene Database | – | http://www.bioinformatics.org/pcgdb/
18 | Prostate | DDPC – Dragon Database of Genes associated with Prostate Cancer | Nucleic Acids Res. 2011 Jan;39(Database issue):D980-5. PMID: 20880996 | http://apps.sanbi.ac.za/ddpc/ or http://cbrc.kaust.edu.sa/ddpc/
 | | CaPCLDB – Prostate Cancer Cell Lines Database | – | http://capcelllines.ca/
19 | Sarcoma | S-MED – Sarcoma microRNA expression database | Lab Invest. 2010 May;90(5):753-61. PMID: 20212452 | http://www.oncomir.umn.edu/
 | | The Sarcoma Database | – | http://sarcomadatabase.org/

Non-cancer type specific databases
Srl. No. | Database name | Description | Reference | URL
1 | COSMIC | Catalogue Of Somatic Mutations In Cancer | Curr Protoc Hum Genet. 2008. PMID: 18428421 | http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/
2 | cBIO | cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets | Cancer Discov. 2012 May;2(5):401-4. PMID: 22588877 | http://www.cbioportal.org/public-portal/
3 | CanProVar | Human Cancer Proteome Variation Database | Hum Mutat. 2010 Mar;31(3):219-28. PMID: 20052754 | http://bioinfo.vanderbilt.edu/canprovar/
4 | CancerDR | Cancer Drug Resistance database | Sci Rep. 2013;3:1445. PMID: 23486013 | http://crdd.osdd.net/raghava/cancerdr/
5 | miRCancer | miRNA cancer association database | Bioinformatics. 2013 Mar 1;29(5):638-44. PMID: 23325619 | http://mircancer.ecu.edu/
6 | CancerProView | Cancer related Gene/Protein and Disease Pathway Database | Genomics. 2012 Aug;100(2):81-92. PMID: 22659240 | http://cancerproview.dmb.med.keio.ac.jp/php/cpv.html
7 | RCGDB | The Roche Cancer Genome Database | Hum Mutat. 2010 Apr;31(4):407-13. PMID: 20127971 | http://rcgdb.bioinf.uni-sb.de/MutomeWeb/
8 | CanGeneBase (CGB) | Database on cancer related genes | Bioinformation. 2009 Jul 27;3(10):422-4. PMID: 19759863 | http://bioinfo.au-kbc.org.in/cancerdb/

3.1. Microarray data analysis

Microarray analysis is a complex and rapidly growing field. Issues include normalization within and among arrays, limited replication of experiments and massive multiple testing. Hypothesis testing is a major requirement in microarray data analysis. Whichever test is performed (t-test, Wilcoxon test, permutation test or z-test), a significance level must be chosen, conventionally 5%. Because thousands of genes are tested simultaneously, the probability of rejecting at least one true null hypothesis grows with the number of tests; this is the multiple-testing problem. False positives and false negatives are the major problems that creep in with multiple testing [28–32]. To control this error, the Bonferroni correction or Holm's modification of it may be applied to the p-values, but both methods control the family-wise error rate and are overly conservative, so that many true positives are missed. To reduce this stringency, Benjamini and Hochberg introduced the false discovery rate (FDR), which controls the expected proportion of false positives among the rejected hypotheses. The q-value estimation method proposed by John Storey in 2002 has since become one of the most popular approaches in microarray analysis. Microarray data analysis can be broadly categorized into three types:
Gene-by-gene – Statistical methods are applied to study individual genes [33]. From individual gene studies one may determine the prime player(s) in the disease concerned.
I. Categorizing groups of genes – This reduces the complexity of the data and identifies relevant patient or gene groups. Mutational hotspots can be determined when a cohort of genes is studied. It can be done in two ways:
a. Class prediction (supervised learning) – Supervised learning, also known as discrimination, searches for a gene expression signature that predicts class (phenotype) membership. It has the predetermined aims of feature selection and pattern recognition, and it can be used to study a specific signaling pathway.
The main methods used for this are support vector machines (SVM), artificial neural networks (ANN), k-nearest neighbours (KNN), self-organizing maps (SOM), diagonal linear discriminant analysis (DLDA), random forests (RF) and naive Bayes [34].

b. Class discovery (unsupervised learning) – This is an unbiased search for a previously unknown, biologically relevant taxonomy defined by a gene expression signature, or for a biologically relevant set of co-expressed genes. The gene expression data are first arranged and filtered, a distance measure is then chosen, and finally a clustering algorithm is selected to group the data into related classes. Filtering, which uses several statistical methods, removes genes that are not expressed and those that show no variation across samples. The similarity or dissimilarity between genes is then quantified, and a clustering algorithm is chosen according to the specific requirements of the analysis. Details of clustering algorithms have been discussed elsewhere [35].
II. Deducing patterns by gene regulation – Network and pathway analyses are used to deduce patterns from expression studies. These are the computational approaches used to investigate network behavior as a system [36]. They are classified into two types:
a. Topological/structural network analysis: This identifies the global qualitative properties of the system.
– Directed – One approach uses classical graph theory to identify various motifs in a pathway represented as a directed graph. A motif is a recurring group of interacting entities capable of information processing. If the graph is signed, Boolean network analysis can be used to identify semiquantitative features such as positive and negative feedback loops and minimal cut sets in the pathway [37,38]. Feedback loops strongly affect the behavior of the system. A minimal cut set is the smallest group of entities that, when disrupted, affects the particular network behavior of interest; identifying minimal cut sets aids assessment of the robustness of a system.
Motifs, feedback loops and minimal cut sets of a pathway connecting, e.g., a receptor and a transcription factor (TF) that regulates many genes illustrate the global properties of the system.

Table 2
Pathway databases and tools

Databases | URL
1. NetPath [62] | http://www.netpath.org/
2. KEGG [63] | http://www.genome.jp/kegg/
3. HPRD [64] | http://www.hprd.org/
4. MINT [65] | http://mint.bio.uniroma2.it/mint/Welcome.do
5. Pathway Commons [66] | http://www.pathwaycommons.org/pc/
6. Pathway Interaction Database [67] | http://pid.nci.nih.gov
7. Reactome [68] | http://www.reactome.org/ReactomeGWT/entrypoint.html

Tools and software | URL
1. Cytoscape [69] | http://www.cytoscape.org/
2. GenMAPP [70] | http://www.genmapp.org/
3. CellDesigner [71] | http://www.celldesigner.org/
4. PUMA2 [72] | http://compbio.mcs.anl.gov/puma2
5. Onto-Express [73] | http://vortex.cs.wayne.edu/ontoexpress/

– Indirect/undirected – Probabilistic graphical model approaches such as Bayesian network analysis are used to analyze and learn cellular networks from quantitative experimental data and to infer indirect relationships.
b. Dynamical analysis: Higher-resolution mathematical modeling elucidates the detailed local and certain global quantitative behaviors of the system. Dynamical analysis requires more information on reaction parameters and initial conditions than topological approaches do.
– Deterministic model – This treats the network of pathways as a processing unit. Based on appropriate quantitative measurements of key entities in an a priori known network of pathways, deterministic models can be used to predict the time-dependent cross-talk between pathways under given conditions. They describe average behaviors.
– Stochastic model – This uses a probabilistic representation. Stochastic approaches are important when the absolute number of reactant molecules in each cell is small; in that case the probabilistic nature of chemical reactions may affect system behavior, and deterministic models may not be valid [39].

Pathway analysis can be done with several tools, and detailed descriptions of many biological pathways are documented in databases. Pathway analysis can also be done on the R platform. Several pathway databases and tools used for pathway mining from gene expression profiling data are listed in Table 2. The methods for analyzing microarray data differ from one another: each performs best in particular settings, so the method should be chosen to suit the question and the data at hand. Microarray analysis of oral cancer found a total of 12 regulatory pathways significantly associated with oral cancer invasiveness, including regulation of cell adhesion, immune response, and metabolic processes associated with steroid and lipid biosynthesis. Two functional pathways were most significant: cell adhesion through extracellular matrix remodeling, and MHC class I mediated antigen presentation. The cell adhesion through extracellular matrix remodeling pathway involved 13 differentially expressed genes. Further meta-analysis of several microarray datasets could verify the pathways altered in oral cancer and may also reveal new ones [40]. The large-scale data available from ENCODE will further enable complex network and pathway analysis.
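The multiple-testing corrections discussed in Section 3.1 can each be written in a few lines. The sketch below shows Bonferroni, Holm and Benjamini–Hochberg (BH) adjusted p-values in plain Python; it is illustrative rather than a replacement for established implementations, and the four p-values are hypothetical. Storey's q-value additionally estimates the proportion of true null hypotheses (π0) and is not shown; with π0 set to 1 it reduces to the BH-adjusted p-values.

```python
def bonferroni(pvalues):
    """Family-wise error rate control: multiply every p-value by m."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment (uniformly more powerful than Bonferroni)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank = 0 for the smallest p-value
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        value = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, value)  # step-up: enforce monotonicity
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values for four genes
    for name, method in [("Bonferroni", bonferroni), ("Holm", holm), ("BH", benjamini_hochberg)]:
        adj = method(p)
        print(name, [round(a, 3) for a in adj], "significant at 0.05:", sum(a <= 0.05 for a in adj))
```

With these four hypothetical p-values, Bonferroni and Holm each keep two genes at the 5% level, whereas BH keeps all four, which illustrates why FDR control is usually preferred when thousands of genes are screened.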
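The contrast between the deterministic and stochastic models described in Section 3.1 can be seen on the simplest possible network: a single molecular species produced at a constant rate k and degraded at rate γ per molecule. The sketch below is a generic textbook illustration with hypothetical rate values, not a model from the cited work; it compares the analytic solution of the ODE dn/dt = k − γn with Gillespie's stochastic simulation algorithm (SSA). Both settle around k/γ on average, but only the stochastic trajectory shows molecule-number fluctuations, which are large relative to the mean at small copy numbers (the stationary distribution of this process is Poisson, so its standard deviation is √(k/γ)).

```python
import math
import random


def deterministic_level(k, gamma, n0, t):
    """Analytic solution of the ODE dn/dt = k - gamma * n (mean behaviour)."""
    steady = k / gamma
    return steady + (n0 - steady) * math.exp(-gamma * t)


def gillespie_birth_death(k, gamma, n0, t_end, seed=0):
    """Gillespie SSA for production (rate k) and degradation (rate gamma * n).

    Returns the jump times and the molecule count after each jump.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_birth = k
        a_death = gamma * n
        a_total = a_birth + a_death
        if a_total == 0:
            break  # no reaction can fire
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean of a piecewise-constant trajectory on [0, t_end]."""
    total = 0.0
    for i, (t, n) in enumerate(zip(times, counts)):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - t)
    return total / t_end


if __name__ == "__main__":
    k, gamma = 2.0, 0.5  # hypothetical rates: mean copy number k / gamma = 4
    times, counts = gillespie_birth_death(k, gamma, n0=0, t_end=500.0, seed=1)
    print("ODE steady state:", deterministic_level(k, gamma, 0, 500.0))
    print("SSA time average:", time_average(times, counts, 500.0))
    print("SSA range of copy numbers:", min(counts), "-", max(counts))
```

Running it with a mean copy number of about four shows a time average close to the ODE value while the copy number itself repeatedly drops to zero and rises well above the mean, behaviour that the deterministic model cannot represent.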

4. Next generation sequencing

Next-generation sequencing (NGS) technologies are now challenging microarrays as the tool of choice for genome analysis [41]. The improved affordability of comprehensive sequence-based genomic analysis will allow novel questions to be posed in many areas of biology. Massively parallel sequencing platforms will foreseeably overtake microarrays for many applications; however, there remain niches for microarrays to fill, and we may well witness a symbiotic relationship between microarrays and high-throughput sequencing in the near future. This in turn will allow us to better reveal the true genetic information present in an organism. Increasing the precision of this information will allow other tools (such as RT-qPCR or DNA microarrays) to be used to screen for the biological functions of these sequences [42]. NGS technology has been widely used to generate ChIP-seq and RNA-seq data. Table 3 lists software and databases for the different types of data generated on NGS platforms.

Table 3
Software and databases utilized for next-generation sequence analysis and data storage

Software/Databases | URL | Function
Copy number analysis
A. Software
1. HD-CNV | http://daleylab.org/lab/?page_id=125 | Hotspot detection for copy number variation
2. CNV Ruler | www.ircgp.com/CNVRuler/index.html | Copy number variation based case-control association analysis
3. CONTRA | http://contra-cnv.sourceforge.net/ | Copy number analysis for targeted resequencing
4. Control-FREEC | http://bioinfo-out.curie.fr/projects/freec/ | Copy number analysis from NGS data
5. CNAmet | http://csbi.ltdk.helsinki.fi/CNAmet/ | R package for copy number, methylation and expression data
6. DiNAMIC | http://www.bios.unc.edu/research/genomic_software/DiNAMIC | Identification of copy number aberrations in tumors
7. CNAnova | http://www.compbio.group.cam.ac.uk/software/cnanova | Copy number abnormalities in cancer SNP microarray data
8. CMDS | http://cmds.sourceforge.net/ | Copy number aberrations in cancer from high-resolution data
9. CNVDetector | http://www.csie.ntu.edu.tw/~kmchao/tools/CNVDetector/ | Locating copy number variations using CGH data
10. CANGEM | http://www.cangem.org/ | Mining gene copy number changes in cancer
B. Databases
1. CNVVdb | http://cnvvdb.genomics.sinica.edu.tw/ | A database for copy number variation across vertebrate genomes
2. CaSNP | http://cistrome.dfci.harvard.edu/CaSNP/ | A database integrating copy number alterations of cancer genomes from SNP array data
Methylation analysis
A. Software
1. MeDIP-HMM | www.jstacs.de/index.php/MeDIP-HMM | Genome-wide identification of DNA methylation from high-density tiling array data
2. CpGassoc | http://cran.r-project.org/web/packages/CpGassoc/index.html | An R function for analysis of DNA methylation microarray data
3. BiQ Analyzer | http://biq-analyzer.bioinf.mpi-inf.mpg.de/ | Visualization and quality control of DNA methylation data from bisulfite sequencing
4. FadE | http://code.google.com/p/fade/ | Whole-genome methylation analysis for multiple sequencing platforms
5. CpG_MPs | http://202.97.205.78/CpG_MPs/ | Identification of CpG methylation patterns of genomic regions from high-throughput bisulfite sequencing data
6. QUMA | http://quma.cdb.riken.jp/ | Quantification tool for methylation analysis
7. MeMO | http://www.bioinfo.tsinghua.edu.cn/~tigerchen/memo.html | Prediction of protein methylation modifications
B. Databases
1. MethDB | http://www.methdb.de/ | Public database for DNA methylation data
2. PubMeth | http://www.pubmeth.org/ | A cancer methylation database based on text mining and expert annotation
3. MethyCancer | http://methycancer.psych.ac.cn/ | The database of human DNA methylation and cancer
4. NGSmethDB | http://bioinfo2.ugr.es/meth/NGSmethDB.php | A database of next-generation sequencing single-cytosine-resolution DNA methylation data
5. MethylomeDB | http://epigenomics.nyspi.org/methylomedb/ | A database of DNA methylation profiles
6. DiseaseMeth | http://bioinfo.hrbmu.edu.cn/diseasemeth/ | A human disease methylation database
SNP analysis
A. Software
1. rehh | http://cran.r-project.org/web/packages/rehh/index.html | R package for analysing genome-wide SNP data based on haplotype structure
2. M(3) | http://bioinformatics.med.yale.edu/group | SNP calling algorithm for Illumina array data
3. Adegenet 1.3-1 | http://adegenet.r-forge.r-project.org/ | Tool for analysis of genome-wide SNP data
4. Is-rSNP | http://bioinformatics.research.nicta.com.au/software/is-rsnp/ | A novel technique for in silico regulatory SNP detection
5. SNPchip | www.bioconductor.org | R classes and methods for SNP array data
6. ALOHOMORA | http://gmc.mdc-berlin.de/alohomora/ | A tool for linkage analysis using SNP array data
7. SNPsyn | http://snpsyn.biolab.si/ | Detection and exploration of SNP-SNP interactions
8. GSA-SNP | http://gsa.muldas.org/ | A general approach for gene set analysis of polymorphisms
9. SNP server | http://snpinfo.niehs.nih.gov | A real-time SNP discovery tool
B. Databases
1. SNP@Domain | http://sung.bio.cc/index.php/SNP@Domain | SNPs mapped to protein domain structure and sequence
2. SNPP | http://bioinformatics.oxfordjournals.org/content/21/2/266.full | Automated large-scale SNP genotype data management
3. SNPxGE2 | http://lambchop.ads.uga.edu/snpxge2/ | A database for human SNP-coexpression associations

4.1. ChIP-seq

Detection of mutations in tumor cells is important for understanding the molecular pathogenesis of cancer. Alterations in DNA methylation in oral cancer are functionally important and clinically relevant [43,44]. To examine the impact of such epigenetic alterations on oral cancer further, chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) is becoming a necessary step. A strong association has been observed between aberrant DNA methylation in adult tumors and polycomb group profiles in embryonic stem cells, cancer-associated genetic mutations in epigenetic regulators such as DNMT3A and TET family genes, and the discovery of altered 5-hydroxymethylcytosine, a product of TET proteins acting on 5-methylcytosine, in human tumors with TET mutations [45,46]. The abundance and distribution of covalent histone modifications in primary cancer tissues relative to normal cells is an important but largely uncharted area, although there is good evidence for a mechanistic role of cancer-specific alterations in histone modifications in tumor etiology, drug response and tumor progression.

4.2. ChIP-seq analysis

The ENCODE consortium provides ChIP-seq data for different cell lines through the UCSC genome browser. Meanwhile, the discovery of new epigenetic marks continues, and many useful methods for epigenome analysis are applicable to primary tumor samples in addition to cancer cell lines.
For DNA methylation and hydroxymethylation, next-generation sequencing allows increasingly inexpensive and quantitative whole-genome profiling. Similarly, the refinement and maturation of chromatin immunoprecipitation with next-generation sequencing (ChIP-seq) has made genome-wide mapping of histone modifications, open chromatin and transcription factor binding sites possible. ChIP-seq analyses should follow the guidelines established by ENCODE. Computational tools have been developed alongside these epigenome methods to enable accurate interpretation of the profiling data [47–50].

4.3. RNA-seq

The application of RNA sequencing technology has also transformed cancer genomics, generating large data sets that can be analyzed in different ways to answer a multitude of questions about the genomic alterations associated with cancer [51]. A recent study used RNA sequencing to look for strand-specific changes in oral cancer patients and reported large chromosomal regions of gain and loss during oral tumor development [52]. Analytical approaches can discover focused mutations, such as substitutions and small insertions/deletions, as well as large structural alterations and copy number events. As the capacity to produce such data for multiple cancers improves, so does the demand to analyze multiple tumor genomes simultaneously. As the repertoire of data grows to include mRNA-seq, non-coding RNA-seq and methylation for multiple genomes, the challenge will be to combine data types and genomes astutely to produce a coherent picture of the genetic basis of cancer [53,54]. As sequencing technology advances, ENCODE is setting standards for working practices. The Cancer Genome Atlas already provides 326 downloadable tumor samples for head and neck squamous cell carcinoma, along with 384 normal samples. Although the target is 500 samples, copy number, methylation, gene expression and miRNA expression data have already been recorded for a substantial number of the tumor and control samples.

4.4. RNA-seq analysis

RNA-seq was introduced as a new method for gene expression analysis that exploits the high throughput of next-generation sequencing (NGS) machines [55].
The goal of this RNA-seq workflow is to generate a count table for the selected genic features of interest, i.e. exons, transcripts, gene models, etc. First, the data are read into R using the Bioconductor [56] packages ShortRead [57], Rsamtools and GenomicRanges. Next, annotation is retrieved using the biomaRt and genomeIntervals packages [58]. Finally, the IRanges and GenomicFeatures packages are used to define read coverage and assign counts to the genic features of interest. The process can be simplified with the easyRNASeq package [59], and more advanced pre-processing can be performed, such as demultiplexing, RPKM “correction”, or normalization using the DESeq or edgeR packages. The count information can then be exported as BED and WIG files for visualization in the UCSC Genome Browser or in a standalone genome browser such as IGB.

Fig. 1. Pipeline for computational data analysis. This schematic diagram shows how NGS data, microarray data and databases can be integrated to determine novel biomarkers in oral cancer. (Colours are visible in the online version of the article; http://dx.doi.org/10.3233/CBM-130363)

In more detail, the genomic and genic annotation is first retrieved from the selected source and converted into an appropriate object. In parallel, the information for the sequenced reads (e.g. chromosome, position, strand) is retrieved from the alignment file and converted into a similar object. Then, the reads contained in the reads object are summarized per genic annotation

contained in the annotation object. This gives rise to a count table that can finally be normalized using additional R packages. The large volume of data available in ENCODE will lead to the development of several computational pipelines for analyzing these data for differential gene expression. Cutting across a broad swath of normal and disease cell lines, the project has created an unprecedented resource for cancer researchers. Cancer is ultimately a disease of the genome, and research into gene pathways, genetic changes, deletions or duplications now has additional layers of information that ENCODE can bring to bear, literally by dialing them up in genome browsers [60]. One fruitful area of study will be gene deletions in cancer: ENCODE shows that almost every one of these non-gene regions has a deletion in it, with maps that connect it to genes, which greatly increases the value of existing data such as cancer genomes. This kind of data can be brought to bear not only on gene regulation but also on alternative transcripts and other features of gene structure that differ between immortal and normal cells [61]. Among many other promising examples, ENCODE data show that about 25 different transcription factors are disrupted in cancer and other malignancies. A number of these transcription factors were already known to have roles in cancer, but others were not. They are also connected in ways that suggest that certain regulatory genetic backgrounds may make individuals more susceptible not only to individual kinds of cancer but perhaps to malignancies generically [60].

4.5. Microarray vs NGS

Microarray gene expression studies are useful for generating preliminary data. Of late, however, the practical feedback from grant review panels has been: if funding is going to be spent generating expression data, why not obtain an in situ picture of the transcriptome instead of merely comparing measurements from sets of probes that, by default, we must agree to call a ‘transcript’? The general opinion is that high-quality results can be obtained with either type of platform. With microarrays, the internal reproducibility of the differential expression inference can be used to choose among the many types of transformation and normalization (i.e., Efficiency Analysis indicates which methods to choose). With NGS, one must also correct for sequence length when using coverage as a proxy measure for expression. Although this may appear simple on the surface, comparisons among different transcripts are especially challenging unless internal standards or a reference study design are used, because ‘coverage’ is actually a complex function of the effects of local sequence characteristics on amplification and sequencing efficiency.
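The count-table step described in Section 4.4, together with the length correction for coverage mentioned above, can be sketched outside R. The Python example below is not the Bioconductor workflow named in the text. It assumes that features and aligned reads are already available as simple (chromosome, start, end, strand) tuples with half-open coordinates, counts a read once for every feature it overlaps on the same chromosome and strand (i.e. a stranded library is assumed), and then applies RPKM: reads per kilobase of feature length per million mapped reads. The function names are hypothetical, and for differential expression the DESeq or edgeR normalization mentioned in the text remains the more appropriate choice.

```python
from collections import defaultdict


def count_reads(features, reads):
    """Count aligned reads overlapping each genic feature.

    features: dict mapping feature name -> (chrom, start, end, strand),
              using half-open coordinates [start, end)
    reads:    iterable of (chrom, start, end, strand) tuples from an alignment
    """
    counts = {name: 0 for name in features}
    # Group features by (chromosome, strand) so each read is only compared
    # with features on the same chromosome and strand.
    index = defaultdict(list)
    for name, (chrom, start, end, strand) in features.items():
        index[(chrom, strand)].append((start, end, name))
    for chrom, r_start, r_end, strand in reads:
        for f_start, f_end, name in index.get((chrom, strand), []):
            if r_start < f_end and f_start < r_end:  # half-open overlap test
                counts[name] += 1
    return counts


def rpkm(counts, features, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    per_million = total_mapped_reads / 1e6
    values = {}
    for name, n in counts.items():
        _, start, end, _ = features[name]
        values[name] = n / (((end - start) / 1e3) * per_million)
    return values
```

With 1,000,000 mapped reads, four reads on a 1 kb gene and two reads on a 2 kb gene give RPKM values of 4.0 and 1.0 respectively: the length correction halves the value for the longer gene relative to its raw count.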
Just as genome sequences show variation in coverage due to local sequence effects, so will the transcriptome. Overcoverage is also a risk with NGS: by maxing out the coverage for some parts of the transcriptome, one can mistakenly reduce the apparent differences. Methodological research is under way on how much of the available coverage should be used to maximize the reproducibility of differential expression inference. For methodologists, optimizing the data representation of NGS-based transcriptomics offers many areas for formal development, with many open problems worth investigating.
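The question of how much coverage to use can be explored empirically by subsampling. The sketch below thins the per-gene counts of two replicates to a fraction of the available depth (binomial thinning) and reports how well the replicates agree at each depth. It is an illustrative assumption, not the methodology referred to above: Spearman rank correlation of expression values only stands in for the reproducibility of differential expression calls, which would require a full test such as DESeq or edgeR, and the function names and random seed are hypothetical.

```python
import random


def thin_counts(counts, fraction, rng):
    """Binomial thinning: keep each read independently with probability `fraction`."""
    # Looping over individual reads is slow for deep libraries; fine for a sketch.
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (0.0 if a side is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def reproducibility_curve(rep1, rep2, fractions, seed=0):
    """Spearman agreement between two replicates thinned to each coverage fraction."""
    rng = random.Random(seed)
    return {f: spearman(thin_counts(rep1, f, rng), thin_counts(rep2, f, rng))
            for f in fractions}
```

Calling `reproducibility_curve(rep1, rep2, [0.1, 0.25, 0.5, 1.0])` on two replicate count vectors gives one correlation per depth; a curve that flattens well before 1.0 suggests that additional coverage adds little to reproducibility.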


In recent times, technologies such as NGS and microarrays have been used increasingly in the study of cancer biology. Combining the support from microarray and NGS data and determining the overlap between the two will increase confidence in the results. In this review, we suggest using NGS, microarrays and existing databases in combination to determine novel biomarkers in any cancer, including oral cancer; a schematic outline is given in Fig. 1.
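How much the microarray and NGS results overlap can be quantified by asking whether the two candidate lists share more genes than chance would predict. The sketch below applies a one-sided hypergeometric test, a common choice for this purpose though not one prescribed in the text. The function name and the gene identifiers in the usage example are hypothetical, and the test assumes a shared "universe" of genes that were measurable on both platforms.

```python
from math import comb


def overlap_pvalue(genes_a, genes_b, universe):
    """Overlap of two gene lists and its one-sided hypergeometric p-value.

    The p-value is P(overlap >= observed) when len(b) genes are drawn at random,
    without replacement, from a universe containing len(a) "successes".
    """
    universe = set(universe)
    # Only genes measured on both platforms (the universe) can be compared.
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    shared = a & b
    n, n_a, n_b, k = len(universe), len(a), len(b), len(shared)
    tail = sum(comb(n_a, i) * comb(n - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return shared, tail / comb(n, n_b)
```

For three candidates on each platform drawn from a universe of ten genes, a complete overlap has p = 1/120 (about 0.008), whereas no overlap at all gives p = 1.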

5. Conclusion

Functional genomics (transcriptomics) analysis is needed to understand genome function on a global scale, because gene regulation shapes cellular function. To better understand gene regulation genome-wide, researchers commonly assess the openness and state of chromatin (histone modification ChIP-seq), identify transcription factor binding sites (TF ChIP-seq), find co-regulators bound to transcription factors (ChIP-seq and other techniques), search for active enhancers (ChIP-seq, GRO-seq, RNA-seq) and, finally, determine transcript-specific gene expression levels (GRO-seq, RNA-seq). Visualizing cancer through database and microarray analysis aims to identify opportunities for the development of targeted therapies. We have extracted explicit and implicit biomedical knowledge from publicly available gene and text databases to create gene-to-miRNA and gene-to-gene co-citation networks for oral cancer genes by automated analysis of titles and abstracts in over 10 million MEDLINE records. The associations between genes have been annotated by linking genes to terms from the Gene Ontology (GO) database. We validated the extracted networks by large-scale data analysis showing that co-occurrence reflects biologically meaningful relationships, thus providing an approach to extract and structure known biology. We validated the applicability of the method by combining gene set enrichment analysis and pathway analysis of both database and microarray data; with next-generation sequencing data added in the near future, we may obtain a much wider and deeper view of oral cancer. This approach is broadly applicable to many other disease studies.
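The co-citation step can be illustrated with a toy version that counts how often two gene or miRNA symbols are mentioned in the same title or abstract. This is a hypothetical sketch, not the pipeline used for the MEDLINE analysis: it assumes a precompiled symbol list and plain-text abstracts, matches symbols as whole tokens regardless of case (which will produce false matches for symbols that are also ordinary words), and leaves out synonym handling and the GO annotation step. The function name is illustrative.

```python
import re
from collections import Counter
from itertools import combinations


def cocitation_network(abstracts, symbols, min_count=1):
    """Count, for each pair of symbols, the abstracts that mention both.

    Returns {(symbol1, symbol2): n_abstracts} for pairs seen at least min_count times.
    """
    # Case-insensitive lookup from an upper-cased token to the symbol as supplied.
    lookup = {s.upper(): s for s in symbols}
    edges = Counter()
    for text in abstracts:
        # Letters, digits and hyphens form a token, so "miR-21" stays whole.
        tokens = {t.upper() for t in re.findall(r"[A-Za-z0-9-]+", text)}
        found = sorted(lookup[t] for t in tokens if t in lookup)
        # Each abstract adds at most one count per pair.
        for pair in combinations(found, 2):
            edges[pair] += 1
    return {pair: n for pair, n in edges.items() if n >= min_count}
```

Raising `min_count` keeps only pairs that recur across abstracts, which reduces noise from incidental co-mentions.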

References

[1] M. Wang, H. Chu, P. Li, L. Yuan, G. Fu, L. Ma, D. Shi, D. Zhong, N. Tong, C. Qin, C. Yin and Z. Zhang, Genetic variants in miRNAs predict bladder cancer risk and recurrence, Cancer Res 72 (2012), 6173-6182.
[2] H. Haeberle, J.T. Dudley, J.T. Liu, A.J. Butte and C.H. Contag, Identification of cell surface targets through meta-analysis of microarray data, Neoplasia 14 (2012), 666-669.
[3] L. Fu-Jun, J. Shao-Hua and S. Xiao-Fang, Differential proteomic analysis of pathway biomarkers in human breast cancer by integrated bioinformatics, Oncol Lett 4 (2012), 1097-1103.
[4] J.C. Higareda-Almaraz, R. Enríquez-Gasca Mdel, M. Hernández-Ortiz, O. Resendis-Antonio and S. Encarnación-Guevara, Proteomic patterns of cervical cancer cell lines, a network perspective, BMC Syst Biol 5 (2011), 96.
[5] P. Liu, X. Wang, C.H. Hu and T.H. Hu, Bioinformatics analysis with graph-based clustering to detect gastric cancer-related pathways, Genet Mol Res 11 (2012), 3497-3504.
[6] C.R. Leemans, B.J.M. Braakhuis and R.H. Brakenhoff, The molecular biology of head and neck cancer, Nat Rev Cancer 11 (2011), 9-22.
[7] R. Byakodi, S. Byakodi, S. Hiremath, J. Byakodi, S. Adaki, K. Marathe and P. Mahind, Oral cancer in India: an epidemiologic and clinical review, J Community Health 37 (2012), 316-319.
[8] J.M. Babu, R. Prathiba, V.S. Jijith, R. Hariharan and M.R. Pillai, A miR-centric view of head and neck cancers, Biochim Biophys Acta 1816 (2011), 67-72.
[9] D.M. Walker, G. Boey and L.A. McDonald, The pathology of oral cancer, Pathology 35 (2003), 376-383.
[10] S. Warnakulasuriya, G. Sutherland and C. Scully, Tobacco, oral cancer, and treatment of dependence, Oral Oncol 41 (2005), 244-260.
[11] C. Steele and E.J. Shillitoe, Viruses and oral cancer, Crit Rev Oral Biol Med 2 (1991), 153-175.
[12] J. Reidy, E. McHugh and L.F. Stassen, A review of the relationship between alcohol and oral cancer, Surgeon 9 (2011), 278-283.
[13] N.L. Rhodus, Oral cancer: leukoplakia and squamous cell carcinoma, Dent Clin North Am 49 (2005), 143-165.
[14] A. Villa, C. Villa and S. Abati, Oral cancer and oral erythroplakia: an update and implication for clinicians, Aust Dent J 56 (2011), 253-256.
[15] D.R. Nair, R. Pruthy, U. Pawar and P. Chaturvedi, Oral cancer: Premalignant conditions and screening – an update, J Cancer Res Ther 8(Suppl 1) (2012), S57-S66.
[16] D.R. Rhodes, J. Yu, K. Shanker, N. Deshpande, R. Varambally, D. Ghosh, T. Barrette, A. Pandey and A.M. Chinnaiyan, ONCOMINE: a cancer microarray database and integrated data-mining platform, Neoplasia 6 (2004), 1-6.
[17] S. Mitra, S. Das, S. Das, S. Ghosal and J. Chakrabarti, HNOCDB: a comprehensive database of genes and miRNAs relevant to head and neck and oral cancer, Oral Oncol 48 (2012), 117-119.
[18] N.S. Gadewal and S.M. Zingde, Database and interactome map of genes involved in oral cancer, Online J Bioinformatics 8 (2007), 41-44.
[19] A. Saiz-Rodriguez, Molecular basis of oral cancer, Med Oral 6 (2001), 342-349.
[20] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander and J.P. Mesirov, Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles, Proc Natl Acad Sci USA 25 (2005), 15545-15550.
[21] V. Saxena, D. Orgill and I. Kohane, Absolute enrichment: gene set enrichment analysis for homeostatic systems, Nucleic Acids Res 34 (2006), e151.
[22] A. Subramanian, H. Kuehn, J. Gould, P. Tamayo and J.P. Mesirov, GSEA-P: a desktop application for Gene Set Enrichment Analysis, Bioinformatics 23 (2007), 3251-3253.
[23] C. Backes, A. Keller, J. Kuentzer, B. Kneissl, N. Comtesse, Y.A. Elnakady, R. Müller, E. Meese and H.P. Lenhof, GeneTrail – advanced gene set enrichment analysis, Nucleic Acids Res 35 (2007), W186-W192.
[24] A. Watson, A. Mazumder, M. Stewart and S. Balasubramanian, Technology for microarray analysis of gene expression, Curr Opin Biotechnol 9 (1998), 609-614.
[25] M.L. Broadhead, J.C. Clark, C.R. Dass and P.F. Choong, Microarray: an instrument for cancer surgeons of the future? ANZ J Surg 80 (2010), 531-536.
[26] G.C. Tseng, D. Ghosh and E. Feingold, Comprehensive literature review and statistical considerations for microarray meta-analysis, Nucleic Acids Res 40 (2012), 3785-3799.
[27] J. Quackenbush, Computational analysis of microarray data, Nat Rev Genet 2 (2001), 418-427.
[28] Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, J R Stat Soc B 57 (1995), 289-300.
[29] S. Dudoit, Y.H. Yang, M.J. Callow and T.P. Speed, Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments, Statistica Sinica 12 (2002), 111-139.
[30] J.D. Storey and R. Tibshirani, Estimating false discovery rates under dependence, with applications to DNA microarrays, Preprint (2001), http://www.stat.berkeley.edu/storey/.
[31] V.G. Tusher et al., Significance analysis of microarrays applied to the ionizing radiation response, Proc Natl Acad Sci USA 98 (2001), 5116-5121.
[32] P.H. Westfall and S.S. Young, Resampling-based multiple testing: examples and methods for p-value adjustment, Wiley, 1993.
[33] K. Owzar, W.T. Barry and S.H. Jung, Statistical considerations for analysis of microarray experiments, Clin Transl Sci 4 (2011), 466-477.
[34] Q. Liu, A.H. Sung, Z. Chen, J. Liu, L. Chen, M. Qiao, Z. Wang, X. Huang and Y. Deng, Gene selection and classification for cancer microarray data based on machine learning and similarity measures, BMC Genomics 12(Suppl 5) (2011), S1.
[35] J. Rahnenführer, Clustering algorithms and other exploratory methods for microarray data analysis, Methods Inf Med 44 (2005), 444-448.
[36] T.M. Cheng, S. Gulati, R. Agius and P.A. Bates, Understanding cancer mechanisms through network dynamics, Brief Funct Genomics 11(6) (2012), 543-560.
[37] R. Albert, Boolean Modeling of Genetic Regulatory Networks, Pennsylvania State University, University Park.
[38] A. Lahdesmaki, On learning gene regulatory networks under the boolean network model, Machine Learning 52 (2003), 147-167.
[39] M.K. Kerr, M. Martin and G.A. Churchill, Analysis of variance for gene expression microarray data, J Comput Biol 7 (2006), 819-837.
[40] C.J. Kang, Y.J. Chen, C.T. Liao, H.M. Wang, J.T. Chang, C.Y. Lin, L.Y. Lee, T.H. Wang, T.C. Yen, C.R. Shen, I.H. Chen, C.C. Chiu and A.J. Cheng, Transcriptome profiling and network pathway analysis of genes associated with invasive phenotype in oral cancer, Cancer Lett 284 (2009), 131-140.
[41] J. Shendure, The beginning of the end for microarrays? Nat Methods 5 (2008), 585-587.
[42] J. Koshy, Y.W. Qian, G. Bhagwath, M. Willis, T.W. Kelley and P. Papenhausen, Microarray, gene sequencing, and reverse transcriptase-polymerase chain reaction analyses of a cryptic PML-RARA translocation, Cancer Genet 205 (2012), 537-540.
[43] R. Díez-Pérez, J. Campo-Trapero, J. Cano-Sánchez, M. López-Durán, M.A. Gonzalez-Moles, J. Bascones-Ilundain and A. Bascones-Martinez, Methylation in oral cancer and pre-cancerous lesions, Oncol Rep 25 (2011), 1203-1209.
[44] R. Guerrero-Preston, A. Báez, A. Blanco, M. Berdasco, M. Fraga and M. Esteller, Global DNA methylation: a common early event in oral cancer cases with exposure to environmental carcinogens or viral agents, P R Health Sci J 28 (2009), 24-29.
[45] D.R. Bentley, S. Balasubramanian, H.P. Swerdlow et al., Accurate whole human genome sequencing using reversible terminator chemistry, Nature 456 (2008), 53-59.
[46] G. Salbert and M. Weber, Tracking genomic hydroxymethylation by the base, Nat Methods 9 (2011), 45-46.
[47] D.C. Koboldt, K. Chen, T. Wylie, D.E. Larson, M.D. McLellan, E.R. Mardis, G.M. Weinstock, R.K. Wilson and L. Ding, VarScan: variant detection in massively parallel sequencing of individual and pooled samples, Bioinformatics 25 (2009), 2283-2285.
[48] Y. Shen, Z. Wan, C. Coarfa, R. Drabek, L. Chen, E.A. Ostrowski, Y. Liu, G.M. Weinstock, D.A. Wheeler, R.A. Gibbs and F. Yu, A SNP discovery method to assess variant allele probability from next-generation resequencing data, Genome Res 20 (2010), 273-280.
[49] H. Li, B. Handsaker, A. Wysoker et al., The Sequence Alignment/Map format and SAMtools, Bioinformatics 25 (2009), 2078-2079.
[50] R. Li, Y. Li, K. Kristiansen and J. Wang, SOAP: short oligonucleotide alignment program, Bioinformatics 24 (2008), 713-714.
[51] H. Edgren, A. Murumagi, S. Kangaspeska, D. Nicorici, V. Hongisto, K. Kleivi, I.H. Rye, S. Nyberg, M. Wolf, A.L. Borresen-Dale and O. Kallioniemi, Identification of fusion genes in breast cancer by paired-end RNA-sequencing, Genome Biol 12 (2011), R6.
[52] RNA sequencing method may assist in identifying oral cancer, J Am Dent Assoc 141 (2010), 506.
[53] D.W. Ho, Z.F. Yang, K. Yi, C.T. Lam, M.N. Ng, W.C. Yu, J. Lau, T. Wan, X. Wang, Z. Yan, H. Liu, Y. Zhang and S.T. Fan, Gene expression profiling of liver cancer stem cells by RNA-sequencing, PLoS One 7 (2012), e37159.
[54] A. Fenner, Prostate cancer: next-generation RNA sequencing identifies gene signature of neuroendocrine differentiation in prostate tumors, Nat Rev Urol 9 (2011), 8.
[55] A. Mortazavi, Mapping and quantifying mammalian transcriptomes by RNA-Seq, Nat Methods 5 (2008), 621-628.
[56] R.C. Gentleman et al., Bioconductor: open software development for computational biology and bioinformatics, Genome Biol 11 (2010), R80.
[57] M. Morgan, S. Anders, M. Lawrence, P. Aboyoun, H. Pages and R. Gentleman, ShortRead: a Bioconductor package for input, quality assessment and exploration of high-throughput sequence data, Bioinformatics 25 (2009), 2607-2608.
[58] S. Durinck, Y. Moreau, A. Kasprzyk, S. Davis, B. De Moor, A. Brazma and W. Huber, BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis, Bioinformatics 21 (2005), 3439-3440.
[59] N. Delhomme, I. Padioleau, E.E. Furlong and L. Steinmetz, easyRNASeq: a Bioconductor package for processing RNA-Seq data, Bioinformatics 28 (2012), 2532-2533.
[60] Decoding ENCODE for cancer, Cancer Discov 2 (2012).
[61] N.S. Blow, Lessons from ENCODE, Biotechniques 53 (2012), 203.
[62] K. Kandasamy, S.S. Mohan et al., NetPath: a public resource of curated signal transduction pathways, Genome Biol 11 (2010), R3.
[63] M. Kanehisa and S. Goto, KEGG: Kyoto Encyclopedia of Genes and Genomes, Nucleic Acids Res 28 (2009), 27-30.
[64] S. Peri, J.D. Navarro et al., Human protein reference database as a discovery resource for proteomics, Nucleic Acids Res 32 (2004), D497-D501.
[65] A. Ceol, A. Chatr Aryamontri, L. Licata, D. Peluso, L. Briganti, L. Perfetto, L. Castagnoli and G. Cesareni, MINT, the molecular interaction database: 2009 update, Nucleic Acids Res 38 (2010), D532-D539.
[66] E.G. Cerami, B.E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, N. Schultz, G.D. Bader and C. Sander, Pathway Commons, a web resource for biological pathway data, Nucleic Acids Res 39 (2011), D685-D690.
[67] C.F. Schaefer, K. Anthony, S. Krupa, J. Buchoff, M. Day, T. Hannay and K.H. Buetow, PID: the Pathway Interaction Database, Nucleic Acids Res 37 (2009), D674-D679.
[68] I. Vastrik, P. D’Eustachio, E. Schmidt, G. Joshi-Tope, G. Gopinath, D. Croft, B. de Bono, M. Gillespie, B. Jassal, S. Lewis, L. Matthews, G. Wu, E. Birney and L. Stein, Reactome: a knowledge base of biologic pathways and processes, Genome Biol 8 (2007), R39.
[69] M.E. Smoot, K. Ono, J. Ruscheinski, P.L. Wang and T. Ideker, Cytoscape 2.8: new features for data integration and network visualization, Bioinformatics 27 (2011), 431-432.
[70] K.D. Dahlquist, N. Salomonis, K. Vranizan, S.C. Lawlor and B.R. Conklin, GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways, Nat Genet 31 (2002), 19-20.
[71] A. Funahashi, N. Tanimura, M. Morohashi and H. Kitano, CellDesigner: a process diagram editor for gene-regulatory and biochemical networks, BIOSILICO 1 (2003), 159-162.
[72] N. Maltsev, E. Glass, D. Sulakhe, A. Rodriguez, M.H. Syed, T. Bompada, Y. Zhang and M. D’Souza, PUMA2 – grid-based high-throughput analysis of genomes and metabolic pathways, Nucleic Acids Res 34 (2006), D369-D372.
[73] S. Draghici, P. Khatri, P. Bhavsar, A. Shah, S. Krawetz and M.A. Tainsky, Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate, Nucleic Acids Res 31 (2003), 3775-3781.
