Metabolomic pathway visualization tool outsourcing editing function.

Masahiro Sugimoto, Member, IEEE

Abstract— Recent rapid improvements in measuring instruments enable us to perform various omics studies that simultaneously profile multiple molecules, providing a holistic view of molecular interactions such as signal transduction, protein interactions, and metabolic pathways. Metabolomics is a recently emerged omics field that identifies and quantifies low-molecular-weight metabolites, usually defined as organic molecules smaller than 1500 Da. Compared with the other omics, software tools for handling metabolomic data are not yet mature. Conventional pathway drawing and visualization tools provide tool-specific functions; however, such user interfaces require users to learn how to operate them, which hinders the use of these tools. Here, we developed a more generic pathway visualization tool. This tool imports pathway data produced by common drawing tools, e.g., MS PowerPoint, and visualizes quantified values on the pathways. Statistical results can also be overlaid on each metabolite. The developed tool facilitates the interpretation of metabolomic data in pathway form.

I. INTRODUCTION

High-throughput techniques for molecular profiling of living systems have been developed. These techniques, called omics analyses, include sequencing of DNA (genomics), quantifying gene expression (transcriptomics), detecting protein interactions (proteomics), and profiling metabolic pathways (metabolomics). We have developed analytical protocols for metabolomics and performed various application studies. As both protocols and instruments become more powerful and detect a wider range of molecules, recent data contain more comprehensive information, i.e., metabolomics data have become a kind of “big data”. Bioinformatics is therefore a key technology for efficiently analyzing and interpreting such biological datasets. Here, we introduce recent bioinformatics technologies for metabolomics data, especially for data visualization.

Analytical techniques of metabolomics

As with the other omics, the ultimate goal of metabolomics is the comprehensive quantification of all metabolites in a single measurement. Usually, a separation system and a detection instrument are used in combination. However, because of the wide variety of chemical properties among molecules, no single method can separate all metabolites in a single measurement. Coverage and detector sensitivity are in a trade-off relationship. We usually have to select an optimized method or combine multiple techniques to quantify the target molecules.

*Research supported by grants from Yamagata Pref. and Tsuruoka City. Masahiro Sugimoto is with the Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan (phone: +81-235-29-0528; fax: +81-235-29-0574; e-mail: [email protected]).

978-1-4244-9270-1/15/$31.00 ©2015 IEEE

Nowadays, nuclear magnetic resonance (NMR) and mass spectrometry (MS) are the major methods used in metabolomics. NMR enables non-invasive analysis, with the benefit that samples need not be destroyed; however, because of its low sensitivity, only a limited number of metabolites at relatively high concentrations can be measured. MS shows higher sensitivity; however, MS alone cannot differentiate metabolites of the same mass, e.g., isomers such as the amino acids leucine and isoleucine. Therefore, separation systems such as gas chromatography (GC), liquid chromatography (LC), and capillary electrophoresis (CE) are commonly placed before the MS; these combinations are called hyphenated MS technologies.

Data processing of metabolomics raw data

Data processing consists of converting vendor-specific data to a general format, eliminating noise, detecting peaks, integrating peaks, aligning multiple datasets, and interpreting the data. These steps are common to hyphenated MS data, and various tools for LC-MS and GC-MS are already available [1]. In contrast, few software packages can deal with CE-MS data. Two CE-MS-specific problems account for this. (1) CE-MS consumes a smaller amount of sample per measurement, which is beneficial when valuable samples are analyzed. However, measuring a small amount of sample leads to a lower signal-to-noise (S/N) ratio of the observed peaks. (2) CE-MS employs a separation mechanism different from those of LC-MS and GC-MS, and skewed, complex peak shapes are observed in CE-MS data, in contrast to the Gaussian-like shapes in LC-MS and GC-MS data. In addition, the large and nonlinear drift of migration times in CE-MS hampers data alignment, which makes it difficult to compare multiple samples. Therefore, we developed several software tools optimized for processing CE-MS data, including JDAMP [2] and MasterHands [3, 4].
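As a small illustration of why MS alone cannot separate isomers such as leucine and isoleucine, the following sketch computes the monoisotopic mass from the elemental formula C6H13NO2, which both amino acids share. The function and variable names are hypothetical and chosen only for this example; the atomic masses are standard monoisotopic values.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton (Da), used for the [M+H]+ ion


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for an {element: count} formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Leucine and isoleucine share the same elemental composition and differ
# only in how the side-chain carbons are arranged.
leucine = {"C": 6, "H": 13, "N": 1, "O": 2}
isoleucine = {"C": 6, "H": 13, "N": 1, "O": 2}

mass_leu = monoisotopic_mass(leucine)
mass_ile = monoisotopic_mass(isoleucine)

# Both values are about 131.0946 Da, so the protonated ions appear at the
# same m/z (about 132.102) and must be resolved by the separation step.
print(f"Leu {mass_leu:.4f} Da, Ile {mass_ile:.4f} Da, [M+H]+ {mass_leu + PROTON:.4f}")
```

Because the two masses are identical by construction, only the retention or migration time from GC, LC, or CE can distinguish the two peaks.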
One bottleneck in interpreting metabolomic data is that non-targeted analysis usually yields a number of unknown peaks. Therefore, we also developed methods to predict the identity of peaks that have no matching metabolite data [5, 6]. Once the processed data matrix (samples × metabolites) of quantified values is available, the data are passed to secondary analyses, such as statistical analysis, pathway analysis, and visualization.

Pathway analysis of processed data

Metabolomic data are usually interpreted using pathway maps. Therefore, databases of metabolic pathways that include metabolites and related factors, such as metabolic enzymes, transporters, and regulators, are important. In this field, there are three kinds of data: (1) entities (e.g., metabolites, enzymes, and co-factors), (2) relationships between entities (e.g., metabolic pathways including their regulatory entities), and (3) quantified values of entities (e.g., a set of metabolite concentrations measured experimentally). The Kyoto Encyclopedia of Genes and Genomes (KEGG) [7, 8] is a commonly used large database that includes data of categories (1) and (2). MetaCyc [9, 10] also includes data of categories (1) and (2) for various species. HMDB [4, 11] is a collection of metabolite entities, i.e., category (1), and each entity is linked to the Small Molecule Pathway Database (SMPDB) [12, 13]. All data in these databases have been retrieved from the literature and manually curated. These systems, especially KEGG, provide various interfaces, such as an application programming interface (API) and the Simple Object Access Protocol (SOAP) [14], through which users can access the data, and various programs using these interfaces have been developed, such as an R language interface [15], IPAVS [16], UniPathway [17], Pathway Projector [18], and hiPathDB [19]. Species-specific systems using KEGG and similar databases have also been developed, e.g., GreenPhylDB for plants [20] and AtPID for Arabidopsis thaliana [21]. Most of these tools simply visualize a part of a pathway or overlay the user's quantified data on a pathway.

Pathway analysis is a common analytical method for quantitative omics data. Ingenuity Pathway (www.ingenuity.com) is one of the most popular tools for analyzing microarray gene expression data. Its algorithm ranks each pathway based on the enrichment (or differential expression) of the gene sets belonging to it. This idea was implemented in Gene Set Enrichment Analysis (GSEA) [22, 23]. The analytical results are available at different levels (abstract or concrete) of the ontology of gene functions.
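The idea of ranking pathways by enrichment can be sketched with a simple over-representation test based on the hypergeometric distribution. This is a deliberately simplified stand-in and not the rank-based GSEA algorithm itself; the pathway names, member identifiers, and function names below are hypothetical toy values.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_pathway, n_changed, n_overlap):
    """Probability of seeing at least n_overlap pathway members among
    n_changed entities drawn at random from n_total entities."""
    upper = min(n_pathway, n_changed)
    hits = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_total, n_changed)


def rank_pathways(pathways, changed, background):
    """Return (name, overlap, p) tuples sorted by ascending p-value.
    Assumes that `changed` is a subset of `background`."""
    results = []
    for name, members in pathways.items():
        members = members & background  # ignore entities that were not measured
        overlap = len(members & changed)
        p = hypergeom_enrichment_p(len(background), len(members), len(changed), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda item: item[2])


# Toy data: ten measured entities, three of which changed significantly.
background = {f"m{i}" for i in range(10)}
pathways = {"pathway_A": {"m0", "m1", "m2"}, "pathway_B": {"m3", "m4", "m5", "m6"}}
changed = {"m0", "m1", "m2"}

ranking = rank_pathways(pathways, changed, background)
for name, overlap, p in ranking:
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```

In practice the resulting p-values would be corrected for multiple testing across pathways before interpretation.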
A similar algorithm was developed for metabolomics data as Metabolite Set Enrichment Analysis (MSEA) [24], available as part of the integrated metabolome analysis environment MetaboAnalyst [25-28]. The ranked pathways are linked to SMPDB at the pathway level and to HMDB at the entity level. MetaCore (Thomson Reuters, Philadelphia, PA) [29, 30] performs pathway analyses and can deal with microarray data as well as other quantitative omics data, such as metabolomics and proteomics. Such multi-omics pathway analysis is becoming common [31].

All these tools and datasets are designed to analyze and visualize metabolic pathways. However, metabolic pathways are not yet completely established, and the exploration of new regulatory mechanisms is still active. Therefore, visualization of conventional networks alone is not sufficient in this field. Vanted [32-35] is a well-established and flexible tool for visualizing as well as editing metabolic pathways. To facilitate drawing, users can download pathways already registered in public databases, e.g., KEGG, and modify them, e.g., by adding new metabolites with their reactions. In addition, users can map their own quantified data in various forms, e.g., bar graphs and line plots. Clustering using Kohonen’s self-organizing map (SOM) also helps in inferring new regulatory mechanisms in metabolic pathways. This type of software provides its own editing canvas for drawing networks. However, learning how to draw networks with it is time-consuming and requires effort. Therefore, more versatile software, i.e., software that is easy to use without learning how to draw pathways, should be developed. Such software should import graphical data produced by generic drawing software, which lowers the barrier for any user, and should visualize experimentally measured data on the imported pathways to facilitate the interpretation of metabolomic data.

II. DESCRIPTION FOR PATHWAYMAP SOFTWARE

A. Implementation

We implemented our program, named KeioPathwayMap, using Flash; therefore, Adobe AIR ver. 3.1 or later (Adobe Systems Incorporated, San Jose, CA) is required as a runtime library on client machines. We confirmed that the software works on Windows 7 or later (both 32- and 64-bit OS). For developers, Flex Builder (ver. 3, Adobe Systems) and Flex SDK (ver. 3.4.1, Adobe Systems) are necessary and are used as a plugin for Eclipse (The Eclipse Foundation, http://www.eclipse.org/downloads). The software is distributed upon request ([email protected]).

B. How to use the PathwayMap

After launching the PathwayMap software, users select the pathways on which quantified data will be visualized. Users then prepare quantified metabolite data (usually concentrations) with statistical results (usually P-values from Student’s t-test or the Mann-Whitney test, or Q-values corrected by the false discovery rate). Each row contains a metabolite ID, which corresponds to the ID in the pathway map, while the columns contain group and sample information together with the quantified values. The data are prepared as a plain text (CSV) file. Subsequently, users set the color gradation based on the fold change of the averaged values between the two groups given in the data file. The metabolic pathways are then visualized with colored metabolites. Users can also select several options, e.g., how to deal with peaks undetected in some samples, how to represent the statistical results, and how to draw arrows (up or down) next to each metabolite.
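The coloring step described above can be sketched as follows. This is a minimal illustration and not the PathwayMap implementation: the CSV layout (sample names in the first row, group labels in the second), the blue-white-red scale, the clipping limit, and all function names are assumptions made for the example. Empty cells stand for undetected peaks and are simply skipped here, which is only one of the possible options.

```python
import csv
import io
import math
from statistics import mean


def load_matrix(text):
    """Parse the assumed layout: row 1 = sample names, row 2 = group labels,
    remaining rows = metabolite ID followed by one value per sample."""
    rows = list(csv.reader(io.StringIO(text)))
    groups = rows[1][1:]
    data = {row[0]: [float(v) if v else None for v in row[1:]] for row in rows[2:]}
    return groups, data


def log2_fold_change(values, groups, case, control):
    """log2 of (mean of case group / mean of control group), or None if undefined."""
    case_vals = [v for v, g in zip(values, groups) if g == case and v is not None]
    ctrl_vals = [v for v, g in zip(values, groups) if g == control and v is not None]
    if not case_vals or not ctrl_vals:
        return None
    m_case, m_ctrl = mean(case_vals), mean(ctrl_vals)
    if m_case <= 0 or m_ctrl <= 0:
        return None
    return math.log2(m_case / m_ctrl)


def fold_change_color(log2_fc, limit=2.0):
    """Map log2 fold change to a hex color: blue (decrease), white, red (increase).
    Values beyond +/- limit are clipped; undefined values are drawn in gray."""
    if log2_fc is None:
        return "#808080"
    t = max(-1.0, min(1.0, log2_fc / limit))
    fade = round(255 * (1 - abs(t)))
    return f"#ff{fade:02x}{fade:02x}" if t >= 0 else f"#{fade:02x}{fade:02x}ff"


EXAMPLE = """ID,s1,s2,s3,s4
group,control,control,tumor,tumor
Lactate,10,10,40,40
Citrate,8,8,2,2
Alanine,5,,5,5
"""

groups, data = load_matrix(EXAMPLE)
colors = {
    metabolite: fold_change_color(log2_fold_change(values, groups, "tumor", "control"))
    for metabolite, values in data.items()
}
print(colors)
```

A logarithmic scale is used so that a fourfold increase and a fourfold decrease receive colors of equal intensity.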
The most distinctive feature of PathwayMap is its ability to import data produced by generic drawing software, namely MS PowerPoint, as a pathway template. In PowerPoint, users draw pathways using shapes without any restriction, e.g., rectangles and lines to represent metabolites and metabolic reactions, respectively. The only requirement is to assign an identity to each shape as a property: on the Home menu, select the Editing icon to show the list of shape names and change each name to an identity, e.g., a metabolite name. When the graphical data are loaded, the software generates a file describing the relationship between positions on the pathway map and metabolite identities. When users load quantitative data, the software uses this relationship to display colors with related information, e.g., statistical results.

III. RESULTS AND DISCUSSION

We developed a pathway visualization tool that has no editing function but can import graphical data produced by generic drawing tools. As an example, we depicted primary pathways, including glycolysis, the tricarboxylic acid (TCA) cycle, the pentose phosphate pathway (PPP), the urea cycle, pyrimidine synthesis, purine synthesis, the γ-glutamyl cycle, the choline pathway, and amino acid synthesis. Metabolites related to oxidative stress, such as reduced and oxidized glutathione, and to methylation-related one-carbon pathways, such as glycine and S-adenosyl-L-methionine (SAM), are also included.
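The import step described in Section II.B, which relates shape names to positions on the map, can be illustrated with a short sketch. How PathwayMap itself reads PowerPoint data is not described here, so the approach below is an assumption: it parses the slide XML stored inside a .pptx file, where each shape name is kept in the `name` attribute of `p:cNvPr` and its position and size are given in English Metric Units (914400 per inch). The function name and the sample slide are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by PresentationML slide parts.
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
EMU_PER_INCH = 914400


def shape_positions(slide_xml):
    """Return {shape name: (x, y, width, height)} in inches for one slide."""
    root = ET.fromstring(slide_xml)
    positions = {}
    for shape in root.iter(f"{{{NS['p']}}}sp"):
        name = shape.find("p:nvSpPr/p:cNvPr", NS)
        offset = shape.find("p:spPr/a:xfrm/a:off", NS)
        extent = shape.find("p:spPr/a:xfrm/a:ext", NS)
        if name is None or offset is None or extent is None:
            continue  # skip shapes without an explicit position
        raw = (offset.get("x"), offset.get("y"), extent.get("cx"), extent.get("cy"))
        positions[name.get("name")] = tuple(int(v) / EMU_PER_INCH for v in raw)
    return positions


# Minimal hand-written slide with two named rectangles (illustrative only).
SAMPLE_SLIDE = """<p:sld
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:cSld><p:spTree>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="2" name="Glucose"/></p:nvSpPr>
      <p:spPr><a:xfrm><a:off x="914400" y="457200"/><a:ext cx="1828800" cy="457200"/></a:xfrm></p:spPr>
    </p:sp>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="3" name="Pyruvate"/></p:nvSpPr>
      <p:spPr><a:xfrm><a:off x="914400" y="1828800"/><a:ext cx="1828800" cy="457200"/></a:xfrm></p:spPr>
    </p:sp>
  </p:spTree></p:cSld>
</p:sld>"""

positions = shape_positions(SAMPLE_SLIDE)
print(positions)
```

For a real file, the slide XML could be read with the standard `zipfile` module, e.g. `zipfile.ZipFile(path).read("ppt/slides/slide1.xml")`, and the resulting name-to-position table joined with the metabolite IDs in the quantitative data file.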

The depicted pathway map can visualize cancer-specific metabolic aberrations, especially in a hypoxic and nutrient-poor environment [36]. Under such conditions, glycolysis and the PPP are activated while the TCA cycle is inactivated, because ATP is produced rapidly without oxygen consumption. The activation of other pathways that supply carbon in addition to glucose, e.g., degradation of hydroxyproline and import of glutamic acid, is also visualized. Injection of acetaminophen causes a significant reduction of both reduced (GSH) and oxidized (GSSG) glutathione, indicators of oxidative stress, in the liver; this induces the production of γ-glutamyl peptides, which can be detected in both liver and blood [37, 38]. Importantly, all of these pathways, i.e., most of the primary pathways except lipid synthesis and degradation, can be visualized simultaneously at a reasonable size, which helps in assessing the consistency or inconsistency between the given metabolomic data and previously reported data.

One limitation of our software is that it visualizes only quantified concentrations of metabolites. Isotope-labeled metabolites are now used to calculate the flux of metabolic pathways. Visualizing such flux data is also important for understanding the activation or deactivation of individual reactions. FluxMap [40, 41] is one of the tools that incorporate this type of data. A recent review article [42] also lists such tools.

In contrast to microarray data, few databases include metabolite concentrations. HMDB includes some quantitative information for each metabolite in human biofluids, such as blood, saliva, and urine. However, profile data (sets of metabolite concentrations) are not available. We first developed a profile-based database (category (3)), named the Mouse Multiple Tissue Metabolome Database (MMMDB), which includes data for multiple tissues obtained from single mice using non-targeted analysis and thus contains both annotated and unannotated datasets [39].

IV. CONCLUSION

We developed a tool for visualizing metabolomic pathways that can import graphical data drawn with generic software. The software also visualizes quantitative data with statistical results on the pathway, which facilitates the modification of well-established pathways to add newly discovered regulation.

ACKNOWLEDGMENT

We thank Kanako Niigata at the Institute for Advanced Biosciences, Keio University, for help with the development of the PathwayMap software. This work was supported by research grants from Yamagata Prefecture and Tsuruoka City.

Fig. 1. Snapshot of PathwayMap.


REFERENCES

[1] M. Sugimoto, M. Kawakami, M. Robert, T. Soga, and M. Tomita, "Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis," Curr Bioinform, vol. 7, pp. 96-108, Mar 2012.
[2] M. Sugimoto, A. Hirayama, T. Ishikawa, M. Robert, R. Baran, K. Uehara, et al., "Differential metabolomics software for capillary electrophoresis-mass spectrometry data analysis," Metabolomics, vol. 6, pp. 27-41, 2010.
[3] M. Sugimoto, D. T. Wong, A. Hirayama, T. Soga, and M. Tomita, "Capillary electrophoresis mass spectrometry-based saliva metabolomics identified oral, breast and pancreatic cancer-specific profiles," Metabolomics, vol. 6, pp. 78-95, Mar 2010.
[4] D. S. Wishart, D. Tzur, C. Knox, R. Eisner, A. C. Guo, N. Young, et al., "HMDB: the Human Metabolome Database," Nucleic Acids Res, vol. 35, pp. D521-6, Jan 2007.
[5] M. Sugimoto, A. Hirayama, M. Robert, S. Abe, T. Soga, and M. Tomita, "Prediction of metabolite identity from accurate mass, migration time prediction and isotopic pattern information in CE-TOFMS data," Electrophoresis, vol. 31, pp. 2311-8, Jul 2010.
[6] M. Sugimoto, S. Kikuchi, M. Arita, T. Soga, T. Nishioka, and M. Tomita, "Large-scale prediction of cationic metabolite identity and migration time in capillary electrophoresis mass spectrometry using artificial neural networks," Anal Chem, vol. 77, pp. 78-84, Jan 1 2005.
[7] M. Kanehisa, "Molecular network analysis of diseases and drugs in KEGG," Methods Mol Biol, vol. 939, pp. 263-75, 2013.
[8] M. Tanabe and M. Kanehisa, "Using the KEGG database resource," Curr Protoc Bioinformatics, vol. Chapter 1, p. Unit1 12, Jun 2012.
[9] P. D. Karp, M. Riley, M. Saier, I. T. Paulsen, S. M. Paley, and A. Pellegrini-Toole, "The EcoCyc and MetaCyc databases," Nucleic Acids Res, vol. 28, pp. 56-9, Jan 1 2000.
[10] P. D. Karp, M. Riley, S. M. Paley, and A. Pellegrini-Toole, "The MetaCyc Database," Nucleic Acids Res, vol. 30, pp. 59-61, Jan 1 2002.
[11] D. S. Wishart, C. Knox, A. C. Guo, R. Eisner, N. Young, B. Gautam, et al., "HMDB: a knowledgebase for the human metabolome," Nucleic Acids Res, vol. 37, pp. D603-10, Jan 2009.
[12] A. Frolkis, C. Knox, E. Lim, T. Jewison, V. Law, D. D. Hau, et al., "SMPDB: The Small Molecule Pathway Database," Nucleic Acids Res, vol. 38, pp. D480-7, Jan 2010.
[13] T. Jewison, Y. Su, F. M. Disfany, Y. Liang, C. Knox, A. Maciejewski, et al., "SMPDB 2.0: big improvements to the Small Molecule Pathway Database," Nucleic Acids Res, vol. 42, pp. D478-84, Jan 2014.
[14] S. Okuda, T. Yamada, M. Hamajima, M. Itoh, T. Katayama, P. Bork, et al., "KEGG Atlas mapping for global analysis of metabolic pathways," Nucleic Acids Res, vol. 36, pp. W423-6, Jul 1 2008.
[15] A. V. Antonov, E. E. Schmidt, S. Dietmann, M. Krestyaninova, and H. Hermjakob, "R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from Reactome and KEGG databases," Nucleic Acids Res, vol. 38, pp. W78-83, Jul 2010.
[16] P. K. Sreenivasaiah, S. Rani, J. Cayetano, N. Arul, and D. H. Kim, "IPAVS: Integrated Pathway Resources, Analysis and Visualization System," Nucleic Acids Res, vol. 40, pp. D803-8, Jan 2012.
[17] A. Morgat, E. Coissac, E. Coudert, K. B. Axelsen, G. Keller, A. Bairoch, et al., "UniPathway: a resource for the exploration and annotation of metabolic pathways," Nucleic Acids Res, vol. 40, pp. D761-9, Jan 2012.
[18] N. Kono, K. Arakawa, R. Ogawa, N. Kido, K. Oshita, K. Ikegami, et al., "Pathway projector: web-based zoomable pathway browser using KEGG atlas and Google Maps API," PLoS One, vol. 4, p. e7710, 2009.
[19] N. Yu, J. Seo, K. Rho, Y. Jang, J. Park, W. K. Kim, et al., "hiPathDB: a human-integrated pathway database with facile visualization," Nucleic Acids Res, vol. 40, pp. D797-802, Jan 2012.
[20] M. G. Conte, S. Gaillard, N. Lanau, M. Rouard, and C. Perin, "GreenPhylDB: a database for plant comparative genomics," Nucleic Acids Res, vol. 36, pp. D991-8, Jan 2008.
[21] J. Cui, P. Li, G. Li, F. Xu, C. Zhao, Y. Li, et al., "AtPID: Arabidopsis thaliana protein interactome database--an integrative platform for plant systems biology," Nucleic Acids Res, vol. 36, pp. D999-1008, Jan 2008.
[22] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, et al., "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles," Proc Natl Acad Sci U S A, vol. 102, pp. 15545-50, Oct 25 2005.
[23] Z. Jiang and R. Gentleman, "Extensions to gene set enrichment," Bioinformatics, vol. 23, pp. 306-13, Feb 1 2007.
[24] J. Xia and D. S. Wishart, "MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data," Nucleic Acids Res, vol. 38, pp. W71-7, Jul 2010.
[25] J. Xia, N. Psychogios, N. Young, and D. S. Wishart, "MetaboAnalyst: a web server for metabolomic data analysis and interpretation," Nucleic Acids Res, vol. 37, pp. W652-60, Jul 2009.
[26] J. Xia and D. S. Wishart, "Metabolomic data processing, analysis, and interpretation using MetaboAnalyst," Curr Protoc Bioinformatics, vol. Chapter 14, p. Unit 14 10, Jun 2011.
[27] J. Xia and D. S. Wishart, "Web-based inference of biological patterns, functions and pathways from metabolomic data using MetaboAnalyst," Nat Protoc, vol. 6, pp. 743-60, Jun 2011.
[28] J. Xia, R. Mandal, I. V. Sinelnikov, D. Broadhurst, and D. S. Wishart, "MetaboAnalyst 2.0--a comprehensive server for metabolomic data analysis," Nucleic Acids Res, vol. 40, pp. W127-33, Jul 2012.
[29] A. Bugrim, T. Nikolskaya, and Y. Nikolsky, "Early prediction of drug metabolism and toxicity: systems biology approach and modeling," Drug Discov Today, vol. 9, pp. 127-35, Feb 1 2004.
[30] S. Ekins, E. Kirillov, E. A. Rakhmatulin, and T. Nikolskaya, "A novel method for visualizing nuclear hormone receptor networks relevant to drug metabolism," Drug Metab Dispos, vol. 33, pp. 474-81, Mar 2005.
[31] J. Eichner, L. Rosenbaum, C. Wrzodek, H. U. Haring, A. Zell, and R. Lehmann, "Integrated enrichment analysis and pathway-centered visualization of metabolomics, proteomics, transcriptomics, and genomics data by using the InCroMAP software," J Chromatogr B Analyt Technol Biomed Life Sci, vol. 966, pp. 77-82, Sep 1 2014.
[32] B. H. Junker, C. Klukas, and F. Schreiber, "VANTED: a system for advanced data analysis and visualization in the context of biological networks," BMC Bioinformatics, vol. 7, p. 109, 2006.
[33] C. Klukas and F. Schreiber, "Integration of -omics data and networks for biomedical research with VANTED," J Integr Bioinform, vol. 7, p. 112, 2010.
[34] H. Mehlhorn and F. Schreiber, "DBE2 - management of experimental data for the VANTED system," J Integr Bioinform, vol. 8, p. 162, 2011.
[35] H. Rohn, A. Junker, A. Hartmann, E. Grafahrend-Belau, H. Treutler, M. Klapperstuck, et al., "VANTED v2: a framework for systems biology applications," BMC Syst Biol, vol. 6, p. 139, 2012.
[36] A. Hirayama, K. Kami, M. Sugimoto, M. Sugawara, N. Toki, H. Onozuka, et al., "Quantitative metabolome profiling of colon and stomach cancer microenvironment by capillary electrophoresis time-of-flight mass spectrometry," Cancer Res, vol. 69, pp. 4918-25, Jun 1 2009.
[37] T. Soga, M. Sugimoto, M. Honma, M. Mori, K. Igarashi, K. Kashikura, et al., "Serum metabolomics reveals gamma-glutamyl dipeptides as biomarkers for discrimination among different forms of liver disease," J Hepatol, vol. 55, pp. 896-905, Oct 2011.
[38] T. Soga, R. Baran, M. Suematsu, Y. Ueno, S. Ikeda, T. Sakurakawa, et al., "Differential metabolomics reveals ophthalmic acid as an oxidative stress biomarker indicating hepatic glutathione consumption," J Biol Chem, vol. 281, pp. 16768-76, Jun 16 2006.
[39] M. Sugimoto, S. Ikeda, K. Niigata, M. Tomita, H. Sato, and T. Soga, "MMMDB: Mouse Multiple Tissue Metabolome Database," Nucleic Acids Res, vol. 40, pp. D809-14, Jan 2012.
[40] C. Krach, A. Junker, H. Rohn, F. Schreiber, and B. H. Junker, "Flux visualization using VANTED/FluxMap," Methods Mol Biol, vol. 1191, pp. 225-33, 2014.
[41] T. Dandekar, A. Fieselmann, S. Majeed, and Z. Ahmed, "Software applications toward quantitative metabolic flux analysis and modeling," Brief Bioinform, vol. 15, pp. 91-107, Jan 2014.
[42] Y. Toya, N. Kono, K. Arakawa, and M. Tomita, "Metabolic flux analysis and visualization," J Proteome Res, vol. 10, pp. 3313-23, Aug 5 2011.
