NOVA: a software to analyze complexome profiling data.

Bioinformatics Advance Access published October 9, 2014


Heiko Giese 1, Jörg Ackermann 1, Heinrich Heide 2,3, Lea Bleier 2, Stefan Dröse 2, Ilka Wittig 2,3, Ulrich Brandt 2,4, Ina Koch 1,∗

Associate Editor: Dr. Janet Kelso

ABSTRACT

Summary: We introduce NOVA, software for the analysis of complexome profiling data. NOVA supports the investigation of the composition of complexes, cluster analysis of the experimental data, visual inspection and comparison of experiments, and many other features.

Availability and implementation: NOVA is licensed under the Artistic License 2.0. It is freely available at http://www.bioinformatik.uni-frankfurt.de. NOVA requires at least Java 7 and runs under Linux, Microsoft Windows, and Mac OS.

Contact: [email protected]

∗ To whom correspondence should be addressed

1 INTRODUCTION

To carry out diverse cellular functions, proteins can associate to form complex molecular machineries. Complexome profiling (CP; Heide et al., 2012) uses blue native gel electrophoresis (BNE) to separate intact proteins and protein complexes up to a molecular weight of 10 MDa (Schägger and Jagow, 1991; Wittig et al., 2006), or even 60 MDa (Strecker et al., 2010) when special large pore gels (LP-BNE) are used. After separation, the gel strip is cut into 60 equally sized slices. Migration profiles are reconstructed by applying mass spectrometry (LC-MS/MS) to identify the proteins contained in each slice. The relative abundance of each protein is calculated by label-free LC/MS-based quantification (Heide et al., 2012). Comparing these profiles by hierarchical clustering reveals groups of co-migrating proteins, which indicate the composition of quaternary structures and functional complexes (Wessels et al., 2009; Foster et al., 2006; Andersen et al., 2003). The method has been successfully applied to analyze mitochondrial complexes in rats (Heide et al., 2012) and humans (Wessels et al., 2013) as well as to explore complex formation in plants and bacteria (Takabayashi et al., 2013).

Experimental data sets typically contain up to thousands of migration profiles, too many to process manually. Bioinformatic tools are needed to handle and visualize these data efficiently. Although several programs, such as Cluster 3.0 (de Hoon et al., 2004), Java Treeview (Saldanha, 2004) or the MultiExperiment Viewer (Saeed et al., 2003), have been applied, to the best of our knowledge no available tool supplies all the functionality required for the visualization and evaluation of CP data.
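The comparison of migration profiles can be illustrated with a minimal Python sketch. It assumes that each profile is a list of relative abundances, one value per gel slice, and uses 1 minus the Pearson correlation as the distance between two profiles. The function name, protein names and abundance values are invented for illustration; they are not taken from NOVA or from any published data set.

```python
import math

def pearson_distance(p, q):
    """Return 1 - Pearson correlation of two equally long profiles."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        # A flat profile has no defined correlation; treat it as uncorrelated.
        return 1.0
    return 1.0 - cov / math.sqrt(var_p * var_q)

# Hypothetical abundances over six gel slices (a real gel strip yields about 60).
profiles = {
    "subunit_a": [0.0, 0.1, 0.9, 1.0, 0.2, 0.0],
    "subunit_b": [0.0, 0.2, 1.8, 2.0, 0.4, 0.0],  # same shape, twice the abundance
    "unrelated": [1.0, 0.8, 0.1, 0.0, 0.0, 0.3],
}

d_same = pearson_distance(profiles["subunit_a"], profiles["subunit_b"])
d_diff = pearson_distance(profiles["subunit_a"], profiles["unrelated"])
print(d_same, d_diff)
```

Because the Pearson correlation ignores absolute abundance, the two subunits with the same peak shape come out at a distance of essentially zero, while the unrelated profile is anti-correlated and lies at a distance greater than 1.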

2 FEATURES

We developed NOVA, a flexible, interactive tool for the analysis and visual inspection of complexome profiling data. Protein abundance profiles obtained by other protein separation techniques, such as density gradient centrifugation or size exclusion chromatography, may also be analyzed with NOVA. Data sets can be imported as XLS, XLSX, CSV or TXT files. NOVA displays the migration profiles as a heat map (Fig. 1A) and provides mouse interaction for visual inspection and data management. Links to databases such as UniProt (Magrane and UniProt Consortium, 2011) give fast access to additional information about the proteins. To identify changes in complex formation, for example those caused by a knockdown of a specific assembly factor, heat maps of multiple CP data sets can be compared. NOVA supports normalization of the data by range, maximum value, profile area, and unit vector. Migration profiles of selected proteins can be compared in the profile viewer module (Fig. 1B).

Clustering of the migration profiles is the basic step in predicting the composition of protein complexes. For this purpose, NOVA implements the Euclidean, Pearson, and Manhattan distance measures and hierarchical clustering with single, complete, average, and Ward's linkage. A tree viewer displays the cluster hierarchy (Fig. 1C). Proteins and/or entire mass ranges can be excluded from the clustering process, which allows specific data subsets to be investigated. Clustered data can be exported as XLS, XLSX, CSV or TXT files, and figures as JPEG, PNG, TIFF or BMP files.
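The normalization and clustering steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NOVA's implementation (NOVA is a Java application): the text does not give the exact formulas, so the four normalizations below use common textbook definitions, and the clustering is a naive average-linkage procedure with Euclidean distance. All function names and the example profiles are hypothetical.

```python
import math

def normalize(profile, method):
    """Rescale one migration profile (assumed definitions of the four options)."""
    if method == "range":          # map [min, max] onto [0, 1]
        lo, hi = min(profile), max(profile)
        scale = hi - lo
        return [(v - lo) / scale if scale else 0.0 for v in profile]
    if method == "maximum":        # divide by the largest value
        scale = max(profile)
    elif method == "area":         # divide by the sum over all slices
        scale = sum(profile)
    elif method == "unit_vector":  # divide by the Euclidean length
        scale = math.sqrt(sum(v * v for v in profile))
    else:
        raise ValueError("unknown method: " + method)
    return [v / scale if scale else 0.0 for v in profile]

def euclidean(p, q):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(profiles, distance):
    """Naive agglomerative clustering; returns the merge tree as nested tuples."""
    # Each cluster is a pair: (subtree, list of member names).
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i][1] for b in clusters[j][1]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical profiles over five slices; "a" and "b" share a peak position.
raw = {"a": [0, 1, 4, 1, 0], "b": [0, 2, 8, 2, 0], "c": [5, 1, 0, 0, 0]}
normalized = {name: normalize(p, "maximum") for name, p in raw.items()}
tree = average_linkage(normalized, euclidean)
print(tree)  # "a" and "b" merge first; "c" joins last
```

After maximum normalization the profiles of "a" and "b" are identical, so they form the first cluster regardless of their different absolute abundances. Swapping `euclidean` for another distance function, or replacing the mean in `average_linkage` with a minimum or maximum, would give the other distance measures and the single or complete linkage mentioned above.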

3 SUMMARY

NOVA is a freely available software tool for the evaluation and visualization of CP data sets. It supports highly flexible and interactive inspection, exploration, and analysis of complexome data. The implemented analysis techniques focus on hierarchical clustering methods, the current standard approach for studying the protein composition of quaternary structures. NOVA provides functions that assist the study of different experimental conditions, knockout experiments and disease-related changes. It is already used by various research groups working with complexome profiling data.

Fig. 1. NOVA's graphical user interface (GUI) displaying a clustered CP data set from rat heart mitochondria (Heide et al., 2012): (A) The heat map represents gel migration profiles. Each row displays the profile of an individual protein. A label at the end of each row identifies the protein. The background color of the label indicates a known membership of the protein in a complex (here 10 subunits of respiratory complex III and 3 subunits of complex I). At the top of the heat map, a mass scale shows the expected mass of the proteins assembled in the gel slices. The migration profiles are clustered (here Pearson correlation, average linkage). On the left, the corresponding cluster subtree is aligned to the heat map. A cluster of proteins in the cluster tree is highlighted, corresponding to the selected rows of the heat map. (B) A diagram displays the migration profiles of the selected proteins. Each peak of the consensus profile corresponds to a protein assembly, functional complex, or supercomplex. These assemblies can be identified by their distinct mass. For example, the selected proteins are members of the respiratory homodimeric complex III2, the complex assembly III2IV, and the series of supercomplexes S0–S3. (C) The complete cluster tree. The tree viewer allows the user to navigate through the heat map and to explore migration profiles of subgroups of proteins.

1 Molecular Bioinformatics Group, Institute of Computer Science, Faculty of Computer Science and Mathematics, Cluster of Excellence Frankfurt "Macromolecular Complexes", Goethe-University, Robert-Mayer-Str. 11-15, 60325 Frankfurt am Main, Germany
2 Molecular Bioenergetics Group, Medical School, Cluster of Excellence Frankfurt "Macromolecular Complexes", Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
3 Functional Proteomics, SFB815 core unit, Medical School, Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
4 Nijmegen Centre for Mitochondrial Disorders, Radboud University Medical Centre, Geert Grooteplein Zuid 10, 6525 GA, Nijmegen, The Netherlands

© The Author (2014). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Downloaded from http://bioinformatics.oxfordjournals.org/ at Trent University on October 9, 2014

ACKNOWLEDGEMENT

We gratefully thank Valentina Strecker, Kim-Kristin Prior, Jens Einloft and Jörg Kuharev for testing and many valuable suggestions.

Funding: This work was partly supported by the Excellence Initiative of the German Federal and State Governments (EXC 115).

Conflict of Interest: None declared.

REFERENCES

Andersen, J. S., Wilkinson, C. J., Mayor, T., Mortensen, P., Nigg, E. A., and Mann, M. (2003). Proteomic characterization of the human centrosome by protein correlation profiling. Nature, 426(6966), 570–574.

de Hoon, M., Imoto, S., Nolan, J., and Miyano, S. (2004). Open source clustering software. Bioinformatics, 20(9), 1453–1454.

Foster, L. J., de Hoog, C. L., Zhang, Y., Zhang, Y., Xie, X., Mootha, V. K., and Mann, M. (2006). A Mammalian Organelle Map by Protein Correlation Profiling. Cell, 125(1), 187–199.

Heide, H., Bleier, L., Steger, M., Ackermann, J., Dröse, S., Schwamb, B., Zörnig, M., Reichert, A. S., Koch, I., Wittig, I., and Brandt, U. (2012). Complexome Profiling Identifies TMEM126B as a Component of the Mitochondrial Complex I Assembly Complex. Cell Metabolism, 16(4), 538–549.

Magrane, M. and UniProt Consortium (2011). UniProt Knowledgebase: a hub of integrated protein data. Database, 2011, 1–13.

Saeed, A. I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A., Trush, V., and Quackenbush, J. (2003). TM4: a free, open-source system for microarray data management and analysis. Biotechniques, 34(2), 374–378.

Saldanha, A. J. (2004). Java Treeview–extensible visualization of microarray data. Bioinformatics, 20(17), 3246–3248.

Schägger, H. and Jagow, G. (1991). Blue native electrophoresis for isolation of membrane protein complexes in enzymatically active form. Analytical Biochemistry, 199, 223–231.

Strecker, V., Wumaier, Z., Wittig, I., and Schägger, H. (2010). Large pore gels to separate mega protein complexes larger than 10 MDa by blue native electrophoresis: Isolation of putative respiratory strings or patches. Proteomics, 10, 3379–3387.

Takabayashi, A., Kadoya, R., Kuwano, M., Kurihara, K., Ito, H., Tanaka, R., and Tanaka, A. (2013). Protein co-migration database (PCoM-DB) for Arabidopsis thylakoids and Synechocystis cells. SpringerPlus, 2(1), 148.

Wessels, H. J. C. T., Vogel, R. O., van den Heuvel, L., Smeitink, J. A., Rodenburg, R. J., Nijtmans, L. G., and Farhoud, M. H. (2009). LC-MS/MS as an alternative for SDS-PAGE in blue native analysis of protein complexes. Proteomics, 9(17), 4221–4228.

Wessels, H. J. C. T., Vogel, R. O., Lightowlers, R. N., Spelbrink, J. N., Rodenburg, R. J., van den Heuvel, L. P., van Gool, A. J., Gloerich, J., Smeitink, J. A. M., and Nijtmans, L. G. (2013). Analysis of 953 Human Proteins from a Mitochondrial HEK293 Fraction by Complexome Profiling. PLoS ONE, 8(7), e68340.

Wittig, I., Braun, H. P., and Schägger, H. (2006). Blue-Native PAGE. Nature Protocols, 1, 418–428.
