www.proteomics-journal.com


Proteomics

Commentary

Proteomics made more accessible

Svein-Ole Mikalsen

Department of Science and Technology, Faculty of Natural and Health Sciences, University of the Faroe Islands, Faroe Islands

Correspondence: Svein-Ole Mikalsen, Department of Science and Technology, Faculty of Natural and Health Sciences, University of the Faroe Islands, Noatun 3, FO-100, Faroe Islands. E-mail: [email protected]

Keywords: Mass spectrometry / Open Source Freeware / Protein identification / Publication Guidelines

Number of words: 1312 Received: 26-Feb-2014; Revised: 26-Feb-2014; Accepted: 11-Mar-2014. This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1002/pmic.201400064. This article is protected by copyright. All rights reserved.


Abstract Mass spectrometry-based proteomics is a bioinformatic-intensive field. In addition, the instruments and the instrument-related and analytical software are expensive. Some free Internet-based proteomics tools have gained wide usage, but there has not been a single bioinformatic framework that guides the user, in an easy and intuitive way, through the whole process from analysis to submission. Together, these factors may have limited the expansion of proteomic analyses, and also the secondary use (re-analysis) of proteomic data. Vaudel et al. (Proteomics 2014, this issue) now describe their Compomics framework, which guides the user through all the main steps, from database generation, via analysis and validation, to the submission process to PRIDE, a proteomic databank. Vaudel et al. base the framework partly on tools that they have developed themselves and partly on other freeware tools that they integrate into the workflow. One of the most interesting aspects of the Compomics framework is the possibility of extending mass spectrometry-based proteomics outside the MS laboratory itself. With the Compomics framework, any laboratory can handle large amounts of proteomic data, thereby facilitating collaboration and in-depth data analyses. The described software also opens the potential for any laboratory to re-analyze data deposited in PRIDE.


Mass spectrometry-based proteomics has many applications, ranging from the narrow-focused interest in a single protein to broad-scale identification of proteins in tissues or body fluids, and from fundamental curiosity-driven research to searches for diagnostic protein markers of diseases or pathological conditions (e.g., reviewed in [1, 2]). Previously, mass spectrometry was done by laboratories and groups that focused almost exclusively on this technique, and in the case of proteomics, the major groups often had dedicated bioinformaticians in addition to wet-lab people. These groups have often developed software, and some groups made such software freely available as web-based tools. Excellent examples are the ProteinProspector suite (prospector.ucsf.edu) [3] and Mascot from Matrix Science (www.matrixscience.com/search_form_select.html) [4]. The mass spectrometers and the accompanying LCs are relatively expensive, even compared with other central high-tech methods such as next-generation DNA sequencing. However, their wide range of uses has gradually made them more common. Furthermore, many groups have an interest in mass spectrometric analyses without having an instrument in their own laboratory. The instrument-specific software is usually rather costly for laboratories with limited economic resources, so these laboratories have had the choice between several less-than-optimal solutions: (i) relying on the collaborating mass spectrometry laboratory for all analyses; (ii) costly travel to the mass spectrometry laboratory; (iii) freeware with user-unfriendly interfaces; (iv) slowly and painfully establishing their own bioinformatic competence because they cannot obtain financing for a bioinformatician; or (v) combinations of these or other solutions. Vaudel et al. [5] are now rewriting some of the rules of the game with their Compomics framework for proteomics (http://compomics.com/bioinformatics-for-proteomics):

Partly by making new tools available as open source freeware.

Partly by integrating other open source freeware into a workflow that covers most of the path from the instrument all the way to the submission of the data to PRIDE [6], an online repository.

Partly by writing one of the most user-friendly tutorials I have seen for MS-centered freeware (and instrument-specific commercial software, for that matter). This does not necessarily extend to the manuals for all the other freeware that Vaudel et al. suggest could be integrated into their workflow.

Partly by making the graphical interface user-friendly and intuitive.

In principle, if we used all the freeware described in the Compomics tutorials, we would only need the instrument-specific software for the data collection itself. However, it may still be more convenient (for most people) to use the instrument software for calibration and to make .mgf files containing the spectral data. From my point of view, the Compomics package (and the software that can be integrated into the workflow) distinguishes itself in three ways:

The users must make their own databases (in FASTA format). This is explained in detail for UniProt, but is also easily done from several other databases. This makes it possible to limit the size of the database by downloading only selected groups of sequences. If you are only interested in searching a specific species, such as human or another well-investigated model organism, this may be an inconvenient step, as most search engines offer such options. After all, a local database must be updated regularly. On the other hand, as all analytical processes are done locally, there is probably a considerable time saving, avoiding the traffic jams on the Internet. When working on unpublished sequences or in a species where no genome is available, a local database is a valuable feature, because it is easy to modify your database manually. It is then possible to combine homologs from other species with peptides and proteins sequenced in-house, facilitating the gradual digging into unknown proteomes.

The Compomics search interface (SearchGUI) combines two search engines, X!Tandem and OMSSA, with the consequence that a higher number of peptides is identified and better protein scores are achieved. The results are inspected in PeptideShaker, which also covers the peptide and protein validation.



Integration with data submission tools. It is now commonly demanded that data are deposited in online databanks, so that they can be inspected by manuscript reviewers and, much more importantly, reanalyzed in the future by yourself or other researchers, for example by "reshaking" in PeptideShaker. Although this type of reanalysis has not been much employed in proteomics, it has shown its value in genomic sequence and expression databanks (e.g., [7]). Furthermore, the repositories are a rich source of data for improving future identifications [6]. Of course, MIAPE guidelines (www.psidev.info/miape) are followed during the submission process.
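The manual database construction mentioned in the first point, where homologs from other species are combined with sequences obtained in-house, can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: it is not part of the Compomics tools, the function names are hypothetical, and the headers and sequences are invented for illustration. It merges several FASTA sources and keeps one entry per distinct sequence.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_databases(*sources: Iterable[str]) -> List[Tuple[str, str]]:
    """Combine FASTA sources, keeping the first header seen per distinct sequence."""
    first_header: Dict[str, str] = {}
    for source in sources:
        for header, seq in parse_fasta(source):
            first_header.setdefault(seq.upper(), header)
    return [(header, seq) for seq, header in first_header.items()]


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Render records as FASTA text with sequence lines wrapped at `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Invented example data: one homolog and two in-house sequences,
# the first of which duplicates the homolog.
homologs = [">sp|HYPO1|EXAMPLE_HOMOLOG", "MKTAYIAK", "QRQISFVK"]
in_house = [
    ">inhouse_001 de novo sequenced",
    "MKTAYIAKQRQISFVK",
    ">inhouse_002 de novo sequenced",
    "GLSDGEWQ",
]

merged = merge_databases(homologs, in_house)
print(format_fasta(merged), end="")
```

In practice the sources would be open file handles (for example a FASTA file downloaded from UniProt and a file of in-house sequences), and the merged text would be written to a new file used as the search database. Deduplicating by exact sequence is a deliberately simple choice; it does not detect near-identical homologs.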

There are few, if any, restrictions in operating system requirements, but a reasonably powerful computer with large memory and a large hard disk is needed. The Compomics framework is by no means the only open source freeware for these purposes (reviewed in [8]), but it appears to be the most complete, the most user-friendly and, at the same time, flexible freeware. Vaudel et al. are now making proteomics more accessible in two ways: first, by giving the user more control over the proteomic analyses, and second, by facilitating the expansion of proteomics into environments where financing, and not creativity, is the limiting factor.

Acknowledgements The mass spectrometric work in the author's laboratory is supported by the Faroese Research Council and Statoil.

The author has declared no conflict of interest.

References

[1] Aebersold, R. and Mann, M., Mass spectrometry-based proteomics. Nature 2003, 422, 198-207.

[2] Behrens, T., Bonberg, N., Casjens, S., Pesch, B., et al., A practical guide to epidemiological practice and standards in the identification and validation of diagnostic markers using a bladder cancer example. Biochim. Biophys. Acta 2014, 1844, 145-155.

[3] Chalkley, R. J., Baker, P. R., Huang, L., Hansen, K. C., et al., Comprehensive analysis of a multidimensional liquid chromatography mass spectrometry dataset acquired on a quadrupole selecting, quadrupole collision cell, time-of-flight mass spectrometer: II. New developments in Protein Prospector allow for reliable and comprehensive automatic analysis of large datasets. Mol. Cell. Proteomics 2005, 4, 1194-1204.

[4] Perkins, D. N., Pappin, D. J., Creasy, D. M. and Cottrell, J. S., Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551-3567.

[5] Vaudel, M., Venne, A. S., Berven, F. S., Zahedi, R. P., et al., Shedding light on black boxes in protein identification. Proteomics 2014, in press.

[6] Martens, L., Hermjakob, H., Jones, P., Adamski, M., et al., PRIDE: the proteomics identifications database. Proteomics 2005, 5, 3537-3545.

[7] Rung, J. and Brazma, A., Reuse of public genome-wide gene expression data. Nat. Rev. Genet. 2013, 14, 89-99.

[8] Perez-Riverol, Y., Wang, R., Hermjakob, H., Muller, M., et al., Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective. Biochim. Biophys. Acta 2014, 1844, 63-76.
