Chapter 3

Plant Proteomics: From Genome Sequencing to Proteome Databases and Repositories

Katsumi Sakata and Setsuko Komatsu

Abstract
Proteomic approaches are useful for the identification of functional proteins. These have been enhanced not only by the development of proteomic techniques but also in concert with genome sequencing. In this chapter, 30 databases and Web sites relating to plant proteomics are reviewed and recent technologies relating to data collection and annotation are surveyed.

Key words Plant genome, Proteome, Database, 2-DE, MS, Shotgun proteomics, Annotation

Abbreviations
EST Expressed sequence tag
MS Mass spectrometry
n-DE n-Dimensional electrophoresis
SDS-PAGE Sodium dodecyl sulfate-polyacrylamide gel electrophoresis

1 Introduction

Comprehensive approaches for biomarker searching and functional analysis include proteomics, transcriptomics, and metabolomics. Proteomic approaches allow us to directly analyze the proteins present in a living organism at the point where biological functions emerge [1]. The identification of proteins has been dramatically enhanced by mass spectrometers and homology searches against large-scale genome sequence data [2]. Furthermore, proteomic techniques that can comprehensively analyze protein interactions have revealed many proteins related to the differentiation and growth of plants, as well as proteins that are differentially expressed depending on environmental conditions [3, 4]. As with genomic data, large volumes of proteomic data have been accumulated in recent years coupled with the development of high-throughput analysis methodologies.

Jesus V. Jorrin-Novo et al. (eds.), Plant Proteomics: Methods and Protocols, Methods in Molecular Biology, vol. 1072, DOI 10.1007/978-1-62703-631-3_3, © Springer Science+Business Media, LLC 2014


Table 1 Large-scale plant genome sequencing projects [5] and representative proteome databases

Group    Order         Organism                                    Proteome database
Dicot    Malpighiales  California poplar, Cassava(P)               -
Dicot    Fabales       Soybean, Medicago, Lotus(P)                 Proteomics of Oilseeds, Soybean Proteome Database, SoyKB, ProMEX
Dicot    Brassicales   Arabidopsis, Papaya(P)                      PPDB, AtProteome, ProMEX, SUBA, PhosPhAt, AT_CHLORO, AtPID, plprot, TAIR
Dicot    Rhamnales     Wine grape                                  -
Dicot    Solanales     Tomato(P), Potato(P)                        plprot
Monocot  Poales        Rice, Sorghum, Corn, Purple false brome(P)  PPDB, Rice Proteome Database, OryzaPG-DB, plprot, DIPOS, PRIN

The phylogenetic tree among the orders is based on the Angiosperm Phylogeny Group classification [6]. The letter “P” attached to an organism name means that large-scale sequencing analysis of the genome is still in progress [5]

These proteomic data also need to be integrated and organized in databases that enable us to retrieve, leverage, and share public data through up-to-date computational technology such as the latest data management systems and Web-interface techniques. This overview of the large-scale genome sequencing studies and the status of proteomic data repositories will provide a guide for datasets currently available and also those that should be prepared in future studies.

We positioned large-scale sequencing projects [5] and major proteomic databases on a phylogenetic tree of flowering plants that was based on the biological classification by the Angiosperm Phylogeny Group [6] (Table 1). In the dicot plants, large-scale genome sequences have been analyzed across various orders of the biological classification. In the monocot plants, large-scale genome sequences are concentrated in the order Poales, which includes the major cereal grain species. Many proteomic databases have been developed for the model dicot plant Arabidopsis thaliana, such as AtProteome [7], ProMEX [8], and SUBA [9]. For monocot plants, rice databases such as the Rice Proteome Database [10] and OryzaPG-DB [11] have been developed. The PPDB [12, 13] covers two major species in dicots and monocots, A. thaliana and maize (Zea mays). The Proteomics of Oilseeds [14] and Soybean Proteome Database [15] store proteomic data primarily based on 2-DE in legumes. The plprot [16] database specializes in plant plastids and covers A. thaliana, tobacco, and rice. AtPID [17] contains large-scale protein–protein interaction data. Such molecular interaction databases have been increasing, now including resources such as DIPOS and PRIN for rice [18, 19]. Furthermore, several databases, including ProMEX [8], the Soybean Proteome Database [15], and SoyKB [20], integrate other omics data such as transcriptome data and metabolic pathways. In this chapter, we also introduce the Web-based proteomic tools and prediction programs GelMap [21], iLoc-Plant [22], MRMaid [23], Musite [24], PeptideAtlas SRM Experiment Library (PASSEL) [25], PredPlantPTS1 [26], ProteoRed MIAPE [27], and a general database relating to plant proteomics, Proteomics Identifications (PRIDE) database [28].

Based on the efforts of the Human Proteome Organization Proteomics Standards Initiative for molecular interactions (HUPO PSI-MI), molecular interaction data have been standardized and a common query interface (PSI Common QUery InterfaCe, PSICQUIC, http://code.google.com/p/psicquic/) was developed [29]. This service allows nearly 30 molecular interaction databases to be queried at once. Interfaces such as these are desirable in the field of plant proteomics.

Genomic and proteomic studies have an interdependent relationship such that the development of one enhances development of the other. In the last decade, the availability of nucleotide sequences has accelerated the identification of proteins obtained by mass spectrometry (MS) and/or protein sequencers [30]. This acceleration has been greatly enhanced by the availability of the genomic sequences that have replaced EST sequences in recent years. The interdependency in the reverse direction might be referred to as proteogenomics [31–34]. It aims at improving genome annotation by using proteomic information based on MS. The information based on proteogenomics is also applicable to several analyses, including the detection of frameshifts in coding sequences and of posttranslational modifications such as N-terminal methionine excision, signal peptides, and proteolysis [34, 35]. The proteogenomics approach has been applied to A. thaliana [36].
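The core idea of proteogenomics, locating MS-identified peptides on a genome sequence, can be illustrated with a minimal sketch. It assumes the standard genetic code and a six-frame translation of an unspliced toy sequence; the function names (`translate`, `map_peptide`) are hypothetical and are not taken from any of the cited tools, and real pipelines must also handle introns, splice variants, and scoring of peptide identifications.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(peptide, genome):
    """Find a peptide in all six reading frames.

    Returns a list of (strand, frame, start) tuples, where start is the
    0-based position of the coding region on the forward strand.
    """
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos  # offset on the scanned strand
                if strand == "-":
                    # Convert to forward-strand coordinates.
                    start = len(genome) - (start + 3 * len(peptide))
                hits.append((strand, frame, start))
                pos = protein.find(peptide, pos + 1)
    return hits
```

For example, `map_peptide("MAKG", "CCATGGCTAAAGGTTGAC")` reports a single hit on the forward strand in frame 2, starting at position 2. A peptide that maps outside any annotated gene model would be a candidate for revising the annotation.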

2 Plant Proteome Databases and Repositories

Here we summarize the representative databases and Web sites relating to plant proteomics. Each section is presented in alphabetical order. The URLs are tabulated in Table 2.

2.1 Arabidopsis thaliana Databases

2.1.1 AT_CHLORO

This database was developed for chloroplast proteome data from A. thaliana. The proteome datasets were obtained from chloroplasts of Arabidopsis leaves; the chloroplasts were purified on Percoll density gradients, and SDS-PAGE was used in sample preparation. LC-MS/MS-based analysis was used to identify ~1,300 proteins from more than 10,000 unique peptide sequences. The partitioning of each protein among the three chloroplast compartments was validated by using a semiquantitative proteomics approach (spectral count) [37].
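A spectral-count comparison of this kind can be sketched as follows. This is only an illustration of the general idea, not the procedure used for AT_CHLORO: the compartment names (envelope, stroma, thylakoid) and the simple normalization by total count are assumptions, and the function names are hypothetical.

```python
def compartment_fractions(spectral_counts):
    """Convert per-compartment spectral counts into fractions of the total.

    spectral_counts maps a compartment name to the number of MS/MS spectra
    assigned to one protein in that compartment.
    """
    total = sum(spectral_counts.values())
    if total == 0:
        return {compartment: 0.0 for compartment in spectral_counts}
    return {
        compartment: count / total
        for compartment, count in spectral_counts.items()
    }


def main_compartment(spectral_counts):
    """Return the compartment holding the largest share of the spectra."""
    fractions = compartment_fractions(spectral_counts)
    return max(fractions, key=fractions.get)
```

With hypothetical counts such as `{"envelope": 30, "stroma": 5, "thylakoid": 15}`, the protein would be assigned mainly to the envelope (60 % of its spectra). Spectral counts are only semiquantitative, so small differences between compartments should not be overinterpreted.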


Table 2 URLs for plant proteome databases and Web sites

Arabidopsis thaliana databases
AT_CHLORO                      www.grenoble.prabi.fr/at_chloro/
AtPID                          www.megabionet.org/atpid/webfile/
AtProteome                     fgcz-atproteome.unizh.ch/
pep2pro                        fgcz-pep2pro.uzh.ch/
PhosPhAt                       phosphat.mpimp-golm.mpg.de/db.html
SUBA                           suba.plantenergy.uwa.edu.au/
TAIR                           www.arabidopsis.org/

Rice databases
DIPOS                          csb.shu.edu.cn/dipos
OryzaPG-DB                     oryzapg.iab.keio.ac.jp/
PRIN                           bis.zju.edu.cn/prin/
Rice proteome database         gene64.dna.affrc.go.jp/RPD/
RiceRBP                        www.bioinformatics2.wsu.edu/cgi-bin/RiceRBP/home.pl

Other databases
GabiPD                         www.gabipd.org/
Medicago PhosphoProtein DB     phospho.medicago.wisc.edu
P3DB                           p3db.org/
plprot                         www.plprot.ethz.ch/
PPDB                           ppdb.tc.cornell.edu/
PRIDE                          www.ebi.ac.uk/pride/
ProMEX                         promex.pph.univie.ac.at/promex/
Proteomics of Oilseeds         www.oilseedproteomics.missouri.edu/
Seed Proteome Web Portal       www.seed-proteome.com/
Soybean Proteome Database      proteome.dc.affrc.go.jp/Soybean/
SoyKB                          soykb.org/

Web-based tools
GelMap                         www.gelmap.de/
iLoc-Plant                     www.jci-bioinfo.cn/iLoc-Plant
MRMaid                         www.mrmaid.info/
Musite                         musite.net/
PASSEL                         www.peptideatlas.org/passel/
PredPlantPTS1                  ppp.gobics.de/
ProteoRed MIAPE web toolkit    www.proteored.org/MIAPE/

2.1.2 AtPID

AtPID includes predicted protein–protein interactions for A. thaliana [17]. Interactions are predicted from ortholog interactions, microarray profiles, GO analyses [38], conserved domains, and genomic contexts. The database contains ~28,000 protein–protein interaction pairs, of which ~23,000 were generated by prediction methods. The remaining ~5,000 pairs were manually curated from the literature and/or from enzyme complexes in KEGG [39].


2.1.3 AtProteome

This database contains an organ-specific proteome map for A. thaliana. The protein identification information is displayed by proteogenomic mapping of the peptides onto the genome. Additional information on the identification of proteins is linked, such as the amino acid sequence of the first splice variant of the protein and the detected peptides sorted by their position in the protein [7].

2.1.4 pep2pro

This database is a further development of AtProteome and provides proteome information on A. thaliana [40]. The Web site shows the protein identification information by proteogenomic mapping of the peptides onto the genome. The TAIR9 dataset is provided as the default dataset. The pep2pro data analysis pipeline also handles data export to the PRIDE database [28] and data retrieval by the MASCP Gator (http://gator.masc-proteomics.org/).

2.1.5 PhosPhAt

PhosPhAt is a database of phosphorylation sites in A. thaliana. The database contains ~1,200 defined tryptic peptides matching ~1,000 distinct proteins. Phosphorylation sites are marked as “defined” if the precise location of the phosphorylated amino acid has been unambiguously determined by mass spectrometric analysis [41, 42].

2.1.6 SUBA

This database focuses on protein localization in A. thaliana. It stores more than 6,700 nonredundant proteins observed in ten distinct subcellular locations. The localization data come from several sources: experiments using chimeric fluorescent fusion proteins and MS, literature references, and location prediction software based on amino acid sequences [9].

2.1.7 TAIR

The Arabidopsis Information Resource (TAIR), a comprehensive genome database for A. thaliana, also provides data repositories for Arabidopsis proteomics resources: (1) primary protein sequences, (2) protein domains, (3) protein structures including 3-D structure images, (4) protein–protein interactions, and (5) biochemical properties including enzymes and biochemical pathways [43].

2.2 Rice Databases

2.2.1 DIPOS

DIPOS provides information on interacting proteins in rice (Oryza sativa), where the interactions are predicted using two computational methods, interologs and domain-based methods [18]. An interolog is a conserved interaction between a pair of proteins that have interacting homologs in another organism [44]. The database stores nearly 15 million pairwise interactions among 27,746 proteins. Each interaction is assigned a confidence score, and biological explanations of pathways and interactions are also provided.
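The interolog idea can be expressed in a few lines. The sketch below is a simplified illustration under stated assumptions, not the DIPOS or PRIN implementation: it takes a precomputed ortholog mapping as given, ignores confidence scoring, and uses hypothetical function and identifier names.

```python
def predict_interologs(known_interactions, orthologs):
    """Transfer interactions from a model organism to a target organism.

    known_interactions: iterable of (protein_a, protein_b) pairs observed
        in the model organism.
    orthologs: dict mapping a model-organism protein to the set of its
        orthologs in the target organism (for example, rice).
    Returns a set of predicted target pairs, each stored in sorted order
    so that (x, y) and (y, x) are counted once.
    """
    predicted = set()
    for a, b in known_interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted
```

If yeast proteins `Y1` and `Y2` interact, and their hypothetical rice orthologs are `Os1` and `{Os2a, Os2b}`, the sketch predicts the pairs `(Os1, Os2a)` and `(Os1, Os2b)`. An interaction whose partners lack orthologs in the target organism yields no prediction.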


2.2.2 OryzaPG-DB

Proteins stored in this database were extracted from rice (Oryza sativa) and analyzed by MS using the shotgun proteomics approach [11]. The proteins contained in the database were compared with protein data from full-length cDNA sequence databases such as RAP-DB [45]. Compared with the similar Rice Proteome Database [10], which is based on conventional protein sequencing, this approach should enable the detection of definitively functional proteins. Nearly 3,200 genes are covered by the peptides identified by searching the product ion spectra against the protein, cDNA, transcript, and genome databases.

2.2.3 PRIN

This Web site provides predictions of protein–protein interactions in Oryza sativa [19]. The predictions are based on the interolog method [44], using six model organisms for which large-scale protein–protein interaction experiments are available: yeast, worm, fruit fly, human, E. coli, and Arabidopsis. Some 76,585 nonredundant rice protein interaction pairs have been predicted among 5,049 rice proteins.

2.2.4 Rice Proteome Database

This database contains proteome data from rice (Oryza sativa cv. Nipponbare) based on 2-DE techniques. The database stores more than 20 reference maps based on 2-DE of proteins from rice tissues and subcellular compartments. The reference maps comprise more than 10,000 identified proteins showing tissue and subcellular localization, corresponding to ~4,100 separate protein entries in the database. Amino acid sequences were determined by protein sequencers and MS, which were up-to-date technologies at the time the database was developed [10].

2.2.5 RiceRBP

This database contains 257 experimentally identified RNA-binding proteins (RBPs) in rice. Many of these have not previously been predicted to be RBPs. For each identified protein, information is provided on transcript and protein sequences, predicted protein domains, details of the experimental identification, and whether antibodies have been generated for public use [46].

2.3 Other Databases

2.3.1 GabiPD

This database provides integrated plant “omics” data, and was developed as part of the German initiative for Genome Analysis of the Plant Biological System (GABI) [47]. Data from different “omics” are integrated and interactively connected. 2-D electrophoresis gel images were collected from different tissues of A. thaliana and Brassica napus. Stored data relating to phosphorylation have links with external data in the PhosPhAt database through related data in the Gene GreenCard database.

2.3.2 Medicago PhosphoProtein Database

This database stores phosphoprotein, phosphopeptide, and phosphosite data specific to Medicago truncatula, the model system for legume biology. It includes 3,457 phosphopeptides that contain 3,404 nonredundant sites of phosphorylation on 829 proteins. Through a Web-based interface, users can browse identified proteins or search for proteins of interest, and also conduct BLAST searches of the database using peptide sequences and phosphorylation motifs as queries [48].

2.3.3 P3DB

This database stores plant protein phosphorylation site data, organizing information on 32,963 nonredundant sites collected from 23 experimental studies of six plant species. The data can be searched for a protein of interest using an integrated BLAST module to query similar sequences with known phosphorylation sites [49].

2.3.4 plprot

plprot is a plastid proteome database that provides information about the proteomes of chloroplasts, etioplasts, and undifferentiated plastids. The database stores more than 2,000 proteins. The basic module integrates a homology search and comparative information on the proteomes of different plastid types. Data from A. thaliana, tobacco, and rice are contained in the database [16].

2.3.5 PPDB

PPDB is a Plant Proteome DataBase for A. thaliana and maize (Zea mays). PPDB was originally developed for plant plastids and was later expanded to the whole plant proteome. The name of the database was changed accordingly, using the same abbreviation, from Plastid PDB to Plant PDB. The current database has large-scale proteomic data in diverse forms: (1) 5,000 identified proteins in both Arabidopsis and maize, (2) 80 published Arabidopsis proteome datasets from subcellular compartments or organs linked to each locus, and (3) 1,500 Arabidopsis proteins with manually assigned subcellular locations. Information from MS-based identification and posttranslational modification is available for each identified protein [12, 13].

2.3.6 PRIDE

The PRIDE database provides standardized MS proteomics data. It is one of the main repositories of proteomics data that have been generated by MS approaches. Various journals in the field now support, and in some cases mandate, the deposition of proteomics data in PRIDE. Datasets are stored in PRIDE without modification or reanalysis, and the research community can access the original results obtained by the research group [28].

2.3.7 ProMEX

ProMEX is a database of MS/MS reference spectra (mostly Orbitrap precursor ion mass data) from plants and microbes [8, 42, 60]. The data were generated based on liquid chromatography coupled to ion trap MS (LC-IT-MS). The current release 2.9/2012 contains 51,793 tryptic peptide product ion spectra entries of 27,886 different peptide sequence entries from Medicago truncatula, Chlamydomonas reinhardtii, Bradyrhizobium japonicum, Arabidopsis thaliana, Phaseolus vulgaris, Lotus japonicus, Lotus corniculatus, Lycopersicon esculentum, Solanum tuberosum, Nicotiana tabacum, Sinorhizobium meliloti, Glycine max, and Zea mays. Furthermore, a search algorithm is implemented that allows users to search single spectra and mzXML-LC-MS/MS runs against the database. Protein data are linked to “omes” such as metabolites, pathways, and transcripts. Peptide identification was based on peptide mass fingerprinting [50].

2.3.8 Proteomics of Oilseeds

This database stores reference maps of soybean (Glycine max cv. Maverick) proteins based on 2-DE. The samples were collected during seed filling in the plant and analyzed at 2, 3, 4, 5, and 6 weeks after flowering. It contains expression profiles for 679 protein spots, from which 422 proteins representing 216 nonredundant proteins were identified [14].

2.3.9 Seed Proteome Web Portal

This Web site provides information both on quantitative seed proteomic data and on seed-related protocols. As a proteomic database, it gives access to 475 different Arabidopsis seed proteins annotated from 2-DE maps, including quantitative data according to the accumulation profile of each protein during the germination process. The Web site also provides protocols that the authors have routinely used for Arabidopsis seed proteome studies, such as procedures for sample preparation, electrophoresis coupled with gel analysis in 2-D electrophoresis, and protein identification by mass spectrometry [51].

2.3.10 Soybean Proteome Database

The current version of this database contains 23 reference maps of soybean (G. max cv. Enrei) proteins based on 2-DE [15]. The samples were collected from several organs, tissues, and organelles. The reference maps include 8,262 detected proteins and 672 identified proteins, that is, proteins for which a sequence or a peptide peak has been determined. An omics aspect is also included as a table linked to temporal expression profiles that reveals relationships among 106 mRNAs, 51 proteins, and 89 metabolites that vary over time under flooding stress (Fig. 1). The Web interface is representative of proteome databases based on 2-DE, and was developed using the Make2DDB II environment [52], which provides a standardized search function for the stored protein spots based on the accession number, description of the protein, and isoelectric point/molecular weight range. The database focuses on the seedling stage in soybean, 0–7 days after seedling emergence, and in this it differs from the Proteomics of Oilseeds [14].
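The kind of spot search described above (by accession number, description, and isoelectric point/molecular weight range) can be sketched as a simple filter. This is an illustration only and does not reproduce the Make2DDB II interface or schema: the `Spot` record, its field names, and `search_spots` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """A hypothetical record for one protein spot on a 2-DE reference map."""
    accession: str
    description: str
    pi: float      # isoelectric point
    mw_kda: float  # molecular weight in kDa


def search_spots(spots, keyword=None, pi_range=None, mw_range=None):
    """Return spots matching every criterion that is supplied.

    keyword is matched case-insensitively against accession and description;
    pi_range and mw_range are inclusive (low, high) tuples.
    """
    results = []
    for spot in spots:
        text = (spot.accession + " " + spot.description).lower()
        if keyword and keyword.lower() not in text:
            continue
        if pi_range and not (pi_range[0] <= spot.pi <= pi_range[1]):
            continue
        if mw_range and not (mw_range[0] <= spot.mw_kda <= mw_range[1]):
            continue
        results.append(spot)
    return results
```

A call such as `search_spots(spots, keyword="dehydrogenase", pi_range=(5.0, 7.0))` would return only the matching spots whose isoelectric point lies between 5 and 7. Criteria left as `None` are not applied.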

Fig. 1 A representative omics aspect in the Soybean Proteome Database [15]. (a) Proteome, transcriptome, and metabolome data are associated by an omics table. Proteomic data include 2-DE images and temporal expression profiles in the seedling stage in soybean. Transcriptomic and metabolomic data also include temporal profiles of the entities. Metabolites are mapped on metabolic pathways, as are associated mRNAs and proteins. (b) An example omics table in the database. The table indicates significant relationships between mRNAs, proteins, and metabolites with cells of the same color

2.3.11 SoyKB

SoyKB (Soybean Knowledge Base) stores multiple “omes” including temporal profiles of proteins detected from soybean (G. max cv. Williams 82) [20]. The proteomic data are available for seeds, roots, and root hairs and for multiple conditions. Protein sequences and structures are linked to information about the genes such as gene models. Furthermore, metabolomics data from the SoyMetDB database [53] are incorporated into SoyKB.

2.4 Web-Based Proteomic Tools and Prediction Programs

2.4.1 GelMap

The Web site provides a tool for spot visualization on gel images [21]. Users can upload gel images and coordinates/information spreadsheets. The Web site also gives access to functional annotation of identified proteins defined by the user, and annotation of several proteins per analyzed protein “spot” according to MS data. Previously, this type of gel image-based database had to be made available through a server computer that users set up themselves using a software tool such as the Make 2D-DB II package [52]. With GelMap, a user can make gel-based image data available through the GelMap Web site without having to run a server.

2.4.2 iLoc-Plant

This Web site provides a tool for prediction of subcellular localization of plant proteins with single or multiple sites. A new prediction method, the “multi-labeled learning” approach, was developed that can handle systems containing both single- and multiple-location plant proteins. An overall success rate of 71 % was demonstrated in the authors’ report [22].

2.4.3 MRMaid

Selected reaction monitoring (SRM), also called multiple reaction monitoring (MRM), is a technique for targeted quantitative proteomics [23]. This Web site helps users design SRM assays by suggesting peptides and product ions to monitor based on millions of experimental spectra from the PRIDE database [28]. By using data from the public repository PRIDE, MRMaid covers an increasing number of species as the coverage of PRIDE grows. Transitions suggested by the Web site for 25 A. thaliana proteins were evaluated experimentally and found capable of quantifying 23 of these proteins.
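An SRM transition pairs a precursor ion m/z with a product ion m/z. The sketch below only computes candidate values for an unmodified peptide from standard monoisotopic residue masses; it does not rank transitions by experimental spectra as MRMaid does, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # H+


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge=2):
    """m/z of the intact peptide ion at the given charge state."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def y_ions(sequence):
    """Singly charged y-ion m/z values, y1 up to y(n-1)."""
    return {
        f"y{i}": peptide_mass(sequence[-i:]) + PROTON
        for i in range(1, len(sequence))
    }


def candidate_transitions(sequence, charge=2):
    """List (precursor m/z, ion name, product m/z) for every y ion."""
    precursor = precursor_mz(sequence, charge)
    return [(precursor, name, mz) for name, mz in y_ions(sequence).items()]
```

In practice, only a few of these candidates would be monitored per peptide, chosen for signal intensity and freedom from interference; that selection requires experimental data and is outside the scope of this sketch.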

2.4.4 Musite

Musite is a Web site developed to predict phosphorylation sites based solely on protein sequences. It can also be downloaded as a stand-alone tool. Phosphorylation data from A. thaliana, B. napus, G. max, M. truncatula, O. sativa, and Z. mays were collected for cross-species testing. It was reported that the model for A. thaliana could be extended to other organisms, and the overall plant model from Musite was reported to be better than other plant-specific prediction tools in prediction accuracy [24].

2.4.5 PASSEL

The PeptideAtlas project has developed an analysis pipeline to identify peptides by tandem mass spectrometry (MS/MS), statistically validate the identifications, and map the identified sequences to the genomes of eukaryotic organisms. PASSEL is a component of the PeptideAtlas project and a proteomic data repository for the collection and representation of data from SRM measurements. Users can submit, disseminate, and reuse SRM experimental results from analysis of biological samples [25].

2.4.6 PredPlantPTS1

Several computational approaches have been developed to predict peroxisomal proteins carrying the peroxisome targeting signal type 1 (PTS1). The Web site PredPlantPTS1 enables plant-specific prediction of PTS1 proteins based on a machine learning approach that is able to predict PTS1 proteins for higher plants (spermatophytes) with high accuracy [26].

2.4.7 ProteoRed MIAPE Web Toolkit

This Web site is a bioinformatics tool that performs several functions related to proteomic data standards, namely the standard data formats of the HUPO Proteomics Standards Initiative (HUPO-PSI) and the Minimum Information About a Proteomics Experiment (MIAPE) guidelines: (1) verifying that reports fulfill the minimum information requirements of the corresponding MIAPE modules, highlighting inconsistencies or missing information; (2) converting several XML-based data standards directly into human-readable MIAPE reports stored within the ProteoRed MIAPE repository; and (3) performing the reverse operation, allowing users to export from MIAPE reports into XML files for computational processing, data sharing, or public database submission [27].

3 Data Collection and Annotation of Proteins

There are two major proteomics workflows: (i) protein extraction, electrophoresis (1-DE or 2-DE), excision of bands (from 1-DE) or spots (from 2-DE), and MS and (ii) protein extraction and MS. The type-(i) approach is conventional, while the type-(ii) approach has been developed in more recent years and is also referred to as “shotgun proteomics” [54].

In proteomic analyses based on 2-DE (the type-(i) approach), the extracted and purified proteins are separated in the range of isoelectric point 3–10 in the first dimension and molecular weight 10–100 kDa in the second dimension [55]. A spot is cut out from the gel and the mass spectrum of the peptides is measured after reductive alkylation, enzymatic digestion, and desalting. Based on the obtained mass list and software searches against protein and nucleic acid databases, we detect the amino acid sequences matching the corresponding peptide mass spectrum data to identify the protein. To improve sensitivity, fluorescent staining is used extensively in addition to conventional Coomassie Brilliant Blue staining [2]. In this approach, we can use a gas-phase protein sequencer instead of MS to determine the amino acid sequence.

Improvements in analysis precision and in software for MS have made possible a proteomics analysis that does not rely on electrophoresis (the type-(ii) approach) [56, 57]. In this newer approach, the purified or extracted protein is digested with enzymes and the mixture of peptides is separated by liquid chromatography (LC) based on hydrophobicity. The peptides eluted from the LC column are analyzed by a mass spectrometer, typically in MS/MS mode, that is directly connected to the LC. The first MS stage separates each peptide ion, and the second MS stage decomposes the peptide into fragments; the corresponding sequence is determined from the fragmentation pattern. Software identifies the protein based on a homology search against reference databases.
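The mass-list search used in the gel-based workflow can be illustrated with a minimal peptide mass fingerprinting sketch. It assumes tryptic digestion (cleavage after K or R, but not before P), unmodified peptides, no missed cleavages, and observed values that have already been converted to neutral monoisotopic masses; the function names and the simple match-counting score are hypothetical simplifications of what real search software does.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # H2O added on peptide bond hydrolysis


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def count_matches(observed_masses, protein, tolerance=0.01):
    """Count observed masses explained by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical)
        for mass in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.01):
    """Return the name of the database entry with the most matched masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )
```

Here `database` is a plain dictionary mapping protein names to sequences. Shotgun identification differs in that fragment ion spectra, rather than intact peptide masses alone, are compared against the reference sequences.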
The shotgun approach has two variants: proteins are either labeled before the analysis or analyzed without labeling.

At the annotation stage of identifying proteins, we should give attention to the stability of the protein identifiers. A recent report revealed significant differences in the identifiers among the main protein databases: the International Protein Index (IPI), the UniProt Knowledgebase (UniProtKB), the National Center for Biotechnology Information nr database (NCBI nr), and Ensembl [58]. The report noted that some entries submitted to a database are deleted within several months to a couple of years. The report demonstrated that UniProtKB identifiers were more stable than IPI identifiers and NCBI gi numbers. In recent years, the data deposited in publicly available proteomic databases have been increasing. There has been concern that automatic curation of repository data is dependent on analyses that could cause misannotation. Both database developers and users should be aware of these potential issues. A posteriori detection of such data encountered in typical proteomics datasets has also been studied [59].

4 Conclusion

In recent years, a large amount of proteomic information including data related to biological functions has been created through proteomic research interlocked with genomic research. The proteomic databases we have introduced here have made such information publicly available. In plants, only a few species have complete genomic information but proteomic techniques can also be applied to species without genomic information. Proteomics techniques are useful to clarify the biological mechanisms underlying important plant traits, and proteomic databases will contribute to the establishment of technology to regulate and enhance such biological mechanisms and improve plant productivity.

References 1. Komatsu S, Konishi H, Shen S et al (2003) Rice proteomics: a step toward functional analysis of the rice genome. Mol Cell Proteomics 2:2–10 2. Komatsu S, Yano H (2006) Update and challenges on proteomics in rice. Proteomics 6:4057–4068 3. Salekdeh GH, Komatsu S (2007) Crop proteomics: aim at sustainable agriculture of tomorrow. Proteomics 7:2976–2996 4. Komatsu S, Ahsan N (2009) Soybean proteomics and its application to functional analysis. J Proteomics 72:325–336 5. Plant genomes central: genome projects in progress. http://www.ncbi.nlm.nih.gov/ genomes/PLANTS/PlantList.html. Accessed 10 Feb 2012 6. The Angiosperm Phylogeny Group (2009) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot J Linn Soc 161:105–121 7. Baerenfaller K, Grossmann J, Grobei MA et al (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320:938–941

8. Hummel J, Niemann M, Wienkoop S et al (2007) ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites. BMC Bioinformatics 8:216 9. Heazlewood JL, Verboom RE, Tonti-Filippini J et al (2006) SUBA: the Arabidopsis subcellular database. Nucleic Acids Res 35:D213–D218 10. Komatsu S, Kojima K, Suzuki K et al (2004) Rice Proteome Database based on twodimensional polyacrylamide gel electrophoresis: its status in 2003. Nucleic Acids Res 32:D388–D392 11. Helmy M, Tomita M, Ishihama Y (2011) OryzaPG-DB: rice proteome database based on shotgun proteogenomics. BMC Plant Biol 11:63 12. Friso G, Giacomelli L, Ytterberg AJ et al (2004) In-depth analysis of the thylakoid membrane proteome of Arabidopsis thaliana chloroplasts: new proteins, new functions, and a plastid proteome database. Plant Cell 16:478–499 13. Sun Q, Zybailov B, Majeran W et al (2008) PPDB, the plant proteomics database at Cornell. Nucleic Acids Res 37:D969–D974

14. Hajduch M, Ganapathy A, Stein JW et al (2005) A systematic proteomic study of seed filling in soybean. Establishment of high-resolution two-dimensional reference maps, expression profiles, and an interactive proteome database. Plant Physiol 137:1397–1419
15. Sakata K, Ohyanagi H, Nobori H et al (2009) Soybean Proteome Database: a data resource for plant differential omics. J Proteome Res 8:3539–3548
16. Kleffmann T, Hirsch-Hoffmann M, Gruissem W et al (2006) plprot: a comprehensive proteome database for different plastid types. Plant Cell Physiol 47:432–436
17. Cui J, Li P, Li G et al (2008) AtPID: Arabidopsis thaliana protein interactome database: an integrative platform for plant systems biology. Nucleic Acids Res 36:D999–D1008
18. Sapkota A, Liu X, Zhao X-M et al (2011) DIPOS: database of interacting proteins in Oryza sativa. Mol Biosyst 7:2615–2621
19. Gu H, Zhu P, Jiao Y et al (2011) PRIN: a predicted rice interactome network. BMC Bioinformatics 12:161
20. Joshi T, Patil K, Fitzpatrick MR et al (2012) Soybean Knowledge Base (SoyKB): a web resource for soybean translational genomics. BMC Genomics 13(Suppl 1):S15
21. Senkler M, Braun H-P (2012) Functional annotation of 2D protein maps: the GelMap portal. Front Plant Sci 3:87
22. Wu Z-C, Xiao X, Chou K-C (2011) iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites. Mol Biosyst 7:3287–3297
23. Fan J, Mohareb F, Jones AME et al (2012) MRMaid: the SRM assay design tool for Arabidopsis and other species. Front Plant Sci 3:164
24. Yao Q, Gao J, Bollinger C et al (2012) Predicting and analyzing protein phosphorylation sites in plants using Musite. Front Plant Sci 3:186
25. Farrah T, Deutsch EW, Kreisberg R et al (2012) PASSEL: the PeptideAtlas SRM experiment library. Proteomics 12:1170–1175
26. Reumann S, Buchwald D, Lingner T (2012) PredPlantPTS1: a web server for the prediction of plant peroxisomal proteins. Front Plant Sci 3:194
27. Medina-Aunon JA, Martinez-Bartolome S, Lopez-Garcia MA (2011) The ProteoRed MIAPE web toolkit: a user-friendly framework to connect and share proteomics standards. Mol Cell Proteomics 10:M111.008334
28. Martens L, Hermjakob H, Jones P et al (2005) PRIDE: the proteomics identifications database. Proteomics 5:3537–3545
29. Orchard S (2012) Molecular interaction databases. Proteomics 12:1656–1662


30. Wijk KJ (2001) Challenges and prospects of plant proteomics. Plant Physiol 126:501–508
31. Gupta N, Tanner S, Jaitly N et al (2007) Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res 17:1362–1377
32. Ansong C, Purvine SO, Adkins JN et al (2008) Proteogenomics: needs and roles to be filled by proteomics in genome annotation. Brief Funct Genomic Proteomic 7:50–62
33. Jaffe JD, Berg HC, Church GM (2004) Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4:59–77
34. Gupta N, Benhamida J, Bhargava V et al (2008) Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res 18:1133–1142
35. Proteogenomics. http://en.wikipedia.org/wiki/Proteogenomics
36. Castellana NE, Payne SH, Shen Z et al (2008) Discovery and revision of Arabidopsis genes by proteogenomics. Proc Natl Acad Sci U S A 105:21034–21038
37. Ferro M, Brugiere S, Salvi D et al (2010) AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins. Mol Cell Proteomics 9:1063–1084
38. Barrell D, Dimmer E, Huntley RP et al (2009) The GOA database in 2009-an integrated gene ontology annotation resource. Nucleic Acids Res 37:D396–D403
39. Kanehisa M, Goto S, Sato Y et al (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40:D109–D114
40. Hirsch-Hoffmann M, Gruissem W, Baerenfaller K (2012) pep2pro: the high-throughput proteomics data processing, analysis, and visualization tool. Front Plant Sci 3:123
41. Heazlewood JL, Durek P, Hummel J et al (2008) PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor. Nucleic Acids Res 36:D1015–D1021
42. Weckwerth W, Baginsky S, Wijk KV (2008) The multinational Arabidopsis steering subcommittee for proteomics assembles the largest proteome database resource for plant systems biology. J Proteome Res 7:4209–4210
43. Lamesch P, Berardini TZ, Li D et al (2011) The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 40:D1202–D1210. doi:10.1093/nar/gkr1090
44. Walhout AJ, Sordella R, Lu X et al (2000) Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287:116–122
45. Ohyanagi H, Tanaka T, Sakai H et al (2006) The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res 34:D741–D744
46. Doroshenk KA, Crofts AJ, Morris RT et al (2012) RiceRBP: a resource for experimentally identified RNA binding proteins in Oryza sativa. Front Plant Sci 3:90
47. Usadel B, Schwacke R, Nagel A et al (2012) GabiPD: the GABI Primary Database integrates plant proteomic data with gene-centric information. Front Plant Sci 3:154
48. Rose CM, Venkateshwaran M, Grimsrud PA et al (2012) Medicago PhosphoProtein Database: a repository for Medicago truncatula phosphoprotein data. Front Plant Sci 3:122
49. Yao Q, Bollinger C, Gao J et al (2012) P3DB: an integrated database for plant protein phosphorylation. Front Plant Sci 3:206
50. Wolski W, Lalowski M, Martus P et al (2005) Transformation and other factors of the peptide mass spectrometry pairwise peak-list comparison process. BMC Bioinformatics 6:285
51. Galland M, Job D, Rajjou L (2012) The seed proteome web portal. Front Plant Sci 3:98
52. Mostaguir K, Hoogland C, Binz P-A et al (2003) The Make 2D-DB II package: conversion of federated two-dimensional gel electrophoresis databases into a relational format and interconnection of distributed databases. Proteomics 3:1441–1444

53. Joshi T, Yao Q, Franklin LD et al (2010) SoyMetDB: the soybean metabolome database. Proceedings of IEEE International Conference on Bioinformatics & Biomedicine (BIBM 2010), Hong Kong, pp 203–208
54. Washburn MP, Wolters D, Yates JR (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 19:242–247
55. O'Farrell PH (1975) High resolution two-dimensional electrophoresis of proteins. J Biol Chem 250:4007–4021
56. Komatsu S, Wada T, Abaléa Y et al (2009) Analysis of plasma membrane proteome in soybean and application to flooding stress response. J Proteome Res 8:4487–4499
57. Nouri M-Z, Komatsu S (2010) Comparative analysis of soybean plasma membrane proteins under osmotic stress using gel-based and LC MS/MS-based proteomics approaches. Proteomics 10:1930–1945
58. Griss J, Cote RG, Gerner C et al (2011) Published and perished? The influence of the searched protein database on the long-term storage of proteomics data. Mol Cell Proteomics 10:M111.008490
59. Foster JM, Degroeve S, Gatto L et al (2011) A posteriori quality control for the curation and reuse of public proteomics data. Proteomics 11:2182–2194
60. Wienkoop S, Staudinger C, Hoehenwarter W et al (2012) ProMEX: a mass spectral reference database for plant proteomics. Front Plant Sci 3:125
