Molecular BioSystems

Published on 13 March 2015. Downloaded by University of California - San Diego on 13/04/2015 07:37:47.

PAPER

Cite this: DOI: 10.1039/c5mb00124b


Characterization of proteins in S. cerevisiae with subcellular localizations†

Lei Yang,a Dapeng Hao,a Jizhe Wang,a Xudong Xing,a Yingli Lv,a Yongchun Zuo*b and Wei Jiang*a

Acquiring comprehensive knowledge of protein in various subcellular localizations is one of the fundamental goals in cell biology and proteomics. Although recent large-scale experimental and proteomics studies of S. cerevisiae protein subcellular localizations are archived in various databases, only a few studies use a systems biology approach to characterize S. cerevisiae proteins at a subcellular localization level. Based on the topological properties and biological properties of S. cerevisiae proteins, we have compared, contrasted and analyzed the statistical properties across eight different subcellular localizations. Significant differences are found in all topological properties and biological properties among eight protein categories. Network topology analysis indicates that the nuclear proteins differ from the other seven protein categories, and tend to have the most important topological properties and play an important role in the network, including the highest degree, core number, and betweenness centrality. In the light of the above, we hope these findings presented in this study may provide important help for protein subcellular localization prediction in S. cerevisiae and provide many new insights for understanding the proteins directly from subcellular localizations.

Received 9th February 2015, Accepted 13th March 2015

DOI: 10.1039/c5mb00124b

www.rsc.org/molecularbiosystems

a College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China. E-mail: [email protected]; Fax: +86 451 8661 5922; Tel: +86 451 8666 9617
b The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of Life Sciences, Inner Mongolia University, Hohhot 010021, PR China. E-mail: [email protected]; Fax: +86 471 5227683; Tel: +86 471 5227683
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5mb00124b

This journal is © The Royal Society of Chemistry 2015

1. Introduction

A typical cell contains a large number of proteins that are located in specific compartments or organelles, referred to as subcellular localizations.1,2 These subcellular localizations are critical for proteins to perform their functions properly. Thus, the subcellular distributions of proteins can provide useful clues for understanding the biological function of proteins.3 For example, knowledge of protein subcellular localizations is very useful for identifying drug targets during drug development and for identifying pathways that regulate biological processes at the cellular level.4 Therefore, acquiring comprehensive information on the subcellular localizations of proteins is one of the fundamental goals in both cell biology and proteomics. Although information about protein subcellular localizations can be acquired with several high-throughput experimental techniques, most of these experiments are both costly and time-consuming. In particular, the speed of discovering new protein sequences has increased greatly in the post-genomic era. Thus, there is a huge gap between protein sequences with and without subcellular localization information. To fill this gap, many computational methods have been developed for predicting subcellular localization from sequence-based features,5–8 such as amino acid composition and pseudo amino acid composition.9–13 Detailed information on predicting protein subcellular localization is provided in two review papers.14,15 Moreover, many new algorithms for identifying subcellular localization have been established in recent years.4,16–24

Proteins do not carry out their functions independently, but interact with each other in protein–protein interaction (PPI) networks. With a large number of protein interaction datasets for multiple species, such as HPRD, MIPS, DIP and BioGRID, becoming available online in recent years,25–29 the study of PPI networks is becoming increasingly important for understanding how biological systems operate in a cell. However, most of the networks are too complex to be easily understood. This problem can be overcome by using graph-theoretical concepts to investigate the topological properties of complex networks. In recent years, with the rapid identification of PPI networks, topological analysis of large-scale datasets to study the topological properties of different protein groups in PPI networks has become common.30–37 In addition, the recent advent of experiments that measure biological properties on a genome-wide scale has motivated us to conduct a statistical study on gene activity patterns in cells. However, until recently, neither network-based topological analysis

Mol. BioSyst.


nor biological properties have been applied to yeast proteins at a subcellular localization level.

In this study, we present an analysis of the topological properties and biological properties of S. cerevisiae proteins in different subcellular localizations. First, the S. cerevisiae proteins are downloaded from the Saccharomyces Genome Database.38 Second, using experimentally validated subcellular localization information from the MIPS database,39 the S. cerevisiae proteins are classified into eight categories: (1) cytoplasmic, (2) endoplasmic reticulum, (3) extracellular, (4) Golgi, (5) integral membrane, (6) mitochondrial, (7) nuclear, and (8) plasma membrane proteins. Third, we map the proteins onto the yeast physical interaction network, determine the topological properties, and identify significant differences in all topological properties among the eight categories; the results show that the nuclear proteins tend to have the most important topological properties and play an important role in the yeast physical interaction network. Finally, 12 biological properties, including the phyletic age,40 dN/dS value,41 mRNA expression level (EL), codon adaptation index (CAI),42 and codon bias index (CBI),43 are also used to characterize the S. cerevisiae proteins at a subcellular localization level. Based on these analyses, significant differences are also found in all biological properties among the eight protein categories.

As demonstrated by a series of recent publications44–49 in response to the guidelines proposed in ref. 9 (as cited by the authors), establishing a really useful sequence-based statistical predictor for a biological system requires the following procedures: (a) construct or select a valid benchmark dataset to train and test the statistical model; (b) formulate the biological sequence samples using an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be analyzed; (c) introduce or develop a powerful test to operate the statistical model; (d) properly perform some tests to objectively evaluate the anticipated results of the statistical model; and (e) establish a user-friendly web-server for the statistical model that is accessible to the public. Below, we describe how these steps are dealt with.

2. Materials and methods

2.1. Datasets

Protein–protein interaction datasets of S. cerevisiae were downloaded from the Saccharomyces Genome Database (http://yeastgenome.org/) on April 28, 2014.38 Only the S. cerevisiae physical interactions are used in our study. The main component of the yeast physical interaction network contains 3687 nodes and 23 369 edges. The subcellular localization of each S. cerevisiae protein was downloaded from the MIPS database (http://mips.helmholtz-muenchen.de/genre/proj/yeast/) on February 20, 2014.39 As pointed out in a recent review paper,50 many proteins have multiple subcellular locations. To deal with these multiplex proteins, a series of web-servers have been developed, such as iLoc-Euk,16 iLoc-Hum,4 iLoc-Plant,18 iLoc-Gpos,20 iLoc-Gneg,16 iLoc-Virus,19 and iLoc-Animal;21 they can be used to cope with the multiple location problems in eukaryotic, human, plant, Gram-positive, Gram-negative, viral, and animal proteins, respectively. In this study, a protein is removed if it is annotated with multiple subcellular localizations. The remaining 2851 yeast proteins each have a unique subcellular localization and are classified into eight categories: (1) 1179 cytoplasmic proteins, (2) 160 endoplasmic reticulum proteins, (3) 26 extracellular proteins, (4) 47 Golgi proteins, (5) 69 integral membrane proteins, (6) 592 mitochondrial proteins, (7) 694 nuclear proteins, and (8) 84 plasma membrane proteins (Fig. 1). The other subcellular localizations are not used in our dataset because these categories contain too few proteins in the MIPS database (http://mips.helmholtz-muenchen.de/genre/proj/yeast/).

Fig. 1 Composition of our dataset according to the subcellular localizations for S. cerevisiae proteins.

The Gene Ontology annotations,51 codon adaptation index (CAI),42 codon bias index (CBI),43 frequency of optimal codons (Fop),52 and the protein length of S. cerevisiae genes were retrieved from the Saccharomyces Genome Database (http://yeastgenome.org/) on April 28, 2014.38 The Interpro dataset,53 KEGG dataset54 and Pfam dataset55 of S. cerevisiae were retrieved from org.Sc.sgd.db (version 2.14.0) using the R software (version 3.0.2). The YEASTRACT database (http://www.yeastract.com/),56 which provides publicly available information on transcription regulatory associations in S. cerevisiae, is used to obtain the transcription factors and their target genes in this study. These datasets were downloaded from the YEASTRACT database on April 28, 2014.

The phyletic age of a gene is defined by the evolutionarily most distant species group in which homologs can be found.57 The number of homologous genes (family size) in the same genome is defined as the paralogous gene number. In this study, the phyletic ages and paralogous gene numbers of S. cerevisiae were downloaded from the OGEE database (http://ogeedb.embl.de) (build: 304).40 The number of non-synonymous substitutions per non-synonymous site is denoted as dN, whereas the number of synonymous substitutions per synonymous site is denoted as dS. The dN/dS for each gene is computed by dividing dN by dS, and can be used as an indicator of the selective pressure acting on a gene.41 In 2005, Wall et al. measured the adjusted dN/dS values for S. cerevisiae,58


and these datasets (http://www.pnas.org/content/suppl/2005/03/29/0501761102.DC1/01761Table 4.html) were used in our study to assess the statistical significance of differences among yeast proteins in different subcellular localizations. The mRNA expression levels (EL) of S. cerevisiae genes were obtained from the work of Greenbaum et al. (http://bioinfo.mbb.yale.edu/genome/expression/translatome/ref.txt),59 which is a comprehensive reference dataset for 6250 S. cerevisiae genes.

2.2. Topological properties

In this study, the degree, clustering coefficient,60 topological coefficient,61 average shortest path (ASP), closeness centrality, and betweenness centrality62 are calculated using the NetworkAnalyzer software. The core number63,64 and average degree of nearest neighbors (ADNN)65 are calculated using the MatlabBGL package. The edge percolation component (EPC),66 maximum neighborhood component (MNC)67 and density of maximum neighborhood component (DMNC)67 are calculated using the Java plug-in cytoHubba, which is implemented in Cytoscape software (version 2.8.3). The ModuLand software is used to calculate the bridgeness,68 community centrality68 and overlap.68 Finally, the mcode cluster score69 is calculated using MCODE software. In addition, the common function index (CFI) is used to measure the amount of common GO annotations of adjacent nodes, and is defined as:70–72

$$\mathrm{CFI}(i) = \sum_{j} \left( d_{ij}^{B} + d_{ij}^{C} + d_{ij}^{M} \right)$$

where $j$ represents any node adjacent to node $i$, and $d_{ij}^{B}$, $d_{ij}^{C}$ and $d_{ij}^{M}$ are the deepest ontology depths of the common function shared by $i$ and $j$ in the categories of biological process (B), cellular component (C) and molecular function (M), respectively.

2.3. Biological properties
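The common function index (CFI) defined in Section 2.2 sums, over every neighbor j of a node i, the deepest shared Gene Ontology depths in the three ontologies. The following is a minimal Python sketch of that sum, not the authors' implementation. The adjacency dictionary, the `shared_depth` callback and the toy depth values are hypothetical placeholders; in practice the depths would come from the GO annotations.

```python
from typing import Callable, Dict, Hashable, Set

Node = Hashable


def common_function_index(
    node: Node,
    adjacency: Dict[Node, Set[Node]],
    shared_depth: Callable[[Node, Node, str], int],
) -> int:
    """CFI(i): sum over neighbors j of d_ij^B + d_ij^C + d_ij^M."""
    total = 0
    for neighbor in adjacency.get(node, set()):
        # B = biological process, C = cellular component, M = molecular function
        for ontology in ("B", "C", "M"):
            total += shared_depth(node, neighbor, ontology)
    return total


# Hypothetical toy network: P1 interacts with P2 and P3.
toy_adjacency = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1"}}

# Hypothetical deepest shared GO depths per protein pair and ontology.
toy_depths = {
    (frozenset({"P1", "P2"}), "B"): 4,
    (frozenset({"P1", "P2"}), "C"): 2,
    (frozenset({"P1", "P2"}), "M"): 3,
    (frozenset({"P1", "P3"}), "B"): 1,
    (frozenset({"P1", "P3"}), "M"): 5,
}


def toy_shared_depth(a: Node, b: Node, ontology: str) -> int:
    """Look up the toy depth; pairs with no shared term contribute 0."""
    return toy_depths.get((frozenset({a, b}), ontology), 0)


cfi_p1 = common_function_index("P1", toy_adjacency, toy_shared_depth)
cfi_p2 = common_function_index("P2", toy_adjacency, toy_shared_depth)
cfi_p3 = common_function_index("P3", toy_adjacency, toy_shared_depth)
```

In this toy case P1 accumulates contributions from both of its neighbors, which illustrates why the CFI tends to grow with the degree of a node.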

In this study, the numbers of regulating transcription factors (TFs), KEGG pathways in which an S. cerevisiae gene is involved, Interpro motifs contained in an S. cerevisiae protein, and Pfam protein families to which an S. cerevisiae protein belongs are defined as the transcription factor (TF), KEGG, motif and family numbers, respectively. In addition, the phyletic age, dN/dS value, mRNA EL, paralogous gene number, CAI, CBI, Fop, and protein length are also used in this study. The properties of each S. cerevisiae gene are regarded as the properties of its gene product. The detailed information on each property is provided in Table 1.

2.4. Statistical analysis

In this study, the nonparametric Kruskal–Wallis (KW) test is used to assess the statistical significance of differences in each measure among proteins in different subcellular localizations. Correlations are estimated using nonparametric statistics (Spearman’s rank correlation coefficients), and the Z-test is used to calculate the P-values for the Spearman correlations. All statistical analyses in this study are performed using the freely available R software (version 3.0.2).
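To illustrate the Kruskal–Wallis comparison described above, here is a minimal sketch in Python using only the standard library. It is not the authors' R code: the function names and the toy degree values are hypothetical, and the P-value uses the usual chi-square approximation with k − 1 degrees of freedom for k groups.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    y = x / 2.0
    if df % 2 == 0:
        tail, a = math.exp(-y), 1.0
    else:
        tail, a = math.erfc(math.sqrt(y)), 0.5
    while a < df / 2.0:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return tail


def kruskal_wallis(groups):
    """Return (H, P-value) for a list of samples, with tie correction."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)
    # Tied values share the average of the rank positions they occupy.
    positions = defaultdict(list)
    for rank, value in enumerate(pooled, start=1):
        positions[value].append(rank)
    mid_rank = {v: sum(r) / len(r) for v, r in positions.items()}

    h = sum(sum(mid_rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(len(r) ** 3 - len(r) for r in positions.values())
    h /= 1.0 - ties / (n_total ** 3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical toy degrees for three localization categories.
toy_degrees = {
    "nuclear": [25, 18, 30, 22, 27],
    "cytoplasmic": [9, 11, 7, 10, 8],
    "integral membrane": [3, 5, 4, 2, 6],
}
h_statistic, p_value = kruskal_wallis(list(toy_degrees.values()))
```

For real analyses an established routine such as `kruskal.test` in R or `scipy.stats.kruskal` in Python would normally be preferred; the hand-written version is shown only to make the rank-based calculation explicit.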


Table 1 Properties and their corresponding numbers in different subcellular localizations^a

| Property | Cyt | ER | Ext | Golgi | IM | Mit | Nuc | PM |
|---|---|---|---|---|---|---|---|---|
| Topological properties | 760 | 96 | 8 | 40 | 17 | 334 | 599 | 43 |
| TF number | 1177 | 160 | 26 | 47 | 69 | 591 | 692 | 84 |
| Phyletic age | 866 | 127 | 19 | 46 | 47 | 437 | 532 | 78 |
| dN/dS | 599 | 99 | 5 | 27 | 29 | 348 | 373 | 33 |
| Paralogous gene number | 415 | 46 | 15 | 16 | 41 | 155 | 102 | 53 |
| Motif number | 986 | 141 | 25 | 45 | 60 | 508 | 605 | 79 |
| Family number | 993 | 142 | 24 | 44 | 58 | 504 | 596 | 80 |
| KEGG number | 412 | 78 | 12 | 7 | 5 | 163 | 224 | 25 |
| Protein length | 1179 | 160 | 26 | 47 | 69 | 592 | 692 | 84 |
| CAI | 1179 | 160 | 26 | 47 | 69 | 592 | 692 | 84 |
| mRNA EL | 1154 | 153 | 26 | 47 | 67 | 555 | 679 | 83 |
| CBI | 1179 | 160 | 26 | 47 | 69 | 592 | 692 | 84 |
| Fop | 1179 | 160 | 26 | 47 | 69 | 592 | 692 | 84 |

^a In this table, Cyt indicates cytoplasm, ER indicates endoplasmic reticulum, Ext indicates extracellular, IM indicates integral membrane, Mit indicates mitochondrion, Nuc indicates nucleus, and PM indicates plasma membrane.

3. Results and discussion

3.1. Analysis of topological properties

In this section, we analyze the topological properties of yeast proteins with different subcellular localizations in the yeast physical interaction network; all the results are shown in Table 2, Fig. 2, 3A and B and Fig. S1–S3 (ESI†). First, we investigate the differences in the degrees of proteins with different subcellular localizations. As shown in Table 2, a significant difference in degree is found among the categories (P-value < $2.20 \times 10^{-16}$, KW test). In the yeast physical interaction network, the nuclear proteins have the highest average degree (average degree = 20.8), followed by the Golgi proteins (average degree = 12.4). In contrast, the lowest average degree is observed for the integral membrane proteins (average degree = 4.1). The high degrees of the nuclear proteins are easy to understand: many essential biological processes take place in the nucleus, and the nuclear proteins play a crucial role in controlling gene expression and mediating the replication of DNA during the cell cycle, so more proteins would interact with them. The violin plots in Fig. 2 combine box plots with density traces and show a more detailed representation of the degrees for the different subcellular localizations. It is evident that the proteins in each category show an appreciable spread of degrees, and that the distribution for the nuclear proteins, which have the highest degree levels, is more spread out than the distributions for categories with lower degree levels. The results in Fig. 2 illustrate that the degree distributions are roughly exponential, although the degree distribution tails are somewhat longer. The shape of the violin plots may reflect the fact that the degree of many proteins is at a basal level, whereas only a small number of proteins have extremely high degrees in the yeast physical interaction network. Notably, the comparison shows that the cumulative frequency distribution of degrees of the nuclear proteins stands out from those of the other seven categories, as the nuclear proteins have the highest average degree among all categories (Fig. 3B).
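As a rough illustration of the node-level measures compared in this section, the sketch below computes the degree, the average degree of nearest neighbors (ADNN) and the core number on a small undirected network, and then averages the degree by category. The paper computed these quantities with NetworkAnalyzer and MatlabBGL; this pure-Python version is a substitute written for illustration only, and the toy edges, protein names and category labels are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets; self-interactions are ignored."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)


def node_degrees(adjacency):
    """Degree: number of distinct interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def average_neighbor_degree(adjacency):
    """ADNN: mean degree of the direct neighbors of each node."""
    degrees = node_degrees(adjacency)
    return {
        node: sum(degrees[n] for n in neighbors) / len(neighbors)
        for node, neighbors in adjacency.items()
    }


def core_numbers(adjacency):
    """k-core decomposition: repeatedly peel a node of minimum remaining degree."""
    remaining = {node: set(neighbors) for node, neighbors in adjacency.items()}
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: len(remaining[n]))
        k = max(k, len(remaining[node]))
        core[node] = k
        for neighbor in remaining.pop(node):
            remaining[neighbor].discard(node)
    return core


def mean_by_category(values, category_of):
    """Average a per-node value within each localization category."""
    grouped = defaultdict(list)
    for node, value in values.items():
        grouped[category_of[node]].append(value)
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}


# Hypothetical toy network: a triangle A-B-C with a tail A-D-E.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D"), ("D", "E")]
toy_category = {
    "A": "nuclear", "B": "nuclear", "C": "nuclear",
    "D": "cytoplasmic", "E": "cytoplasmic",
}

adjacency = build_adjacency(toy_edges)
degrees = node_degrees(adjacency)
adnn = average_neighbor_degree(adjacency)
cores = core_numbers(adjacency)
mean_degree = mean_by_category(degrees, toy_category)
```

In the toy network the triangle forms the 2-core while the tail nodes have core number 1, which mirrors how densely connected proteins end up with higher core numbers. Library routines such as `networkx.core_number` and `networkx.average_neighbor_degree` offer the same measures for larger networks.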


Table 2 Topological properties among different subcellular localizations^a

| Property | Cyt | ER | Ext | Golgi | IM | Mit | Nuc | PM | P-value |
|---|---|---|---|---|---|---|---|---|---|
| Degree | 9.1 | 9.8 | 4.5 | 12.4 | 4.1 | 7.0 | 20.8 | 6.6 | < $2.20 \times 10^{-16}$ |
| ADNN | 32.281 | 18.310 | 26.213 | 21.032 | 34.915 | 24.766 | 34.410 | 27.672 | < $2.20 \times 10^{-16}$ |
| Clustering coefficient | 0.231 | 0.369 | 0.213 | 0.374 | 0.115 | 0.262 | 0.375 | 0.197 | < $2.20 \times 10^{-16}$ |
| Topological coefficient | 0.224 | 0.274 | 0.228 | 0.211 | 0.262 | 0.232 | 0.195 | 0.273 | < $4.27 \times 10^{-2}$ |
| ASP | 4.080 | 4.383 | 4.455 | 4.253 | 4.426 | 4.690 | 3.816 | 4.370 | < $2.20 \times 10^{-16}$ |
| Core number | 4.8 | 4.9 | 3.3 | 5.6 | 2.8 | 3.4 | 8.5 | 3.2 | < $2.20 \times 10^{-16}$ |
| Closeness centrality | 0.250 | 0.232 | 0.229 | 0.239 | 0.229 | 0.217 | 0.265 | 0.233 | < $2.20 \times 10^{-16}$ |
| Betweenness centrality | 3736 | 3483 | 643 | 4864 | 1125 | 5202 | 6363 | 4217 | < $6.69 \times 10^{-10}$ |
| EPC | 20.685 | 9.507 | 8.075 | 10.482 | 5.680 | 5.588 | 62.161 | 8.109 | < $2.20 \times 10^{-16}$ |
| MNC | 5.7 | 6.5 | 3.0 | 8.3 | 2.5 | 4.8 | 15.1 | 3.488 | < $2.20 \times 10^{-16}$ |
| DMNC | 0.424 | 0.583 | 0.394 | 0.572 | 0.191 | 0.361 | 0.646 | 0.313 | < $2.20 \times 10^{-16}$ |
| Mcode cluster score | 2.953 | 4.520 | 2.708 | 5.048 | 1.278 | 2.303 | 6.492 | 1.631 | < $2.20 \times 10^{-16}$ |
| Bridgeness | 43.597 | 18.066 | 10.376 | 31.327 | 40.119 | 27.357 | 83.624 | 34.717 | < $2.20 \times 10^{-16}$ |
| Community centrality | 1253 | 998 | 1196 | 1622 | 236 | 739 | 15 481 | 422.977 | < $2.20 \times 10^{-16}$ |
| Overlap | 13.465 | 10.119 | 14.771 | 8.205 | 11.005 | 6.556 | 10.644 | 14.473 | < $2.20 \times 10^{-16}$ |
| CFI | 171.0 | 229.4 | 47.3 | 241.6 | 45.9 | 203.2 | 643.6 | 112.14 | < $2.20 \times 10^{-16}$ |

^a In this table, Cyt indicates cytoplasm, ER indicates endoplasmic reticulum, Ext indicates extracellular, IM indicates integral membrane, Mit indicates mitochondrion, Nuc indicates nucleus, and PM indicates plasma membrane; the best results are shown by bold values.

Fig. 2 Violin plots of the degree of S. cerevisiae proteins in eight different subcellular localizations. The distributions of the nuclear proteins are more spread out than those of proteins in other subcellular localizations.

By definition, the ADNN is the average degree of the nearest neighbors of a node in the network. The higher the ADNN of a node, the more likely the node is to interact with hubs in the PPI network. The average ADNN for each category is also calculated and compared. The average ADNN values for cytoplasmic, endoplasmic reticulum, extracellular, Golgi, integral membrane, mitochondrial, nuclear, and plasma membrane proteins are 32.281, 18.310, 26.213, 21.032, 34.915, 24.766, 34.410, and 27.672, respectively. Therefore, the integral membrane proteins have the highest average ADNN, followed by the nuclear proteins. Although the integral membrane proteins have the lowest average degree among the eight categories, they are more likely to


interact with high-degree nodes in the yeast physical interaction network. Why do the integral membrane proteins have a higher ADNN than the other seven categories? To answer this question, we quantify and compare the correlation pattern in the yeast physical interaction network. The correlation between the degrees and ADNNs is shown in Fig. 3C. As illustrated in Fig. 3C, the ADNN shows a gradual decline with the degree, suggesting that low-degree nodes are more likely to associate with nodes of high degree. This may be the reason why the integral membrane proteins have the highest average ADNN among the eight categories. Besides, these results may reflect the fact that the integral membrane proteins act as gateways to cells, which gives them more chances to interact with important proteins in cells.73

The clustering coefficient measures the modularity of the neighborhood of a node. In the PPI network, proteins with a high clustering coefficient tend to be involved in closely connected modules, whereas proteins with a low clustering coefficient tend to occupy inter-modular positions. Previous work found that most of the nuclear proteins were significantly enriched in protein complexes, and that protein complexes are more likely to have a high clustering coefficient.74 Based on these results, we expect the nuclear proteins to have a higher clustering coefficient. As shown in Table 2, the average clustering coefficient of the nuclear proteins is indeed higher than those of the other seven categories, and the difference among the categories is significant (P-value < $2.20 \times 10^{-16}$; KW test). By contrast, the average clustering coefficient of the integral membrane proteins is the lowest among the eight categories (average clustering coefficient = 0.115, Table 2).

The topological coefficient is a classical measure of the extent to which a protein shares interaction partners with other proteins in the network. The difference in the topological coefficient among proteins in the eight subcellular localizations is investigated by the KW test and is significant (P-value < $4.27 \times 10^{-2}$; KW test). The comparative results indicate that the average topological coefficient for the


Fig. 3 (A) Average degrees with error bars of the eight protein categories. Error bars show the standard error. The nuclear proteins have the highest average degree among the eight protein categories. (B) Cumulative fraction distributions of the degrees for the eight protein categories. The cumulative frequency distribution of the nuclear proteins stands out from those of the other seven categories, as the nuclear proteins have the highest average degree of all categories. (C) Degree versus ADNN for 3687 S. cerevisiae proteins in the yeast physical interaction network. The line shown is the linear regression. (D) Plots of excess retention for the eight protein categories. The excess retentions of the nuclear and plasma membrane proteins increase markedly with the core number, whereas those of the other six categories decrease markedly with the core number. This plot demonstrates that, among all subcellular localizations, the nuclear and plasma membrane proteins are more likely to be located in the central part of the yeast physical interaction network.

subset of proteins with endoplasmic reticulum localization is 0.274, which is the highest average value in this study. Although the nuclear proteins have the highest average clustering coefficient among the eight subcellular categories, their average topological coefficient is the lowest of all eight categories.

The ASP measures the efficiency of information or mass transport in the network, and so has an important role in communication and transport. A short ASP indicates quick transfer of information and reduced costs. In a metabolic network, the ASP can be used to judge the efficiency of mass transfer. In this study, the average ASP of the nuclear proteins is lower than those of the other seven categories, and the difference is highly significant (P-value < $2.20 \times 10^{-16}$; KW test), suggesting that the nuclear proteins could quickly convey information across the yeast physical interaction network. In contrast, the highest average ASP is observed for the mitochondrial proteins.

The difference in the core number of the eight protein categories is also investigated by the KW test, and there is a significant difference in core number among the categories (P-value < $2.20 \times 10^{-16}$; KW test). According to the results listed in Table 2, the average core numbers for cytoplasmic, endoplasmic reticulum, extracellular, Golgi, integral membrane, mitochondrial, nuclear, and plasma membrane proteins are 4.8, 4.9, 3.3, 5.6, 2.8,


3.4, 8.5, and 3.2, respectively, indicating that the average core number of the nuclear proteins is 2.99 times that of the integral membrane proteins. The violin plots in Fig. S2 (ESI†) illustrate that the density traces of the nuclear proteins are higher than those of the other seven categories in the region of high core numbers. Moreover, the cumulative fraction distribution of the core numbers for the nuclear proteins stands out from those of the other categories (Fig. S3, ESI†). Finally, we plot the excess retention64,75 of the eight subcellular categories. As illustrated in Fig. 3D, the excess retentions of the nuclear and plasma membrane proteins increase markedly with the core number, whereas those of the other six categories decrease markedly with the core number. These results confirm that, among all subcellular localizations, the nuclear proteins are more likely to be located at the backbone of the yeast physical interaction network.

The closeness centrality, which is defined as the reciprocal of the ASP, measures the centrality of proteins in the yeast physical interaction network. A protein with a high closeness centrality is located near the center of the network. In this study, the closeness centrality is also found to differ significantly among the eight subcellular categories. As shown in Table 2, the average closeness centrality is significantly higher for the nuclear proteins than


for the other seven categories (P-value < $2.20 \times 10^{-16}$; KW test). In contrast, the average closeness centrality for the mitochondrial proteins is 0.217, which is the lowest among the eight subcellular categories. In addition to closeness centrality, the betweenness centrality, defined as the number of shortest paths that pass through a protein in the network, is also calculated to measure the centrality of a protein. As shown in Table 2, the betweenness centrality supports the observation that the nuclear proteins show much stronger centrality than proteins in the other seven categories, with a P-value of less than $6.69 \times 10^{-10}$.

In this study, Hubba, a recently developed web-based service,67 is used to identify the important nodes or hubs in the yeast physical interaction network with three characteristic analysis methods: the EPC, MNC and DMNC. Previous studies have shown that proteins with high degrees are more likely to be hubs in PPI networks.76 Based on our analysis of the degree of the eight protein categories, we expect the nuclear proteins to have high EPC, MNC and DMNC values compared with proteins in the other seven subcellular localizations. As shown in Table 2, the average EPC, MNC and DMNC values of the nuclear proteins are 62.161, 15.1, and 0.646, respectively, which are significantly higher than those of the other seven categories, and all the corresponding P-values are less than $2.20 \times 10^{-16}$. These results further confirm that the nuclear proteins, which have larger degrees, are hubs and important nodes in the yeast physical interaction network.

The difference in the mcode cluster score of the eight protein categories is also investigated. The comparison indicates that the average mcode cluster scores for cytoplasmic, endoplasmic


reticulum, extracellular, Golgi, integral membrane, mitochondrial, nuclear, and plasma membrane proteins are 2.953, 4.520, 2.708, 5.048, 1.278, 2.303, 6.492, and 1.631, respectively, and the difference among them is significant (P-value < $2.20 \times 10^{-16}$; KW test). Thus the nuclear proteins have the highest average mcode cluster score and the integral membrane proteins have the lowest in the yeast physical interaction network. In other words, the nuclear proteins are more likely than the other protein categories to form mcode clusters in the yeast physical interaction network.

Additionally, the Java plug-in ModuLand is applied to the yeast physical interaction network to calculate the bridgeness, community centrality, and overlap for each node. Bridgeness represents the smaller of the two modular assignments of a node in two adjacent modules, summed over every pair of modules.68 The bridgeness of a node will be high if the node behaves as a bridge between multiple pairs of modules, or as a strong bridge between a single pair, in the yeast physical interaction network. The community centrality is defined as the sum of the local influence zones of all network nodes.68 The overlap value of a node is calculated from module assignment values at the second hierarchical level.68 Both bridgeness and community centrality show a positive Spearman correlation with the degree (r = 0.7858 with P-value < $2.20 \times 10^{-16}$ for bridgeness, Fig. 4A; r = 0.9458 with P-value < $2.20 \times 10^{-16}$ for community centrality, Fig. 4B; Z-test). The bridgeness, community centrality, and overlap are compared among proteins in the eight categories described earlier, and there are significant differences in these properties among the eight categories (Table 2).
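The Spearman correlations reported above can be sketched as follows. This is a minimal standard-library illustration rather than the authors' R code. The paper does not state which form of Z-test was used, so the large-sample approximation z = rho * sqrt(n - 1) with a two-sided normal P-value is an assumption here, and the toy degree and bridgeness values are hypothetical.

```python
import math


def mid_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_with_z_test(x, y):
    """Return (rho, z, two-sided P-value) for paired samples x and y.

    rho is the Pearson correlation of the ranks. The Z-test uses the
    large-sample approximation z = rho * sqrt(n - 1), which is an
    assumption and may differ from the procedure used in the paper.
    """
    rank_x, rank_y = mid_ranks(x), mid_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    rho = covariance / math.sqrt(var_x * var_y)
    z = rho * math.sqrt(n - 1)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return rho, z, p_value


# Hypothetical toy values for eight proteins.
toy_degree = [1, 2, 3, 4, 5, 6, 7, 8]
toy_bridgeness = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
rho, z, p_value = spearman_with_z_test(toy_degree, toy_bridgeness)
```

With only eight toy points the normal approximation is crude; for a network of several thousand proteins, as in this study, it is much closer to the exact distribution.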

Fig. 4 Degree versus (A) bridgeness, (B) community centrality, (C) overlap, and (D) CFI. The line shown is the linear regression.


Intriguingly, the bridgeness and community centrality of the nuclear proteins are the highest of the eight subcellular localizations. These results are expected, as the nuclear proteins have important roles and show many important topological properties in the yeast physical interaction network, and there is a significant correlation between these two properties and the degree (Fig. 4A and B). By contrast, the average overlap values of cytoplasmic, endoplasmic reticulum, extracellular, Golgi, integral membrane, mitochondrial, nuclear, and plasma membrane proteins are 13.465, 10.119, 14.771, 8.205, 11.005, 6.556, 10.644, and 14.473, respectively, and the difference in overlap values among the eight protein categories is significant (P-value < $2.20 \times 10^{-16}$; KW test). The extracellular proteins have the highest average overlap and the mitochondrial proteins the lowest, while the nuclear proteins have an intermediate average overlap. We speculate that these differences probably result from the structures of the second hierarchical level in the yeast physical interaction network. As shown in Fig. 4C, no significant correlation is found between the degree and the overlap value, which may explain why the overlap results differ from those for the other properties.

The CFI measures the amount of common GO function of adjacent nodes in the yeast physical interaction network. As shown in Table 2, high CFI values are observed for the nuclear proteins, low CFI values for the extracellular and integral membrane proteins, and intermediate CFI values for the cytoplasmic, endoplasmic reticulum, Golgi, mitochondrial and plasma membrane proteins. These results are expected, given the positive Spearman correlation between the CFI and the degree (Spearman correlation coefficient = 0.9209 with P-value < $2.20 \times 10^{-16}$, Fig. 4D; Z-test) and the average degree of each protein category obtained above.

3.2. Analysis of biological properties

In this section, the biological properties of the eight protein categories are analyzed; all the comparison results are given in Table 3 and Fig. S4–S6 (ESI†). Previous studies have shown that nodes with low degrees tend to be regulated by a larger number of transcription factors than the hubs in the yeast physical interaction network.33 These observations provide indirect evidence that nodes with high TF numbers are likely to lie on the periphery of the yeast physical interaction network, whereas nodes with low TF numbers tend to be hubs. As shown in Table 2, a high average degree is observed for the nuclear proteins, and low average degrees for the extracellular, integral membrane and plasma membrane proteins. Based on these results, we expect the nuclear proteins to be regulated by a small number of transcription factors, and the extracellular, integral membrane and plasma membrane proteins to be regulated by a large number of transcription factors. Investigation of the TF numbers shows that they differ among the eight protein categories (P-value < 2.2 × 10⁻¹⁶; KW test). The average TF number of the nuclear proteins is the second lowest among the eight protein categories, and the extracellular proteins have the highest average TF number.

Table 3 Biological properties among different subcellular localizationsᵃ

| Property | Cyt | ER | Ext | Golgi | IM | Mit | Nuc | PM | P-value |
|---|---|---|---|---|---|---|---|---|---|
| TF number | 34.6 | 33.9 | 59.2 | 26.0 | 33.9 | 29.9 | 28.9 | 44.1 | <2.20 × 10⁻¹⁶ |
| Phyletic age | 4.1 | 3.6 | 2.4 | 3.5 | 3.6 | 3.8 | 3.6 | 3.9 | <8.55 × 10⁻¹³ |
| dN/dS | 0.075 | 0.076 | 0.105 | 0.065 | 0.085 | 0.070 | 0.083 | 0.071 | <1.27 × 10⁻² |
| Paralogous gene number | 2.9 | 3.3 | 2.4 | 3.3 | 4.3 | 4.0 | 3.0 | 12.0 | <1.12 × 10⁻¹⁶ |
| Motif number | 2.1 | 1.5 | 1.7 | 1.6 | 2.0 | 1.9 | 2.0 | 2.4 | <9.55 × 10⁻⁷ |
| Family number | 1.4 | 1.2 | 1.4 | 1.3 | 1.2 | 1.3 | 1.4 | 1.2 | <7.66 × 10⁻⁷ |
| KEGG number | 2.0 | 2.1 | 1.8 | 2.0 | 1.4 | 2.5 | 1.7 | 1.1 | <1.42 × 10⁻¹¹ |
| Protein length | 492.9 | 417.8 | 626.2 | 593.3 | 541.7 | 418.0 | 541.6 | 698.5 | <2.20 × 10⁻¹⁶ |
| mRNA EL | 6.341 | 2.797 | 2.455 | 2.176 | 0.997 | 1.520 | 1.614 | 1.756 | <4.15 × 10⁻⁷ |
| CAI | 0.228 | 0.177 | 0.221 | 0.156 | 0.121 | 0.151 | 0.152 | 0.165 | <2.20 × 10⁻¹⁶ |
| CBI | 0.180 | 0.139 | 0.241 | 0.087 | 0.055 | 0.094 | 0.074 | 0.135 | <2.20 × 10⁻¹⁶ |
| Fop | 0.500 | 0.482 | 0.536 | 0.456 | 0.406 | 0.450 | 0.441 | 0.468 | <2.20 × 10⁻¹⁶ |

ᵃ In this table, Cyt indicates cytoplasm, ER indicates endoplasmic reticulum, Ext indicates extracellular, IM indicates integral membrane, Mit indicates mitochondrion, Nuc indicates nucleus, and PM indicates plasma membrane; the best results are shown by bold values.

Phyletic age is defined by the evolutionarily most distant species group in which homologs can be found.57 Previous studies found that the phyletic age of genes is correlated with several functional features, structure, and evolution of genes.57 The eight protein categories are subjected to an evolutionary characterization in which the average phyletic age of each category is calculated and compared. According to the KW test, the phyletic age differs significantly among categories (P-value < 8.55 × 10⁻¹³; KW test). As shown in Table 3, the average phyletic ages for cytoplasmic, endoplasmic reticulum, extracellular, golgi, integral membrane, mitochondrial, nuclear, and plasma membrane proteins are 4.1, 3.6, 2.4, 3.5, 3.6, 3.8, 3.6, and 3.9, respectively. Therefore, the extracellular proteins are the youngest of the eight protein categories. In contrast, the oldest average phyletic age is observed for the proteins with a cytoplasmic localization.

We then analyze how the evolutionary rates of yeast proteins depend on their subcellular localizations. To compare the evolutionary rates of the eight protein categories, dN/dS is used. The average dN/dS values of all protein categories are shown in Table 3. From Table 3, we can observe that the dN/dS values of the extracellular proteins are also significantly different from those of the other seven categories (P-value < 8.55 × 10⁻¹³; KW test), with the average dN/dS value of the extracellular proteins higher than those of the others (average dN/dS = 0.105). This indicates that the extracellular proteins are under the most relaxed selective pressure of all protein categories, which is consistent with the evolutionary rate analysis of Wang et al.,77 who demonstrated that the human extracellular proteins were under the most relaxed selective pressure and the human nuclear proteins were the most conserved of all protein categories.77 However, in this study, the golgi proteins are found to be the most conserved category. The difference between our analysis and the previous study may be explained by two factors. First, different subcellular localizations were used in the previous study and in ours. Second, the yeast and human proteins in their respective subcellular localizations may have different evolutionary patterns, leading to different dN/dS values across subcellular localizations.

The paralogous gene number is defined as the number of homologous genes in the same genome. Based on this definition, we investigate the paralogous gene numbers of yeast proteins in the eight subcellular localizations. Compared with the other protein categories, the plasma membrane proteins have, on average, more paralogous genes in the same genome, a highly significant difference according to the KW test (P-value < 1.12 × 10⁻¹⁶; KW test).

In general, a larger number of motifs in a protein sequence indicates that the protein has more biological functions. The family number reflects the number of functional domains contained in the protein sequence. The KW test is used to examine the differences in motif number and family number among proteins in the eight subcellular localizations.
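The KW (Kruskal-Wallis) comparisons used throughout this section can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy values are hypothetical, and the P-value relies on the usual chi-square approximation with k − 1 degrees of freedom (k being the number of categories), which is assumed here rather than taken from the paper.

```python
import math
from itertools import chain


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed form, integer df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*pdf(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, one sample per protein category."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # tied values share the mean rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    if tie_term == n**3 - n:
        raise ValueError("all observations are identical")
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # standard correction for ties
    return h, chi2_sf(h, len(groups) - 1)


# Toy values for three hypothetical categories (not data from this study).
h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.3f}, p = {p:.4f}")  # H = 7.200, p = 0.0273
```

In practice each inner list would hold the per-protein values of one property (for example motif number) for one subcellular localization, giving eight lists and seven degrees of freedom.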
Significant differences in the motif number and family number are found among the eight protein categories (P-value < 9.55 × 10⁻⁷ for motif number, and P-value < 7.66 × 10⁻⁷ for family number; KW test).

Furthermore, we consider the average KEGG numbers and protein lengths of the eight protein categories; the comparison results are shown in Table 3. The mitochondrial proteins are involved in many important tasks, such as cellular metabolism, signaling, and cellular differentiation, and may play a role in the aging process.78 On this basis, we expect that the mitochondrial proteins may be involved in more KEGG pathways. As expected, the average KEGG number of the mitochondrial proteins is the largest among the eight protein categories, and the difference in KEGG numbers is significant according to the KW test (P-value < 1.42 × 10⁻¹¹; KW test). In general, the length of a protein sequence is determined by its function, and a long protein sequence means that more biological information and function may be contained in the protein. Analysis of the protein lengths shows that the plasma membrane proteins are on average longer than those of the other categories, a highly significant difference according to the KW test (P-value < 2.20 × 10⁻¹⁶; KW test), suggesting that more biological information may be contained in the plasma membrane protein sequences.

The recent availability of experiments that measure mRNA EL on a genome-wide scale gives us an opportunity to obtain a comprehensive view of gene activity patterns in cells. Previous research has investigated the relationship between protein subcellular localizations and mRNA ELs for yeast.79 This research has shown that low mRNA ELs are observed for the nuclear and membrane proteins, high mRNA ELs for the cytoplasmic proteins, and middling mRNA ELs for the endoplasmic reticulum and golgi proteins.79 In this study, the average mRNA ELs for cytoplasmic, endoplasmic reticulum, extracellular, golgi, integral membrane, mitochondrial, nuclear, and plasma membrane proteins are 6.341, 2.797, 2.455, 2.176, 0.997, 1.520, 1.614, and 1.756, respectively, which are in good agreement with previous reports,79 and the differences are highly significant as assessed by the KW test (P-value < 4.15 × 10⁻⁷; KW test). The mRNA ELs in the categories of ‘transcription’ and ‘transport’ are low. By contrast, the proteins in the ‘protein synthesis’ and ‘energy’ categories often have high mRNA ELs. Proteins with nuclear localizations are often involved in transcription; proteins with cell membrane localizations are often involved in extracellular transport. The high mRNA EL of the cytoplasmic proteins is largely due to the ribosomal proteins, which are in the cytoplasm. In addition, the correlation between mRNA ELs and subcellular localizations may be related to the volumes of the various subcellular compartments. For example, the volume of the cytoplasm is much larger than those of the other compartments. To achieve the same effective concentration, the mRNA EL for freely diffusing proteins destined for larger compartments may need to be higher than for smaller ones.79 We speculate that these may be the major reasons for the results obtained in our study.

In DNA or RNA sequences, the similarity between the synonymous codon usage of a gene and the synonymous codon frequency of a reference set can be quantified by the CAI.
The CAI is one of the most widespread indicators for measuring codon usage bias, and important genes can be characterized by high CAI values in different species.42 The CAI can be used as a strong proxy for the expression level of a gene in many bacteria and small eukaryotes.33 Based on these observations, we expect the CAI values of each protein category to follow the results of the mRNA ELs. As listed in Table 3, the CAI results for all the subcellular localizations studied are consistent with the results of the mRNA ELs, and the differences are highly significant according to the KW test (P-value < 2.20 × 10⁻¹⁶).

To further estimate the codon usage bias of proteins in the eight subcellular localizations, the CBI and Fop of S. cerevisiae proteins are also investigated. The CBI is another frequently used indicator of codon usage bias. Fop measures how far the synonymous codon choice of each gene is optimized for the translation process. For the CBI and Fop, KW tests are applied for the comparison among the eight protein categories; significant differences are found in the CBI and Fop, and all the P-values are less than 2.20 × 10⁻¹⁶. As shown in Table 3, the extracellular proteins have the highest average CBI value, followed by cytoplasmic, endoplasmic reticulum, plasma membrane, mitochondrial, golgi, nuclear, and integral membrane proteins. Similar results are obtained for the Fop. That is, the overall trend of high codon usage bias in extracellular proteins, middling codon usage bias in golgi, nuclear and plasma membrane proteins, and low codon usage bias in integral membrane proteins is consistent across the CAI, CBI and Fop.
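To make the CAI discussed above concrete, the following minimal sketch assumes the usual Sharp and Li formulation:42 each codon receives a relative adaptiveness weight (its count in a reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of the weights of its codons. The function names, the toy sequences, and the 0.5 pseudocount for codons absent from the reference set are illustrative assumptions, not the pipeline used in this study.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into complete codons (RNA is mapped to DNA)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon relative to the most used synonymous codon."""
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE
    )
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:  # skip stop codons, Met and Trp
            continue
        # Assumption: codons unseen in the reference get a pseudocount of 0.5.
        freq = {c: counts[c] or 0.5 for c in family}
        top = max(freq.values())
        for c in family:
            weights[c] = freq[c] / top
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of the informative codons in a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no informative codons")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (hypothetical): alanine is encoded by GCT three times, GCC once.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
print(round(cai("GCTGCT", w), 3))  # 1.0
print(round(cai("GCTGCC", w), 3))  # 0.577
```

A realistic reference set would be a collection of highly expressed genes of the organism; with the one-gene toy reference used here, every amino acid other than alanine has uninformative weights of 1.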

4. Conclusion

In this work, 16 topological properties and 12 biological properties are used to analyze S. cerevisiae proteins at the subcellular localization level. This is the first time these topological and biological properties have been systematically compared for S. cerevisiae proteins in different subcellular localizations. Significant differences are found in these properties among proteins in the eight subcellular localizations. Compared with the other protein categories, the nuclear proteins, on average, tend to have a higher degree, clustering coefficient, core number, closeness centrality, betweenness centrality, EPC, MNC, DMNC, mcode cluster score, bridgeness, community centrality, and CFI. In addition, the average ASP and topological coefficient of the nuclear proteins are the lowest of all categories. In other words, the proteins with a nuclear localization play an important role in the yeast physical interaction network according to most of the topological properties.

Analysis of the 12 biological properties further reveals significant differences among the eight protein categories. The extracellular proteins have the highest average TF number, CBI and Fop and the youngest phyletic age, and are under the most relaxed selective pressure among the eight protein categories. The plasma membrane proteins have the highest paralogous gene number and motif number of all categories, and possess longer protein sequences than the other seven categories. The nuclear proteins are involved in more protein families, whereas the mitochondrial proteins are involved in more KEGG pathways. Finally, the highest expression level and CAI value are observed for the cytoplasmic proteins. We anticipate that these findings will aid protein subcellular localization prediction and contribute to the elucidation of the functional mechanisms of a biological system.

However, some limitations remain in our study and suggest avenues for future work; for example, the reasons behind these results need further investigation. As more datasets become publicly available, proteins with different subcellular localizations can be investigated in more species, where similar trends may be detectable. As pointed out by Chou and Shen,80 and emphasized and demonstrated in a series of recent publications,45–49,81–83 user-friendly and publicly accessible web-servers represent the future direction for developing practically more useful models, simulated methods, or predictors, or for demonstrating new structures; we shall make efforts in our future work to provide a web-server for the approach and findings presented in this paper.

Conflict of interest

The authors declare that they have no conflict of interest.


Acknowledgements

This work was supported by the Scientific Research Fund of Heilongjiang Provincial Health Department (No. 2012797).

References

1 A. Kumar, S. Agarwal, J. A. Heyman, S. Matson, M. Heidtman, S. Piccirillo, L. Umansky, A. Drawid, R. Jansen and Y. Liu, Genes Dev., 2002, 16, 707–719.
2 G. Kumar and S. Ranganathan, BMC Bioinf., 2010, 11, S9.
3 W. K. Huh, J. V. Falvo, L. C. Gerke, A. S. Carroll, R. W. Howson, J. S. Weissman and E. K. O’Shea, Nature, 2003, 425, 686–691.
4 K. C. Chou, Z. C. Wu and X. Xiao, Mol. BioSyst., 2012, 8, 629–641.
5 K. C. Chou, Proteins, 2001, 43, 246–255.
6 H. Lin, H. Ding, F. B. Guo, A. Y. Zhang and J. Huang, Protein Pept. Lett., 2008, 15, 739–744.
7 H. Lin, H. Ding, F. B. Guo and J. Huang, Mol. Diversity, 2010, 14, 667–671.
8 C. Ding, L. F. Yuan, S. H. Guo, H. Lin and W. Chen, J. Proteomics, 2012, 77, 321–328.
9 K. C. Chou, J. Theor. Biol., 2011, 273, 236–247.
10 K. C. Chou, Bioinformatics, 2005, 21, 10–19.
11 H. Ding and D. M. Li, Amino Acids, 2015, 47, 329–333.
12 P. P. Zhu, W. C. Li, Z. J. Zhong, E. Z. Deng, H. Ding, W. Chen and H. Lin, Mol. BioSyst., 2015, 11, 558–563.
13 H. Lin, W. Chen, L. F. Yuan, Z. Q. Li and H. Ding, Acta Biotheor., 2013, 61, 259–268.
14 K. C. Chou and H. B. Shen, Anal. Biochem., 2007, 370, 1–16.
15 K. Nakai, Adv. Protein Chem., 2000, 54, 277–344.
16 X. Xiao, Z. C. Wu and K. C. Chou, PLoS One, 2011, 6, e20592.
17 K. C. Chou, Z. C. Wu and X. Xiao, PLoS One, 2011, 6, e18258.
18 Z. C. Wu, X. Xiao and K. C. Chou, Mol. BioSyst., 2011, 7, 3287–3297.
19 X. Xiao, Z. C. Wu and K. C. Chou, J. Theor. Biol., 2011, 284, 42–51.
20 Z. C. Wu, X. Xiao and K. C. Chou, Protein Pept. Lett., 2012, 19, 4–14.
21 W. Z. Lin, J. A. Fang, X. Xiao and K. C. Chou, Mol. BioSyst., 2013, 9, 634–644.
22 M. Mandal, A. Mukhopadhyay and U. Maulik, Med. Biol. Eng. Comput., 2015, 1–14, DOI: 10.1007/s11517-014-1238-7.
23 S. Mei, J. Theor. Biol., 2012, 310, 80–87.
24 A. Dehzangi, R. Heffernan, A. Sharma, J. Lyons, K. Paliwal and A. Sattar, J. Theor. Biol., 2015, 364, 284–294.
25 G. D. Bader, D. Betel and C. W. V. Hogue, Nucleic Acids Res., 2003, 31, 248–250.
26 C. Stark, B. J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz and M. Tyers, Nucleic Acids Res., 2006, 34, D535–D539.
27 C. Von Mering, M. Huynen, D. Jaeggi, S. Schmidt, P. Bork and B. Snel, Nucleic Acids Res., 2003, 31, 258–261.
28 I. Xenarios, D. W. Rice, L. Salwinski, M. K. Baron, E. M. Marcotte and D. Eisenberg, Nucleic Acids Res., 2000, 28, 289–291.
29 T. K. Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, D. Telikicherla, R. Raju, B. Shafreen and A. Venugopal, Nucleic Acids Res., 2009, 37, D767–D772.
30 J. Xu and Y. Li, Bioinformatics, 2006, 22, 2800–2805.


31 M. Zhu, L. Gao, X. Li, Z. Liu, C. Xu, Y. Yan, E. Walker, W. Jiang, B. Su and X. Chen, J. Drug Targeting, 2009, 17, 524–532.
32 M. Kotlyar, K. Fortney and I. Jurisica, Methods, 2012, 57, 499–507.
33 H. W. Han, S. H. Bae, Y. H. Jung and J. Moon, FEBS Lett., 2013, 587, 444–451.
34 H. W. Han, J. H. Ohn, J. Moon and J. H. Kim, Nucleic Acids Res., 2013, 41, 9209–9217.
35 M. A. Yildirim, K. I. Goh, M. E. Cusick, A. L. Barabási and M. Vidal, Nat. Biotechnol., 2007, 25, 1119–1126.
36 J. Wang, S. Zhang, Y. Wang, L. Chen and X. S. Zhang, PLoS Comput. Biol., 2009, 5, e1000521.
37 L. Yang, J. Wang, H. Wang, Y. Lv, Y. Zuo and W. Jiang, J. Theor. Biol., 2014, 349, 82–91.
38 J. M. Cherry, C. Adler, C. Ball, S. A. Chervitz, S. S. Dwight, E. T. Hester, Y. Jia, G. Juvik, T. Roe and M. Schroeder, Nucleic Acids Res., 1998, 26, 73–79.
39 H. W. Mewes, D. Frishman, U. Güldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Münsterkötter, S. Rudd and B. Weil, Nucleic Acids Res., 2002, 30, 31–34.
40 W. H. Chen, P. Minguez, M. J. Lercher and P. Bork, Nucleic Acids Res., 2012, 40, D901–D906.
41 L. D. Hurst, Trends Genet., 2002, 18, 486–487.
42 P. M. Sharp and W. H. Li, Nucleic Acids Res., 1987, 15, 1281–1295.
43 J. L. Bennetzen and B. Hall, J. Biol. Chem., 1982, 257, 3026–3031.
44 W. Chen, P. M. Feng, E. Z. Deng, H. Lin and K. C. Chou, Anal. Biochem., 2014, 462, 76–83.
45 H. Lin, E. Z. Deng, H. Ding, W. Chen and K. C. Chou, Nucleic Acids Res., 2014, 42, 12961–12972.
46 H. Ding, E. Z. Deng, L. F. Yuan, L. Liu, H. Lin, W. Chen and K. C. Chou, BioMed Res. Int., 2014, 286419.
47 Y. Xu, X. Wen, L. S. Wen, L. Y. Wu, N. Y. Deng and K. C. Chou, PLoS One, 2014, 9, e105018.
48 Z. Liu, X. Xiao, W. R. Qiu and K. C. Chou, Anal. Biochem., 2015, 474, 69–77.
49 X. Xiao, J. L. Min, W. Z. Lin, Z. Liu, X. Cheng and K. C. Chou, J. Biomol. Struct. Dyn., 2015, 14, 1–13.
50 K. C. Chou, Mol. BioSyst., 2013, 9, 1092–1100.
51 M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight and J. T. Eppig, Nat. Genet., 2000, 25, 25–29.
52 T. Ikemura, J. Mol. Biol., 1981, 151, 389–409.
53 S. Hunter, P. Jones, A. Mitchell, R. Apweiler, T. K. Attwood, A. Bateman, T. Bernard, D. Binns, P. Bork and S. Burge, Nucleic Acids Res., 2012, 40, D306–D312.
54 M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno and M. Hattori, Nucleic Acids Res., 2004, 32, D277–D280.
55 A. Bateman, L. Coin, R. Durbin, R. D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon and E. L. L. Sonnhammer, Nucleic Acids Res., 2004, 32, D138–D141.
56 M. C. Teixeira, P. Monteiro, P. Jain, S. Tenreiro, A. R. Fernandes, N. P. Mira, M. Alenquer, A. T. Freitas, A. L. Oliveira and I. Sá-Correia, Nucleic Acids Res., 2006, 34, D446–D451.
57 W. H. Chen, K. Trachana, M. J. Lercher and P. Bork, Mol. Biol. Evol., 2012, 29, 1703–1706.
58 D. P. Wall, A. E. Hirsh, H. B. Fraser, J. Kumm, G. Giaever, M. B. Eisen and M. W. Feldman, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 5483–5488.
59 D. Greenbaum, R. Jansen and M. Gerstein, Bioinformatics, 2002, 18, 585–596.
60 D. J. Watts and S. H. Strogatz, Nature, 1998, 393, 440–442.
61 D. S. Goldberg and F. P. Roth, Proc. Natl. Acad. Sci. U. S. A., 2003, 100, 4372–4376.
62 L. C. Freeman, Math. Soc. Sci., 1982, 3, 291–304.
63 S. Wachi, K. Yoneda and R. Wu, Bioinformatics, 2005, 21, 4205–4208.
64 S. Wuchty and E. Almaas, Proteomics, 2005, 5, 444–449.
65 S. Maslov and K. Sneppen, Science, 2002, 296, 910–913.
66 C. S. Chin and M. P. Samanta, Bioinformatics, 2003, 19, 2413–2419.
67 C. Y. Lin, C. H. Chin, H. H. Wu, S. H. Chen, C. W. Ho and M. T. Ko, Nucleic Acids Res., 2008, 36, W438–W443.
68 I. A. Kovács, R. Palotai, M. S. Szalay and P. Csermely, PLoS One, 2010, 5, e12528.
69 G. D. Bader and C. W. Hogue, BMC Bioinf., 2003, 4, 2.
70 Y. C. Hwang, C. C. Lin, J. Y. Chang, H. Mori, H. F. Juan and H. C. Huang, Mol. BioSyst., 2009, 5, 1672–1678.
71 L. Yang, Y. Lv, T. Li, Y. Zuo and W. Jiang, J. Theor. Biol., 2014, 358, 61–73.
72 L. Yang, J. Wang, H. Wang, Y. Lv, Y. Zuo and W. Jiang, Biochem. Biophys. Res. Commun., 2014, 448, 473–479.
73 E. Wallin and G. V. Heijne, Protein Sci., 1998, 7, 1029–1038.
74 Z. C. Li, Y. H. Lai, L. L. Chen, C. Chen, Y. Xie, Z. Dai and X. Y. Zou, Mol. BioSyst., 2013, 9, 658–667.
75 S. Wuchty, Genome Res., 2004, 14, 1310–1314.
76 H. Jeong, S. P. Mason, A. L. Barabási and Z. N. Oltvai, Nature, 2001, 411, 41–42.
77 X. Wang, R. Wang, Y. Zhang and H. Zhang, Genome Biol. Evol., 2013, 5, 1291–1297.
78 K. Henze and W. Martin, Nature, 2003, 426, 127–128.
79 A. Drawid, R. Jansen and M. Gerstein, Trends Genet., 2000, 16, 426–430.
80 K. C. Chou and H. B. Shen, Nat. Sci., 2009, 1, 63–92.
81 W. Chen, H. Lin, P. M. Feng, C. Ding, Y. C. Zuo and K. C. Chou, PLoS One, 2012, 7, e47843.
82 W. Chen, P. M. Feng, H. Lin and K. C. Chou, Nucleic Acids Res., 2013, 41, e68.
83 Y. Xu, J. Ding, L. Y. Wu and K. C. Chou, PLoS One, 2013, 8, e55844.
