Conserved Proteins Are Fragile

Raquel Assis*,¹ and Alexey S. Kondrashov²

¹ Department of Biology, Pennsylvania State University
² Department of Ecology and Evolutionary Biology, Department of Computational Medicine and Bioinformatics, and the Life Sciences Institute, University of Michigan
*Corresponding author: E-mail: [email protected].
Associate editor: Hideki Innan

Abstract

Levels of selective constraint vary among proteins. Although strong constraint on a protein is often attributed to its functional importance, evolutionary rate may also be limited if a protein is fragile, such that a large proportion of amino acid replacements reduce fitness. To determine the relative contributions of essentiality and fragility to selective constraint, we compared the relationships of selection against nonsense mutations ($s_{\mathrm{non}}$) and selection against missense mutations ($s_{\mathrm{mis}}$) to protein sequence conservation ($K_a$). As expected, $s_{\mathrm{non}}$ is greater than $s_{\mathrm{mis}}$; however, the correlation between $s_{\mathrm{mis}}$ and $K_a$ is nearly three times stronger than the correlation between $s_{\mathrm{non}}$ and $K_a$. Moreover, examination of relationships to gene expression level, tissue specificity, and number of protein–protein interactions shows that $s_{\mathrm{mis}}$ is more strongly correlated than $s_{\mathrm{non}}$ with all three measures of biological function. Thus, our analysis reveals that slowly evolving proteins are under strong selective constraint primarily because they are fragile, and that this association likely exists because allowing a protein to function improperly, rather than removing it from a biological network, can negatively affect the functions of the molecules it interacts with and their downstream products.

Key words: evolutionary rate, selective constraint, protein evolution, protein essentiality, natural selection.

© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Mol. Biol. Evol. 31(2):419–424 doi:10.1093/molbev/mst217 Advance Access publication November 7, 2013

Letter

Proteins evolve at different rates. Although positive selection is responsible for some of this diversity, most variation in evolutionary rates is due to differences in selective constraint among proteins. A number of factors contribute to the level of selective constraint on a protein-coding gene, including its translation rate, expression level and breadth, and interactions with other molecules (Herbeck and Wall 2005; Koonin and Wolf 2006; Pal et al. 2006; Vitkup et al. 2006). However, one dichotomy is clear: A protein may be under strong selective constraint because its function is important and cannot be lost, or because it is fragile, such that a large proportion of amino acid replacements result in considerable fitness losses. Determining the relative contributions of functional importance and fragility to protein evolutionary rate is a longstanding problem in evolutionary biology (Wilson et al. 1977; Wolf et al. 2010). Although it was originally believed that strong selective constraint on a protein indicates functional importance, several studies have shown that differences between the evolutionary rates of essential and nonessential proteins are quite small (Hurst and Smith 1999; Hirsh and Fraser 2001; Jordan et al. 2002; Krylov et al. 2003; Wall et al. 2005; Pal et al. 2006; Wolf 2006). In contrast, a strong negative correlation exists between evolutionary rate and gene expression level (Pal et al. 2001; Krylov et al. 2003; Drummond et al. 2005; Lemos et al. 2005), which has led to two alternative hypotheses for evolutionary rate variation among proteins. The first is the mistranslation-induced misfolding (MIM) hypothesis, in which selection acts against protein misfolding, a phenomenon that is thought to be particularly toxic when genes are highly expressed (Drummond et al. 2005; Drummond et al. 2006; Drummond and Wilke 2008; Geiler-Samerotte et al. 2011). The second hypothesis, also supported by the strong negative correlation between evolutionary rate and number of protein–protein interactions (Krylov et al. 2003; Lemos et al. 2005), is protein misinteraction avoidance, in which selection acts against mutations that produce nonfunctional interactions between proteins (Levy et al. 2012; Yang et al. 2012). Each of these hypotheses is well supported by theoretical and empirical studies (Zeldovich et al. 2007; Drummond and Wilke 2008; Wolf et al. 2010; Yang et al. 2010; Levy et al. 2012; Serohijos et al. 2012; Yang et al. 2012). Additionally, both are consistent with a more general hypothesis that selection acts primarily against mutations that disrupt the physicochemical properties of a protein by modifying its amino acid composition, that is, that a protein's evolutionary rate is constrained by its fragility.

To determine the relative contributions of functional importance and fragility to selective constraint, we compared selection against nonsense mutations with selection against missense mutations in proteins with varying levels of sequence conservation. A nonsense mutation converts an amino acid–encoding codon into a premature stop codon, truncating the protein product and nearly always abolishing its function. Hence, selection against nonsense mutations can be used as a proxy for the essentiality of a protein's function. In contrast, a missense mutation replaces one amino acid with another, often altering the physicochemical properties of the resulting protein. Thus, selection against missense mutations can be used as a measure of a protein's fragility.
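The nonsense/missense distinction above is a property of single-nucleotide changes within codons. The following is a minimal sketch, not the authors' code, of how such a change could be classified. It assumes the standard nuclear genetic code (the Letter does not state how mutation classes were assigned), and the function name classify_point_mutation is hypothetical.

```python
# Minimal sketch (assumption: standard nuclear genetic code) for classifying a
# single-nucleotide change in a sense codon as nonsense, missense, or synonymous.

BASES = "TCAG"
# Amino acids for codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Return "nonsense", "missense", or "synonymous" for one base substitution.

    codon: reference sense codon (e.g. "TGG"); position: 0, 1, or 2;
    new_base: the substituted nucleotide.
    """
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or new_base not in BASES:
        raise ValueError("expected an unambiguous DNA codon and base")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1, or 2")
    if CODON_TABLE[codon] == "*":
        raise ValueError("reference codon must encode an amino acid")
    if codon[position] == new_base:
        raise ValueError("new_base must differ from the reference base")

    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == "*":
        return "nonsense"    # premature stop codon: truncated product
    if after != before:
        return "missense"    # one amino acid replaced by another
    return "synonymous"      # same amino acid still encoded


if __name__ == "__main__":
    print(classify_point_mutation("TGG", 2, "A"))  # Trp -> stop: prints nonsense
    print(classify_point_mutation("GAA", 0, "A"))  # Glu -> Lys: prints missense
    print(classify_point_mutation("CTG", 0, "T"))  # Leu -> Leu: prints synonymous
```

Building the codon table from the compact TCAG-ordered string keeps the sketch dependency-free; a library such as Biopython could supply the same table.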

To assess selective constraint against nonsense and missense mutations, we examined polymorphism data in 13,182 protein-coding genes from 162 lines of Drosophila melanogaster. We estimated absolute selection coefficients for each gene by $s = \mu / q$, where $\mu$ is the mutation rate of the gene and $q$ is the mutant allele frequency (Sunyaev et al. 2001; Gorlov et al. 2006). Thus, we computed selection coefficients for nonsense mutations by $s_{\mathrm{non}} = \mu_{\mathrm{non}} / q_{\mathrm{non}}$ and for missense mutations by $s_{\mathrm{mis}} = \mu_{\mathrm{mis}} / q_{\mathrm{mis}}$. For a particular gene, we calculated $\mu_{\mathrm{non}}$ and $\mu_{\mathrm{mis}}$ by multiplying the number of sites at which the respective mutation could occur by the per-nucleotide per-generation mutation rate, which we scaled by the proportion and rates of possible transitions and transversions for that mutation type. We used a per-nucleotide per-generation mutation rate of $3.5 \times 10^{-9}$ and a relative transition:transversion rate of 2:1, both derived from D. melanogaster mutation accumulation lines (Keightley et al. 2009). Assuming that all possible mutations in a gene are represented by polymorphism data, $q_{\mathrm{non}} = \bar{q}_{\mathrm{non}} P_{\mathrm{non}}$ and $q_{\mathrm{mis}} = \bar{q}_{\mathrm{mis}} P_{\mathrm{mis}}$, where $\bar{q}_{\mathrm{non}}$ is the average nonsense polymorphism frequency, $P_{\mathrm{non}}$ is the proportion of nonsense polymorphic sites, $\bar{q}_{\mathrm{mis}}$ is the average missense polymorphism frequency, and $P_{\mathrm{mis}}$ is the proportion of missense polymorphic sites (Gorlov et al. 2006). When $P_{\mathrm{non}} = 0$ or $P_{\mathrm{mis}} = 0$ for a particular gene, we set $q_{\mathrm{non}}$ or $q_{\mathrm{mis}}$ to the minimum corresponding frequency observed across all genes. Because rare variants are incompletely represented by polymorphism data, our calculations may underestimate $q_{\mathrm{non}}$ and $q_{\mathrm{mis}}$ and, consequently, overestimate $s_{\mathrm{non}}$ and $s_{\mathrm{mis}}$. However, the selection coefficients obtained by our calculations are consistent with previous estimates for Drosophila (discussed later; Haddrill et al. 2010). Moreover, our analysis was primarily concerned with relationships between selection coefficients and other properties of proteins, rather than with their specific values or their relationships to each other.

We identified 1,548 nonsense and 181,192 missense polymorphisms in D. melanogaster protein-coding genes. There is a greater skew toward low-frequency states for nonsense (fig. 1A) than for missense (fig. 1B) polymorphisms, and selection coefficients against nonsense mutations are significantly larger than those against missense mutations (fig. 1C; $P < 2.2 \times 10^{-16}$, Mann–Whitney U test). Thus, as expected, it is more deleterious to abolish the function of a protein than to alter its amino acid composition. Moreover, the narrow distribution of selection coefficients against nonsense mutations indicates that abolishing the functions of different proteins has similarly strong deleterious effects, whereas the broad distribution of selection coefficients against missense mutations reflects the variable physicochemical changes associated with different types of amino acid replacements.

[Figure 1 (graphic not reproduced). Panels A and B: proportion of polymorphisms (y-axis) versus number of nonsense (A) or missense (B) alleles (x-axis). Panel C: density (y-axis) of log(s) (x-axis) for missense and nonsense mutations.]

FIG. 1. Selection against nonsense and missense mutations. Site frequency spectra for (A) nonsense and (B) missense polymorphisms. For clarity, the first 25 bins are shown individually, and the last bin contains all remaining polymorphisms. (C) Distributions of selection coefficients against nonsense and missense mutations.

To investigate the relationships between selective constraint and selection against nonsense and missense mutations, we computed $K_a$ between D. melanogaster and D. ananassae orthologs of each protein-coding gene. Because $K_a$ is a measure of amino acid sequence divergence, low $K_a$ values correspond to high sequence conservation. As expected, $K_a$ is negatively correlated with both $s_{\mathrm{non}}$ (fig. 2A) and $s_{\mathrm{mis}}$ (fig. 2B), showing that the strength of selection against either class of mutation increases with sequence conservation. However, the correlation between $K_a$ and $s_{\mathrm{mis}}$ is significantly stronger than the correlation between $K_a$ and $s_{\mathrm{non}}$ ($P < 1.8 \times 10^{-262}$, t-test). Thus, variation in selective constraint appears to depend more on mutations that alter a protein's amino acid composition than on those that abolish its function.

[Figure 2 (graphic not reproduced). Scatter plots of log selection coefficient (y-axis) against $K_a$ (x-axis): (A) $\log(s_{\mathrm{non}})$, ρ = −0.18*; (B) $\log(s_{\mathrm{mis}})$, ρ = −0.50*; (C) $\log(s_{\mathrm{non}}^{5'})$, ρ = −0.18*; (D) $\log(s_{\mathrm{mis}}^{\mathrm{cons}})$, ρ = −0.48*.]

FIG. 2. Selection against nonsense and missense mutations as functions of $K_a$. $K_a$ is plotted against (A) $s_{\mathrm{non}}$, (B) $s_{\mathrm{mis}}$, (C) $s_{\mathrm{non}}^{5'}$, and (D) $s_{\mathrm{mis}}^{\mathrm{cons}}$. Linear regression lines are depicted in red, and Spearman correlation coefficients are shown in the top right corner of each plot. *P < 0.001.

A caveat of using selection against nonsense and missense mutations as quantifiers of functional importance and fragility, respectively, is that the effects of specific mutations depend on a number of factors. For example, a nonsense mutation may not abolish function if it removes only a few terminal codons, whereas a missense mutation may abolish function if it introduces an amino acid with radically different physicochemical properties. Thus, we repeated our analysis after removing nonsense mutations within the last five non-stop codons (3′ nonsense mutations) and missense mutations resulting in radically different amino acids. To classify missense mutations by their physicochemical effects, we used the Grantham matrix, which quantifies distances between amino acids by their differences in chemical composition, polarity, and molecular volume (Grantham 1974). Missense mutations with Grantham distances ≤100 are conservative, whereas those with greater distances are radical. We identified 92 3′ nonsense polymorphisms and 20,270 radical missense polymorphisms. As expected, selection against 3′ nonsense mutations ($s_{\mathrm{non}}^{3'}$) is significantly weaker than selection against upstream nonsense mutations ($s_{\mathrm{non}}^{5'}$; $P < 2.2 \times 10^{-16}$, Mann–Whitney U test), and selection against radical missense mutations ($s_{\mathrm{mis}}^{\mathrm{rad}}$) is significantly stronger than selection against conservative missense mutations ($s_{\mathrm{mis}}^{\mathrm{cons}}$; $P < 2.2 \times 10^{-16}$, Mann–Whitney U test). However, the correlations of $s_{\mathrm{non}}^{5'}$ and $s_{\mathrm{mis}}^{\mathrm{cons}}$ with $K_a$ (fig. 2C and D) are similar to those of $s_{\mathrm{non}}$ and $s_{\mathrm{mis}}$ with $K_a$ (fig. 2A and B), further supporting the hypothesis that variation in selective constraint is driven primarily by differences in the fragility, rather than the functional essentiality, of proteins.
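The per-gene estimator used in this section ($s = \mu / q$, with $q = \bar{q} P$ and a minimum-frequency fallback when $P = 0$) can be sketched in two functions. This is a minimal illustration, not the authors' code, and it rests on stated assumptions: the standard genetic code; a reading of the 2:1 transition:transversion figure as the rate of a site's single transition relative to each of its two transversions (so each site's total rate equals the per-nucleotide rate); and $P$ taken as the number of polymorphic sites divided by the number of sites at which that mutation class can arise, with one polymorphism per site. The names mutational_opportunity and selection_coefficients are hypothetical.

```python
from statistics import mean

# Standard nuclear genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mutational_opportunity(cds, u=3.5e-9, kappa=2.0):
    """Return ({"non": mu_non, "mis": mu_mis}, {"non": n_sites, "mis": n_sites}).

    cds: coding sequence without its stop codon; a trailing partial codon is ignored.
    u: per-nucleotide per-generation mutation rate (3.5e-9 in the text).
    kappa: rate of a site's transition relative to each of its two transversions
        (assumed reading of the 2:1 ratio); each site's total rate is u.
    """
    mu = {"non": 0.0, "mis": 0.0}
    sites = {"non": 0, "mis": 0}
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3].upper()
        aa = CODON_TABLE[codon]
        if aa == "*":
            continue  # ignore in-frame stop codons
        for pos in range(3):
            reachable = set()
            for base in BASES:
                if base == codon[pos]:
                    continue
                weight = kappa if (codon[pos], base) in TRANSITIONS else 1.0
                rate = u * weight / (kappa + 2.0)  # 1 transition + 2 transversions
                new_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    kind = "non"
                elif new_aa != aa:
                    kind = "mis"
                else:
                    continue  # synonymous change
                mu[kind] += rate
                reachable.add(kind)
            for kind in reachable:
                sites[kind] += 1  # a site where this mutation class can occur
    return mu, sites


def selection_coefficients(genes):
    """Estimate s = mu / q per gene for one mutation class, with q = mean(freq) * P.

    genes: {gene_id: (mu, n_possible_sites, [mutant-allele frequencies])}.
    Genes with no polymorphism of this class (P = 0) get the smallest q observed.
    """
    q = {
        gene: mean(freqs) * len(freqs) / n_sites if freqs else None
        for gene, (_mu, n_sites, freqs) in genes.items()
    }
    observed = [value for value in q.values() if value is not None]
    if not observed:
        raise ValueError("no gene carries a polymorphism of this class")
    q_min = min(observed)
    return {
        gene: genes[gene][0] / (value if value is not None else q_min)
        for gene, value in q.items()
    }
```

For one mutation class, a caller would compute `mu` and the site count per gene with mutational_opportunity, attach that gene's observed polymorphism frequencies, and pass the resulting dictionary to selection_coefficients; the fallback to the smallest observed $q$ mirrors the rule stated in the text.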

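The two filters and the Mann–Whitney comparisons described for the 3′ nonsense and radical missense reanalysis can be sketched as below. The thresholds (last five non-stop codons; Grantham distance ≤100 for conservative changes) come from the text. The 0-based codon indexing, the function names, and the use of the normal approximation with tie correction for the U test are assumptions of this sketch; other implementations may report exact P values for small samples. Grantham distances are taken as inputs rather than reproduced here.

```python
import math
from collections import Counter


def is_three_prime_nonsense(codon_index, n_sense_codons, window=5):
    """True if a nonsense change hits one of the last `window` non-stop codons.

    codon_index is 0-based; n_sense_codons excludes the terminal stop codon.
    """
    return codon_index >= n_sense_codons - window


def is_conservative_missense(grantham_distance, threshold=100):
    """Conservative if the Grantham distance is <= threshold, radical otherwise."""
    return grantham_distance <= threshold


def _average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann–Whitney U test, normal approximation with tie correction.

    Returns (U statistic for x, approximate P value).
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    ranks = _average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all observations are tied")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In use, per-gene selection coefficients computed separately for 3′ and upstream nonsense mutations (or for radical and conservative missense mutations) would be passed to mann_whitney_u as the two samples.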
To explore the biological factors underlying the greater contribution of a protein's fragility to selective constraint, we examined relationships of $K_a$, $s_{\mathrm{non}}$, and $s_{\mathrm{mis}}$ to gene expression level, tissue specificity (inversely correlated with expression breadth), and number of protein–protein interactions (table 1; see Materials and Methods for details). $K_a$ is negatively correlated with expression level (also see Pal et al. 2001; Krylov et al. 2003; Drummond et al. 2005; Lemos et al. 2005), positively correlated with tissue specificity (also see Duret and Mouchiroud 2000; Winter et al. 2004; Zhang and Li 2004), and negatively correlated with number of protein–protein interactions (also see Krylov et al. 2003; Lemos et al. 2005), illustrating the clear association between strong selective constraint and frequent, broad biological function. However, the relationship between selective constraint and these three biological properties differs when selection against nonsense and missense mutations is considered individually. Although $s_{\mathrm{non}}$ is weakly negatively correlated with expression level and tissue specificity, it is positively correlated with number of protein–protein interactions, suggesting that the importance of a protein's function is likely related not to how it is expressed, but rather to the number of proteins it interacts with. In contrast, $s_{\mathrm{mis}}$ is positively correlated with expression level, negatively correlated with tissue specificity, and positively correlated with number of protein–protein interactions. Thus, fragile protein-coding genes tend to have high expression levels, broad expression, and high connectivity in protein–protein interaction networks.

Table 1. Spearman Correlations between Selective Constraint and Three Measures of Biological Function.

Measure | Expression level (FPKM) | Tissue specificity (τ) | Protein–protein interactions
$K_a$ | −0.24* | 0.38* | −0.36*
$s_{\mathrm{non}}$ | −0.06* | −0.07* | 0.16*
$s_{\mathrm{mis}}$ | 0.24* | −0.18* | 0.29*

*P < 0.001.

Correlations of expression level, tissue specificity, and number of protein–protein interactions with $s_{\mathrm{mis}}$ are all significantly stronger than those with $s_{\mathrm{non}}$ ($P < 3.3 \times 10^{-184}$, $P < 7.0 \times 10^{-26}$, and $P < 1.9 \times 10^{-36}$, respectively; t-tests), suggesting that the biological role of a protein is more affected by missense than by nonsense mutations. In particular, although the strong correlations of $s_{\mathrm{mis}}$ with both gene expression level and tissue specificity are consistent with the relationships observed for $K_a$, $s_{\mathrm{non}}$ is only weakly correlated with either of these measures, and its correlation with expression level runs opposite to the direction implied by the relationship between $K_a$ and expression level (because low $K_a$ indicates strong constraint, $K_a$ and a selection coefficient that tracks constraint should correlate with a given property in opposite directions). This contrast indicates that, although the severity of modifying a gene's protein product through a missense mutation depends on the level and breadth of the gene's expression, eliminating the product through a nonsense mutation is highly deleterious regardless of how much or where the gene is expressed. Moreover, although the correlations of $s_{\mathrm{non}}$ and $s_{\mathrm{mis}}$ with number of protein–protein interactions are both positive, consistent with the relationship observed for $K_a$, the correlation is much stronger for $s_{\mathrm{mis}}$.

We hypothesize that this difference is due to downstream effects on protein–protein interaction networks. A nonsense mutation removes a protein from its interactions and, hence, is more deleterious when the protein participates in many interactions, resulting in a positive correlation between $s_{\mathrm{non}}$ and number of protein–protein interactions. Although a missense mutation can also weaken or abolish the interactions of a protein by altering its binding affinities, it can alternatively modify the function of a protein while leaving its binding affinities unchanged. Hence, a missense mutation can have deleterious effects on the functions of an entire protein complex, which may explain the stronger correlation between $s_{\mathrm{mis}}$ and number of protein–protein interaction partners.

It is important to note that our analysis reveals only that a protein's evolutionary rate is limited primarily by its fragility and that its fragility is strongly correlated with several biological properties. A population-level examination of fitness effects cannot, however, elucidate the precise biological mechanism underlying a protein's fragility. Many factors may contribute to a protein's fragility, including susceptibility to misfolding (MIM; Drummond et al. 2005; Drummond et al. 2006; Drummond and Wilke 2008; Geiler-Samerotte et al. 2011), formation of undesirable interactions or modification of existing interactions (protein misinteraction avoidance; Krylov et al. 2003; Lemos et al. 2005), and changes in protein translation rate or localization. Although the strong correlations we observe between fragility and gene expression level, tissue specificity, and number of protein–protein interactions support a biological basis for fragility, this approach cannot discriminate among these variables or determine their relative contributions to fragility. Although beyond the scope of the current study, determining how various biological factors influence the ability of proteins to tolerate amino acid replacements is key to understanding protein sequence and functional evolution.

The weak link between protein essentiality and evolutionary rate (Hurst and Smith 1999; Hirsh and Fraser 2001; Jordan et al. 2002; Krylov et al. 2003; Wall et al. 2005; Wolf 2006) has long puzzled evolutionary biologists. With the recent availability of large functional datasets for many species, a complex story has emerged, showing that a number of factors contribute to variation in selective constraint among proteins (Duret and Mouchiroud 2000; Pal et al. 2001; Krylov et al. 2003; Winter et al. 2004; Zhang and Li 2004; Drummond et al. 2005; Lemos et al. 2005; Drummond et al. 2006; Drummond and Wilke 2008; Wolf et al. 2010; Geiler-Samerotte et al. 2011; Levy et al. 2012; Yang et al. 2012). Together, these findings are consistent with the hypothesis that proteins are conserved primarily because they are fragile, rather than because their functions are important. By separately measuring constraint against nonsense and missense mutations in proteins with different levels of sequence conservation, we were able to disentangle the contributions of functional importance and fragility to selective constraint, revealing that fragility is the greater contributor. We also found that gene expression level, tissue specificity, and number of protein–protein interactions are all more strongly correlated with a protein's fragility than with its functional importance. Moreover, our results are robust to the exclusion of singletons (see supplementary table S1 and figs. S1 and S2, Supplementary Material online, which correspond to table 1 and figs. 1C and 2, respectively). Thus, our study illustrates that fragility plays a larger role than essentiality in protein evolution and that this effect may be due to higher fitness costs associated with impairment, rather than inactivation, of the biological networks in which evolutionarily conserved proteins participate.

Materials and Methods

Sequence Analysis

Reference sequences and annotation data for D. melanogaster and D. ananassae protein-coding genes were downloaded from the University of California, Santa Cruz (UCSC) Genome Browser at http://genome.ucsc.edu/ (last accessed November 21, 2013), and orthologous genes in D. melanogaster and D. ananassae were obtained from the Drosophila ortholog table downloaded from http://www.flybase.org (last accessed November 21, 2013). Genome sequences for 162 D. melanogaster inbred lines were downloaded from the Drosophila Genetic Reference Panel (DGRP) website at http://www.hgsc.bcm.tmc.edu/projects/dgrp/ (last accessed November 21, 2013). Orthologous regions were extracted from these genomic sequences and aligned to the corresponding D. melanogaster and D. ananassae reference genes with MACSE (Ranwez et al. 2011). PAML (Yang 2007) was used to calculate $K_a$ between orthologs in D. melanogaster and D. ananassae.

Statistical Analyses

Relationships between pairs of variables shown in figure 2 and supplementary figure S2 (Supplementary Material online) were determined by computing Spearman's rank correlation coefficients and significance levels. We tested the significance of differences between correlations involving $s_{\mathrm{non}}$ and $s_{\mathrm{mis}}$ by computing t test statistics for dependent correlations sharing one variable. All statistical analyses were performed in the R software environment (R Development Core Team 2009).
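The Statistical Analyses text names two procedures: Spearman rank correlation, and a t test comparing two dependent correlations that share one variable (for example, $K_a$ with $s_{\mathrm{non}}$ versus $K_a$ with $s_{\mathrm{mis}}$, both measured on the same genes). The Letter does not say which dependent-correlation statistic was used. The sketch below assumes Hotelling's t with n − 3 degrees of freedom (Williams's t is a common refinement), and applying it to Spearman coefficients is itself an approximation. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    values = list(values)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def dependent_correlation_t(r12, r13, r23, n):
    """Hotelling's t for H0: rho12 == rho13, where variable 1 is shared.

    r12, r13: correlations of variable 1 (e.g. Ka) with variables 2 and 3
    (e.g. s_non and s_mis); r23: correlation between variables 2 and 3;
    n: number of genes. Returns (t, degrees of freedom = n - 3).
    """
    det = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    if n <= 3 or det <= 0:
        raise ValueError("need n > 3 and a positive-definite correlation matrix")
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))
    return t, n - 3


def approx_two_sided_p(t):
    """Normal approximation to the two-sided P value; adequate only for large df."""
    return math.erfc(abs(t) / math.sqrt(2))
```

With thousands of genes the degrees of freedom are large enough that the normal approximation in approx_two_sided_p is close to the t distribution; for small samples a proper t-distribution CDF (for example from SciPy) would be needed.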

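Table 1 draws on three per-gene functional measures whose computation is described in the Functional Analysis subsection of Materials and Methods: expression level (total FPKM in male and female body), the tissue-specificity index $\tau = \sum_{i=1}^{N}\left(1 - \log E_i / \log E_{\max}\right) / (N - 1)$ over six tissues, and the number of unique protein–protein interaction partners pooled over eight data sets. The following is a minimal sketch of those measures, not the authors' pipeline. The dictionary keys, function names, the requirement that every $E_i \geq 1$ (the text does not say how lower values were handled), and the exclusion of self-interactions are assumptions.

```python
import math

# Tissue names as listed in the Functional Analysis text; using them as
# dictionary keys is an assumption of this sketch.
SPECIFICITY_TISSUES = ("carcass", "male head", "female head", "testis",
                       "accessory gland", "ovary")


def expression_level(fpkm):
    """Total FPKM in male and female body (hypothetical key names)."""
    return fpkm["male body"] + fpkm["female body"]


def is_expressed(fpkm, min_fpkm=1.0):
    """Expressed = FPKM >= 1 in at least one tissue."""
    return any(value >= min_fpkm for value in fpkm.values())


def tissue_specificity(values):
    """tau = sum_i (1 - log(E_i) / log(E_max)) / (N - 1).

    Assumes every E_i >= 1 so that tau stays within [0, 1].
    """
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two tissues")
    if min(values) < 1.0:
        raise ValueError("sketch assumes all expression values are >= 1")
    log_max = math.log(max(values))
    if log_max == 0.0:
        raise ValueError("tau is undefined when E_max == 1")
    return sum(1 - math.log(e) / log_max for e in values) / (n - 1)


def interaction_partner_counts(datasets):
    """Unique interaction partners per gene, pooled over several pair lists."""
    partners = {}
    for pairs in datasets:
        for a, b in pairs:
            if a == b:
                continue  # assumption: self-interactions are not counted
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)
    return {gene: len(p) for gene, p in partners.items()}
```

For a gene, tissue_specificity would be applied to `[fpkm[t] for t in SPECIFICITY_TISSUES]`; pooling the data sets into sets of partners means an interaction reported in several databases is counted once.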
Mann–Whitney U tests were used to compare distributions of selection coefficients against nonsense and missense mutations (fig. 1C).

Functional Analysis

Paired-end RNA-seq reads from D. melanogaster male body, female body, carcass, male head, female head, testis, accessory gland, and ovary tissues were downloaded from the modENCODE site at http://www.modencode.org/ (last accessed November 21, 2013). We used Bowtie2 (Langmead et al. 2009) to align reads to transcript sequences based on annotation files downloaded from http://www.flybase.org (last accessed November 21, 2013) and eXpress (Roberts and Pachter 2013) to calculate the number of fragments per kilobase of exon per million fragments mapped, or FPKM (Trapnell et al. 2010), for each gene. Quantile normalization of expression data was performed using the affy package of Bioconductor in the R software environment (R Development Core Team 2009). Expression levels were estimated as the total FPKM for male and female body tissues. We computed the tissue specificity index, $\tau$, for each gene using the following formula:

$$\tau = \frac{\sum_{i=1}^{N} \left(1 - \frac{\log E_i}{\log E_{\max}}\right)}{N - 1},$$

where $N$ is the number of tissues, $E_i$ is the expression in tissue $i$, and $E_{\max}$ is the maximum expression of the gene in all tissues (Yanai et al. 2005; Larracuente et al. 2008). The index $\tau$ ranges from 0 to 1, with larger values indicating greater tissue specificity. We applied this calculation to FPKM values from carcass, male head, female head, testis, accessory gland, and ovary tissues. Our analysis was restricted to expressed genes, that is, those with FPKM ≥ 1 in at least one tissue (Assis et al. 2012).

Seven protein–protein interaction data sets were obtained from the Drosophila Interactions Database (DroID) at http://www.droidb.org (last accessed November 21, 2013) and one from FlyBase at http://www.flybase.org (last accessed November 21, 2013). Numbers of protein–protein interactions were estimated by counting unique interaction partners for each gene.

Supplementary Material

Supplementary figures S1 and S2 and table S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Drs Yegor Bazykin, Michael DeGiorgio, and two anonymous reviewers for their helpful comments on this manuscript. This work was supported by NIH fellowship F32 GM100673-02 awarded to R.A.

References

Assis R, Zhou Q, Bachtrog D. 2012. Sex-biased transcriptome evolution in Drosophila. Genome Biol Evol. 4:1189–1200.
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. 2005. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 102:14338–14343.
Drummond DA, Raval A, Wilke CO. 2006. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 23:327–337.
Drummond DA, Wilke CO. 2008. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134:341–352.
Duret L, Mouchiroud D. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 17:68–74.
Geiler-Samerotte KA, Dion MF, Budnik BA, Wang SM, Hartl DL, Drummond DA. 2011. Misfolded proteins impose a dosage-dependent fitness cost and trigger a cytosolic unfolded protein response in yeast. Proc Natl Acad Sci U S A. 108:680–685.
Gorlov IP, Kimmel M, Amos CI. 2006. Strength of purifying selection against different categories of the point mutations in the coding regions of the human genome. Hum Mol Genet. 15:1143–1150.
Grantham R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862–864.
Haddrill PR, Loewe L, Charlesworth B. 2010. Estimating the parameters of selection on nonsynonymous mutations in Drosophila pseudoobscura and D. miranda. Genetics 185:1381–1396.
Herbeck HT, Wall DP. 2005. Converging on a general model of protein evolution. Trends Biotechnol. 23:485–487.
Hirsh AE, Fraser HB. 2001. Protein dispensability and rate of evolution. Nature 411:1046–1049.
Hurst LD, Smith NG. 1999. Do essential genes evolve slowly? Curr Biol. 9:747–750.
Jordan IK, Rogozin IB, Wolf YI, Koonin EV. 2002. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12:962–968.
Keightley PD, Trivedi U, Thomson M, Oliver F, Kumar S, Blaxter ML. 2009. Analysis of the genome sequence of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res. 19:1195–1201.
Koonin EV, Wolf YI. 2006. Evolutionary systems biology: links between gene evolution and function. Curr Opin Biotechnol. 17:481–487.
Krylov DM, Wolf YI, Rogozin IB, Koonin EV. 2003. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 13:2229–2235.
Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10:R25.
Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill D, Zhang Y, Oliver B, Clark AG. 2008. Evolution of protein-coding genes in Drosophila. Trends Genet. 24:114–123.
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol. 22:1345–1354.
Levy ED, De S, Teichmann SA. 2012. Cellular crowding imposes global constraints on the chemistry and evolution of proteomes. Proc Natl Acad Sci U S A. 109:20461–20466.
Pal C, Papp B, Hurst LD. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927–931.
Pal C, Papp B, Lercher MJ. 2006. An integrated view of protein evolution. Nat Rev Genet. 7:337–348.
R Development Core Team. 2009. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.
Ranwez V, Harispe S, Delsuc F, Douzery EJP. 2011. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. PLoS One 6:e22594.
Roberts A, Pachter L. 2013. Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods. 10:71–73.
Serohijos AWR, Rimas Z, Shakhnovich EI. 2012. Protein biophysics explains why highly abundant proteins evolve slowly. Cell Rep. 2:249–256.
Sunyaev S, Ramensky V, Koch I, Lathe W 3rd, Kondrashov AS, Bork P. 2001. Prediction of deleterious human alleles. Hum Mol Genet. 10:591–597.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 28:511–515.
Vitkup D, Kharchenko P, Wagner A. 2006. Influence of metabolic network structure and function on enzyme evolution. Genome Biol. 7:R39.
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW. 2005. Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A. 102:5483–5488.
Wilson AC, Carlson SS, White TJ. 1977. Biochemical evolution. Annu Rev Biochem. 46:573–639.
Winter EE, Goodstadt L, Ponting CP. 2004. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14:54–61.
Wolf YI. 2006. Coping with the quantitative genomics 'elephant': the correlation between gene dispensability and evolution rate. Trends Genet. 22:354–357.
Wolf YI, Gopich IV, Lipman DJ, Koonin EV. 2010. Relative contributions of intrinsic structural-functional constraints and translation rate to the evolution of protein-coding genes. Genome Biol Evol. 2:190–199.
Yanai I, Benjamin H, Shmoish M, et al. (12 co-authors). 2005. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specifications. Bioinformatics 21:650–659.
Yang Z. 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586–1591.
Yang J, Liao B, Zhuang S, Zhang J. 2012. Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A. 109:5158–5159.
Yang J, Zhuang S, Zhang J. 2010. Impact of translational error-induced and error-free misfolding on the rate of protein evolution. Mol Syst Biol. 6:421.
Zeldovich KB, Chen P, Shakhnovich EI. 2007. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci U S A. 104:16152–16157.
Zhang L, Li WH. 2004. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol. 21:236–239.