BRIEF COMMUNICATION doi:10.1111/evo.12195

WHY WE ARE NOT DEAD ONE HUNDRED TIMES OVER

Brian Charlesworth1,2

1 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom

2 E-mail: [email protected]

Received April 2, 2013
Accepted June 14, 2013
Data Archived: Dryad doi:10.5061/dryad.jk832

The possibility of pervasive weak selection at tens or hundreds of millions of sites across the genome, suggested by recent studies of silent site DNA sequence variation and divergence, raises the problem of the survival of the population in the face of the large genetic load that may result. Two alternative resolutions of this problem are presented for populations where recombination is sufficiently frequent that different sites under selection evolve independently. One invokes weak stabilizing selection, of the magnitude compatible with abundant silent site variability. This can be shown to produce only a modest genetic load, due to the effectiveness of even weak stabilizing selection in keeping the trait mean close to the optimum. The other invokes soft selection, whereby individuals compete for a limiting resource whose abundance determines the absolute fitness of the population. Weak purifying selection at a large number of sites produces only a small variance in fitness among individuals within the population, due to the fact that most sites are fixed rather than polymorphic. Even when it produces a large genetic load, it is compatible with the observations on fitness variance when selection is soft. It may be very difficult to distinguish between these two possibilities.

KEY WORDS: Genetic drift, genetic load, mutation, soft selection, stabilizing selection, purifying selection.

© 2013 The Author(s). Evolution

Introduction

The genetic load of a population is the extent to which its mean fitness, measured relative to the fitness of the genotype with the highest possible fitness, is reduced below one (Crow 1958). The load can be viewed as the lower bound to the proportion of the population that fails to survive or reproduce as a result of selection. Kondrashov (1995a) has drawn attention to the fact that weak purifying selection acting on many nucleotide sites throughout the genome can cause a very large genetic load compared with the classical case of an infinite population at mutation–selection balance (Haldane 1937; Crow 1958), because numerous slightly deleterious mutations reach high frequencies or fixation as a result of genetic drift (Kimura et al. 1963). He showed that the number of sites under such selection cannot exceed $4N_e$, where $N_e$ is the effective population size (Wright 1931), without causing so severe a genetic load that the survival of the population is endangered, unless truncation selection is acting.

Recent analyses of data on DNA sequence polymorphism and/or between-species divergence from both Drosophila (Halligan and Keightley 2006; Zeng and Charlesworth 2010) and human populations (Eory et al. 2010; Ward and Kellis 2012) suggest that, in addition to purifying selection acting on nonsynonymous coding sites, there may be tens or even hundreds of millions of silent sites subject to very weak purifying selection, which is nevertheless strong enough to reduce the probabilities of fixation of slightly deleterious mutations relative to neutral expectation, and to affect their frequency distributions within populations. These observations mean that the question of the resulting genetic load needs to be revisited, as has recently been done for the case of the less numerous, strongly selected nonsynonymous and noncoding sites in humans (Lesecque et al. 2012). The purpose of this article is to reexamine the issue of the genetic load for a finite diploid, randomly mating population in the light of some new results on weak stabilizing and purifying selection


presented by Charlesworth (2013), who argued that some important genomic traits such as genome size and codon usage may be subject to stabilizing selection (where an intermediate trait value is favored), rather than purifying selection (where an extreme trait value is favored).

In Charlesworth (2013), stabilizing selection was modeled by assuming a single quantitative trait, with the reduction in fitness being proportional to the squared deviation of the trait value from the optimum. Mutations affecting the trait occurred at a large number of biallelic sites around the genome. Mutational bias was allowed, such that mutations affecting the trait in one direction could arise more frequently than those with opposite effects, although mutations in each direction occurred at each site. The main conclusion relevant to this article was that the expected value of the trait mean at statistical equilibrium under drift, mutation, and selection usually differed only slightly from the optimum, even in the presence of mutational bias, although drift can generate a fairly wide distribution of population means around this expectation (Lande 1976).

In addition, a model of purifying selection was analyzed by allowing selection against one of the two alleles at a large number of biallelic sites, with the log fitness of an individual being a quadratic function of the number of deleterious mutations that it carries. Approximations for the expectations of the mean and variance of the numbers of mutations per individual over the stationary probability distribution under purifying selection, mutation, and drift were obtained.

The parameters used by Charlesworth (2013) to generate numerical results for these two models were intended to check the analytical approximations, and did not represent real populations because of the use of small population sizes and numbers of sites under selection. The load question cannot, therefore, be answered directly from these results.
This difficulty can be overcome by using the fact that the existence of abundant polymorphisms at the genomic sites in question, and the relatively small observed reductions in their evolutionary rates relative to neutral expectation, imply that the scaled intensity of selection, γ, on individual nucleotide variants must lie in a narrow region around one (Kondrashov 1995a). (If the selection coefficient against heterozygotes for a deleterious variant at a site is s, γ is defined as $4N_es$.) To obtain rough estimates of the genetic load, γ can therefore simply be equated to 1. Use of this approximation yields useful estimates of the genetic load and genetic variance in fitness when selection is weak in relation to drift, as will be demonstrated later. It will be shown that weak stabilizing selection at very large numbers of sites causes only a modest genetic load, whereas weak purifying selection leads to a load that is close to unity. In contrast, the genetic variance in fitness is moderate with stabilizing selection, and is extremely small with purifying selection.
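As a rough numerical illustration of this scaling, the sketch below inverts γ = 4Ne·s to obtain the heterozygous selection coefficient implied by γ ≈ 1. The two effective population sizes are the approximate Drosophila and human values used later in the article; the function name is hypothetical and chosen only for this example.

```python
def selection_coefficient(gamma: float, n_e: float) -> float:
    """Heterozygous selection coefficient s implied by gamma = 4 * Ne * s."""
    return gamma / (4.0 * n_e)


# Approximate effective population sizes quoted later in the text.
s_drosophila = selection_coefficient(gamma=1.0, n_e=1e6)
s_human = selection_coefficient(gamma=1.0, n_e=2e4)

print(f"Drosophila: s = {s_drosophila:.2e}")
print(f"Human:      s = {s_human:.2e}")
```

Under these assumptions the selection coefficients are of the order of one in a million for Drosophila and one in a hundred thousand for humans, which is why such selection is so hard to detect directly.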


EVOLUTION 2013

The Load Under Stabilizing Selection

For convenience, a quadratic deviations model of the relation between trait value and fitness is used, following widespread practice (Wright 1935a, b; Bürger 2000, Chap. 6; Charlesworth 2013). As a further simplification, the optimal trait value is set to 0, so that the fitness of an individual with trait value z (relative to the fitness at z = 0) is given by

$$w(z) = 1 - Sz^{2}, \tag{1}$$

where S measures the intensity of stabilizing selection. Let $\bar{z}$ be the trait mean and $V_g$ the genetic variance for a given population. Using equation (1), the mean of w over all individuals in the population can easily be found (Wright 1935b), yielding the result that the genetic load for the population (i.e., the reduction in mean fitness below 1) is

$$L_{\mathrm{stab}} = S\left(V_g + \bar{z}^{2}\right). \tag{2a}$$

But genetic drift causes the trait mean and variance to be distributed around their expected values (Lande 1976; Bürger and Lande 1994). The simplest definition of the load is then the expectation of $L_{\mathrm{stab}}$ over the stationary distribution under mutation, selection, and drift in a finite population:

$$L^{*}_{\mathrm{stab}} = S\left(V_g^{*} + \bar{z}^{*2} + V_{\bar{z}}\right), \tag{2b}$$

where asterisks indicate expectations over the stationary distribution, and $V_{\bar{z}}$ is the variance of $\bar{z}$ at stationarity.

The earlier results are essentially independent of the details of the genetic basis of the trait in question. To make further progress, a specific model of this is needed. Following Charlesworth (2013), consider a diploid, randomly mating population with m exchangeable sites affecting the trait, each segregating for two semidominant variants, with a difference 2a between the genotypic values of homozygotes for the two variants at each site (A1-type and A2-type, with the latter causing a higher trait value). Mutational bias with respect to the trait value is allowed, so that mutations from A1 to A2 at a given site occur at rate u, and mutations in the reverse direction arise at rate κu; κ > 1 when mutation is biased in favor of a lower trait value, as is arbitrarily assumed here. Different sites are treated as evolutionarily independent of each other (i.e., linkage disequilibrium is negligible).

With mutational bias against A2 variants, the expected trait mean is less than 0, so that all three terms contribute to the load. However, for a wide range of parameter values only the term in $V_g^{*}$ need be considered, as can be shown as follows. $V_{\bar{z}}$ is approximately equal to $1/(4N_eS)$ (Lande 1976; Charlesworth 2013), so that the last term on the right of equation (2b) reduces to $1/(4N_e)$ (Lande 1980); this is negligibly small for most panmictic natural populations (for which $N_e \geq 10{,}000$: Charlesworth and Charlesworth 2010, p. 228). The deviation of the mean from the optimum can be obtained from equation (A3b) of Charlesworth (2013), which implies that the expected absolute value of $\bar{z}$ is approximately $\ln(\kappa)/(8N_eSa)$, so that the second term on the right of equation (2b) is approximately $[\ln(\kappa)]^{2}/(64N_e^{2}Sa^{2})$. Again, for $N_e$ values typical of natural populations, this is likely to be negligible, because ln(κ) is unlikely to exceed four and $N_e^{2}Sa^{2}$ is probably much greater than 1, because it involves the square of $N_e$, a very large quantity. Unless $N_e^{2}Sa^{2}$ is very small (implying extremely weak selection on the trait as a whole), only the term in $SV_g^{*}$ need therefore be retained.

This can be conveniently represented as follows. Using the standard formula for the additive genetic variance contributed by a single locus (Falconer and Mackay 1996, p. 126), the contribution to $V_g^{*}$ from an individual nucleotide site is given by $a^{2}\pi$, where π is the equilibrium nucleotide site diversity per site. Summing over all m sites, we have

$$L^{*}_{\mathrm{stab}} \approx Sma^{2}\pi. \tag{3}$$

Table 1. Genetic variances and genetic loads under weak stabilizing selection in large populations.

| $N_e$ ($\times 10^{-6}$) | $\bar{z}^{*2}$ | $V_g^{*}$ ($\times 10^{-6}$), simulations | $V_g^{*}$ ($\times 10^{-6}$), neutral | $L^{*}_{\mathrm{stab}}$ (eq. 2b) | $L^{*}_{\mathrm{stab}}$ (eq. 3) |
|---|---|---|---|---|---|
| 0.5 | 18.6 | 0.542 | 0.500 | 0.00542 | 0.00500 |
| 1.0 | 4.65 | 1.15 | 1.00 | 0.0115 | 0.0100 |
| 2.0 | 1.16 | 2.38 | 2.00 | 0.0238 | 0.0200 |
| 4.0 | 0.291 | 4.76 | 4.00 | 0.0476 | 0.0400 |
| 40 | 0.0465 | 18.7 | 40.0 | 0.187 | 0.400 |
| ∞ | 0 | 56.2 | ∞ | 0.562 | ∞ |

The mutational bias is κ = 2, and the mutation rate per site is $3.75 \times 10^{-9}$. The number of sites is $m = 5 \times 10^{7}$, the effect of a mutation at a given site is a = 1, and the strength of stabilizing selection is $S = 10^{-8}$. $\bar{z}^{*}$ was obtained from the expression given in the text. The values of $V_g^{*}$ for finite populations in the third column were obtained by rescaling the simulation results in tables 1 and S2 of Charlesworth (2013) to the population sizes and values of m and a used here, with the simulation results for a population size of 100 corresponding to $N_e = 10^{6}$ in the present case. The values of $V_g^{*}$ and the load for an infinite population were obtained from the asymptote for large $N_e$ of equation (A6b) of Charlesworth (2013), which gives $V_g^{*} = mu(1 + \kappa)/S$ and $L^{*}_{\mathrm{stab}} = mu(1 + \kappa)$.
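To see how small the mean and drift terms of equation (2b) are relative to the variance term, the three contributions can be evaluated side by side. The following is a minimal sketch: the function name is hypothetical, the parameter values follow the Drosophila-like case described in the note to Table 1, and π = 0.02 is an assumed diversity value used only for illustration.

```python
import math


def load_terms(n_e, S, a, kappa, m, pi):
    """Return the three contributions to the expected load in eq. (2b).

    variance term: S * Vg*, with Vg* approximated by m * a**2 * pi (eq. 3)
    mean term:     S * zbar*^2, approximately [ln(kappa)]^2 / (64 * Ne^2 * S * a^2)
    drift term:    S * V_zbar, approximately 1 / (4 * Ne)
    """
    variance_term = S * m * a**2 * pi
    mean_term = math.log(kappa) ** 2 / (64.0 * n_e**2 * S * a**2)
    drift_term = 1.0 / (4.0 * n_e)
    return variance_term, mean_term, drift_term


# Illustrative values: Ne = 1e6, S = 1e-8, a = 1, kappa = 2, m = 5e7 sites.
# pi = 0.02 is an assumption made for this example.
variance_term, mean_term, drift_term = load_terms(
    n_e=1e6, S=1e-8, a=1.0, kappa=2.0, m=5e7, pi=0.02
)

print(f"variance term: {variance_term:.3e}")
print(f"mean term:     {mean_term:.3e}")
print(f"drift term:    {drift_term:.3e}")
```

With these inputs the variance term is several orders of magnitude larger than the other two combined, which is the condition under which the load can be approximated by the variance term alone.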

An overestimate of the load can be obtained by using the neutral value for $V_g^{*}$, because stabilizing selection reduces the variance below the neutral value for a given mutation rate and effective population size (Charlesworth 2013). This is given by the result that $\pi = 8N_e\kappa u/(1 + \kappa)$ when the frequencies of sites fixed for A1 versus A2 are at statistical equilibrium under neutral mutation and drift (Charlesworth and Charlesworth 2010, p. 274), reflecting the fact that the rate of mutation summed over both directions of change is $2\kappa u/(1 + \kappa)$.

The expected load can then be roughly estimated as follows, using Drosophila and humans as examples of species with a compact genome and large $N_e$, and with a large genome and small $N_e$, respectively. For a Drosophila species, a value of π of 0.01–0.02 for putatively nearly neutral silent sites is fairly typical (e.g., Shapiro et al. 2007). Given the estimated size of the noncoding part of the Drosophila genome that is under some degree of selective constraint (Halligan and Keightley 2006), an upper limit to m of $5 \times 10^{7}$ is plausible, although of course different components of the noncoding part of the genome are likely to experience somewhat different selection pressures (when the loads for each independent trait are small, they can be summed to obtain the approximate total load, so that this is not a serious problem). We can examine the potential load generated by weak stabilizing selection by setting $4N_eSa^{2}$ to 0.04, a value that is sufficient to generate a detectable pressure of selection on individual DNA sequence variants affecting the trait, such that γ is of the order of 1 (Charlesworth 2013; Table 1). With $N_e \approx 10^{6}$, as suggested by current estimates of DNA sequence diversity and mutation rate for Drosophila melanogaster (Haag-Liautard et al. 2007; Shapiro et al. 2007), this gives $Sa^{2} = 10^{-8}$. Equation (3) with π between 0.01 and 0.02 then yields loads between 0.005 and 0.01.

For humans, recent estimates suggest that as many as 10% of noncoding sites may be subject to weak purifying selection (Ward and Kellis 2012). Assume that there are $10^{8}$ sites under such selection, with π = 0.001 and $N_e = 20{,}000$, consistent with recent estimates of silent site diversity and mutation rates in humans (The 1000 Genomes Project Consortium; Kong et al. 2012). With $4N_eSa^{2} = 0.04$, we have $Sa^{2} = 5 \times 10^{-7}$, and $L^{*}_{\mathrm{stab}} = 0.05$.

The accuracy of the neutral approximation used to obtain these load estimates is tested in Table 1, which uses the Drosophila parameters proposed above (with κ = 2), and compares the values of the load obtained from equation (2b) to those from equation (3), for populations with $N_e$ both below and above $10^{6}$. Except for the case with $N_e = 40 \times 10^{6}$, the finite populations shown in Table 1 all have directional selection in favor of A2 variants with


γ ≈ ln(κ) = 0.69 (Charlesworth 2013, eq. 9), due to the slight departure of the expected value of the population mean from the optimum caused by mutational bias and drift. The agreement with the neutral approximation is excellent for most of the finite population results, although there is a tendency for the fit to get worse as $N_e$ increases, and it is very poor for $N_e = 40 \times 10^{6}$. Even for this very high value of $N_e$ (with $4N_eSa^{2} = 1.6$), the load is quite moderate.

An apparently paradoxical implication of the results in Table 1 is that the load increases with the effective population size over a wide range of parameter space, for a given set of mutational and selection parameters; in contrast, purifying selection models have the reverse property, except for nearly recessive deleterious mutations where there is a small region of parameter space in which the load increases slightly with $N_e$ (Kimura et al. 1963). The expected load for an infinite population is also much greater than for the finite populations, and is determined by the net mutation rate for sites affecting the trait, as expected from Haldane’s mutational load principle (Haldane 1937). This is because, when the trait mean coincides with the optimum, any new mutation that occurs is at a selective disadvantage due to stabilizing selection regardless of whether it is A1 to A2 or vice versa, so that all mutations can be treated as deleterious (Wright 1935b; Bürger 2000, pp. 143–148). The increase in load with $N_e$ arises because the load under weak stabilizing selection is largely determined by the genetic variance, which is an increasing function of population size. The load is not much affected by fixations of alleles, even when selection is relatively weak, because in these examples the expected mean is always held close to the optimum, and the load arising from the variance around this expectation is always very small. Only when the population size becomes extremely small will this latter term start to dominate, and the load then increases as $N_e$ becomes smaller.

The results thus show that, with this model of stabilizing selection, the load can be small to moderate, even with tens of millions of sites subject to selection of the strength estimated from population genetic data. There is thus no threat to the survival of the population. However, the relative importance of stabilizing versus purifying selection for synonymous and noncoding sites is currently unknown, so the question of the load under weak purifying selection also needs consideration.
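The load estimates in this section follow from simple arithmetic on equation (3) and the neutral expression for π. The sketch below reproduces the Drosophila and human estimates and the last column of Table 1; the function names are hypothetical, and all parameter values are those stated in the text and in the note to Table 1.

```python
def scaled_selection_strength(four_ne_s_a2, n_e):
    """S * a^2 recovered from the scaled quantity 4 * Ne * S * a^2."""
    return four_ne_s_a2 / (4.0 * n_e)


def load_neutral_approx(s_a2, m, pi):
    """Eq. (3): expected load S * m * a^2 * pi."""
    return s_a2 * m * pi


def neutral_diversity(n_e, u, kappa):
    """Neutral diversity with biased reversible mutation: 8*Ne*kappa*u/(1+kappa)."""
    return 8.0 * n_e * kappa * u / (1.0 + kappa)


# Drosophila example: 4*Ne*S*a^2 = 0.04, Ne = 1e6, m = 5e7, pi = 0.01 to 0.02.
sa2_dros = scaled_selection_strength(0.04, 1e6)
loads_dros = [load_neutral_approx(sa2_dros, 5e7, pi) for pi in (0.01, 0.02)]

# Human example: 4*Ne*S*a^2 = 0.04, Ne = 20,000, m = 1e8, pi = 0.001.
sa2_human = scaled_selection_strength(0.04, 2e4)
load_human = load_neutral_approx(sa2_human, 1e8, 0.001)

# Table 1, eq. (3) column: S*a^2 = 1e-8, m = 5e7, u = 3.75e-9, kappa = 2.
table_loads = [
    load_neutral_approx(1e-8, 5e7, neutral_diversity(n_e, 3.75e-9, 2.0))
    for n_e in (0.5e6, 1e6, 2e6, 4e6, 40e6)
]

# Infinite-population limit from the note to Table 1: m * u * (1 + kappa).
load_infinite = 5e7 * 3.75e-9 * (1.0 + 2.0)

print("Drosophila loads:", loads_dros)
print("Human load:", load_human)
print("Table 1 (eq. 3) loads:", table_loads)
print("Infinite-population load:", load_infinite)
```

Because the neutral π grows linearly with Ne, the eq. (3) load does as well, which is why the neutral approximation overshoots badly at the largest population size in Table 1.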

The Load Under Purifying Selection

With purifying selection against weakly selected, deleterious mutations at a large number of sites, a similar approach to that of Kondrashov (1995a) can be used, but a simpler derivation will be used here. A large number of sites, m, is assumed, with reversible mutation between allelic types A1 and A2 at each site, with A1 representing the deleterious allele. To include possible effects of epistasis, the log-quadratic fitness model employed in many previous studies is used (e.g., Charlesworth 1990), where

$$\ln[w(n)] = -\alpha n - \tfrac{1}{2}\beta n^{2}. \tag{4}$$

Here, n measures the number of deleterious (A1) mutations carried by an individual; with intermediate dominance, $n = n_{11} + n_{12}/2$, where $n_{11}$ is the number of homozygous deleterious mutations and $n_{12}$ is the number of sites that are heterozygous for A1-type and the fitter A2 variants ($n_{11}$ and $n_{12}$ are obtained by summing over sites in the genome). The coefficients α and β measure the linear and quadratic effects of n on fitness. If β > 0, there is synergistic selection against deleterious mutations (positive epistasis); if β < 0, there is diminishing-returns (negative) epistasis; β = 0 corresponds to multiplicative fitnesses (no epistasis). It is useful to note that the selection coefficient against an individual A1-type mutation at a given site in a population with a mean number of mutations $\bar{n}$ is given by

$$s \approx \alpha + \beta\bar{n} \tag{5}$$

(Charlesworth 1990). When $\bar{n}$ is sufficiently large, the distribution of n among individuals in the population is approximately normal, and the log mean fitness of the population is given by

$$\ln(\bar{w}) = -\tfrac{1}{2}\ln(1 + \beta V_g) + \frac{\alpha^{2}V_g - 2\alpha\bar{n} - \beta\bar{n}^{2}}{2(1 + \beta V_g)}, \tag{6}$$

where $V_g$ is the variance of n among individuals within the population (Charlesworth 1990, eq. A2). If the small terms involving $\beta V_g$ and $\alpha^{2}V_g$ in equation (6) are neglected, we obtain

$$-\ln(\bar{w}) \approx \bar{n}\left(\alpha + \tfrac{1}{2}\beta\bar{n}\right) = \bar{n}\left(s - \tfrac{1}{2}\beta\bar{n}\right). \tag{7a}$$
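Equation (6) and its approximation (7a) can be checked numerically by integrating the fitness function of equation (4) over a normal distribution of n. The sketch below does this with a simple trapezoidal rule; the function names and the parameter values are hypothetical and were chosen only so that the terms in βVg and α²Vg are small.

```python
import math


def log_mean_fitness_eq6(alpha, beta, n_bar, v_g):
    """Eq. (6): ln(mean fitness) when n is normally distributed."""
    denom = 1.0 + beta * v_g
    numerator = alpha**2 * v_g - 2.0 * alpha * n_bar - beta * n_bar**2
    return -0.5 * math.log(denom) + numerator / (2.0 * denom)


def log_mean_fitness_eq7a(alpha, beta, n_bar):
    """Eq. (7a): approximation that neglects the terms in beta*Vg and alpha^2*Vg."""
    return -n_bar * (alpha + 0.5 * beta * n_bar)


def log_mean_fitness_numeric(alpha, beta, n_bar, v_g, half_width=10.0, steps=20001):
    """Trapezoidal quadrature of E[exp(-alpha*n - beta*n^2/2)] for normal n."""
    sd = math.sqrt(v_g)
    lower = n_bar - half_width * sd
    dx = 2.0 * half_width * sd / (steps - 1)
    norm = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        n = lower + i * dx
        density = norm * math.exp(-0.5 * ((n - n_bar) / sd) ** 2)
        fitness = math.exp(-alpha * n - 0.5 * beta * n * n)  # eq. (4)
        weight = 0.5 if i in (0, steps - 1) else 1.0  # trapezoid end points
        total += weight * density * fitness * dx
    return math.log(total)


# Hypothetical values used only to exercise the formulas.
alpha, beta, n_bar, v_g = 1e-3, 1e-5, 1000.0, 500.0

exact = log_mean_fitness_eq6(alpha, beta, n_bar, v_g)
approx = log_mean_fitness_eq7a(alpha, beta, n_bar)
numeric = log_mean_fitness_numeric(alpha, beta, n_bar, v_g)

print(f"eq. (6):    {exact:.6f}")
print(f"eq. (7a):   {approx:.6f}")
print(f"quadrature: {numeric:.6f}")
```

For these inputs the quadrature agrees closely with equation (6), and equation (7a) differs from it by well under one percent of the log mean fitness.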

Taking the expectations over the stationary distribution under mutation, selection, and drift, and using the fact that the ratio of the variance of $\bar{n}$ to the square of the expected value of $\bar{n}$ approaches 0 as m increases (Charlesworth 2013; Supporting Information), the approximate expected value of the load over the stationary distribution under selection, mutation, and drift is given by

$$L^{*}_{\mathrm{pur}} \approx 1 - \exp\left[-n^{*}\left(s^{*} - \tfrac{1}{2}\beta n^{*}\right)\right], \tag{7b}$$

where s* and n* are the expected values of the selection coefficient and mean number of deleterious mutations, respectively. The minimum load occurs with purely synergistic selection, when the linear coefficient α is 0. Its value can be crudely estimated as follows, following the approach of Kondrashov (1995a). Assume that a given class of sites across the whole genome is subject to some form of weak selection (such as selection on codon usage at synonymous sites), with a scaled selection coefficient $\gamma = 4N_es$ of approximately 1, as proposed in the Introduction. Substituting γ ≈ 1 and α = 0 into equation (5), we have $\bar{n}\beta \approx 1/(4N_e)$. Substituting this in turn into equation (7b), and assuming that the frequency of sites in their optimal state is approximately 50% (so that $\bar{n} \approx m$), as is the case when γ ≈ 1 and the mutational bias is moderate (Li 1987; Bulmer 1991; McVean and Charlesworth 1999), we obtain

$$L^{*}_{\mathrm{syn}} \approx 1 - \exp[-m/(8N_e)]. \tag{8a}$$

In the absence of synergism (β = 0), a similar calculation yields the expected load with multiplicative fitnesses:

$$L^{*}_{\mathrm{mult}} \approx 1 - \exp[-m/(4N_e)]. \tag{8b}$$
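The substitutions behind equations (8a) and (8b) can be written out explicitly. The following is a sketch that uses only equations (5) and (7b), together with the stated assumptions $s^{*} \approx 1/(4N_e)$ (i.e., γ ≈ 1) and $n^{*} \approx m$; it shows the exponent of equation (7b) in each case.

```latex
\begin{align*}
\alpha = 0:\quad & s^{*} = \beta n^{*}
  \;\Rightarrow\;
  n^{*}\bigl(s^{*} - \tfrac{1}{2}\beta n^{*}\bigr)
  = \tfrac{1}{2}\,n^{*}s^{*}
  \approx \frac{m}{8N_e}, \\
\beta = 0:\quad & s^{*} = \alpha
  \;\Rightarrow\;
  n^{*}\bigl(s^{*} - \tfrac{1}{2}\beta n^{*}\bigr)
  = n^{*}s^{*}
  \approx \frac{m}{4N_e}.
\end{align*}
```

Purely synergistic selection therefore halves the exponent relative to the multiplicative case, which is the sense in which synergism mitigates the load.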

Equation (8b) is identical to the result of Kondrashov (1995a), derived by a more complex argument. In agreement with his conclusions, equations (8) show that synergism can mitigate the load to some extent, but there is still a very large load when $m \gg 8N_e$, as is likely to be the case. In Drosophila, for example, the number of synonymous codons in the genome is approximately $14 \times 10^{6}$ (Misra et al. 2002). If all of these were subject to purifying selection with γ ≈ 1 and $N_e \approx 10^{6}$, the load with completely synergistic selection is 0.83; it is 0.97 with multiplicative fitnesses. If the much larger number of noncoding sites discussed earlier is used instead ($5 \times 10^{7}$), the loads become 0.998 and 1.00, respectively. The situation is even worse in species with effective population sizes that are 10-fold or more smaller, and that have many more noncoding sites under selection, such as humans. Thus, in contrast to the situation with stabilizing selection, if there is genome-wide purifying selection of the type described by equation (4) on the scale suggested by the recent findings mentioned earlier, an extremely high level of genetic load is inevitable, in agreement with Kondrashov (1995a). This conclusion recalls the 1960s debate concerning the load generated by genome-wide balancing selection, stimulated by the discovery of abundant variability detected by gel electrophoresis (King 1966; Milkman 1966; Sved et al. 1967; Lewontin 1974).
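The loads quoted for Drosophila follow directly from equations (8a) and (8b). A minimal sketch of the calculation is given below; the function names are hypothetical, and the site numbers and effective population size are those stated in the text.

```python
import math


def load_synergistic(m, n_e):
    """Eq. (8a): load with purely synergistic selection (alpha = 0)."""
    return 1.0 - math.exp(-m / (8.0 * n_e))


def load_multiplicative(m, n_e):
    """Eq. (8b): load with multiplicative fitnesses (beta = 0)."""
    return 1.0 - math.exp(-m / (4.0 * n_e))


n_e = 1e6  # approximate Drosophila effective population size
for label, m in (("synonymous sites", 14e6), ("noncoding sites", 5e7)):
    print(
        f"{label}: synergistic = {load_synergistic(m, n_e):.3f}, "
        f"multiplicative = {load_multiplicative(m, n_e):.3f}"
    )
```

Both functions approach 1 once m exceeds a few multiples of Ne, so lowering Ne or raising m, as in the human case, only pushes the load closer to unity.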

Soft Selection and Variation in Fitness

As with that debate, a possible resolution that avoids discarding purifying selection in favor of an alternative such as stabilizing selection is provided by the possibility that much selection is competitive in nature, that is, there is “soft selection” (Wallace 1968, pp. 427–435). Maynard Smith (1978, pp. 23–26) gave a lucid discussion of the distinction between “hard” and “soft” selection (note that Kondrashov [1995a,b] used the term in a rather different sense), and an explicit model of soft selection and its relation to epistatic fitness effects has been analyzed by Peck and Waxman (2000). Soft selection implies that the absolute mean fitness of a population is dependent solely on the availability of the resources for which individuals compete. For example, with competition among males for mates, the number of breeding females and their breeding success determines the absolute mean fitness of the males, whereas differences in competitive ability cause variance in male fitness. Note that soft selection does not necessarily imply truncation selection, whereby only the portion of the population above a threshold value of a trait survives or reproduces. Truncation selection may be regarded as the most extreme form of soft selection, and results in the minimum genetic load compared with other forms of directional selection on quantitative traits (Shnol and Kondrashov 1994); it also induces a high degree of synergistic epistasis with respect to fitness (King 1966; Kimura and Maruyama 1966).

In general, if we treat fitness as a function of the probability of success in obtaining access to a limiting resource, the mean fitness of the population relative to the fitness of a hypothetical optimal genotype that has a very low chance of being present in the population is essentially irrelevant. However, we need to be assured that the level of variation in fitness is not so large as to be implausible, but is large enough to be compatible with a selection coefficient of the order of $1/(4N_e)$. This question can be examined as follows, by determining either the standard deviation of the natural logarithm of fitness or the coefficient of variation; these are approximately equal when the variance in fitness is small, as is the case here. Expressions for the genetic coefficient of variation of fitness and/or standard deviation of log fitness under both stabilizing selection and synergistic selection are derived in the Appendix.
With the above Drosophila example of stabilizing selection on noncoding sequences with $4N_eSa^{2} = 0.04$, equation (A1b) and the discussion following equation (3) imply that the expected value of the coefficient of variation lies between 0.0071 and 0.014. These values are considerably smaller than the observed additive genetic coefficients of variation for individual fitness components in D. melanogaster, such as female fecundity, egg-to-adult viability, and male mating success, reported in Table 1 of Charlesworth and Hughes (2000), so that there is no difficulty in reconciling them with the probable, much higher level of genetic variability in fitness. Even with the human estimate of $L^{*}_{\mathrm{stab}} = 0.05$, the coefficient of variation is 0.074, a rather modest value.

Under a model of purifying selection in Drosophila, and setting γ = 1, π = 0.02, $m = 14 \times 10^{6}$, and $N_e = 10^{6}$ in equation (A3), the expected value of $\sigma_{\ln(w)\mathrm{pur}}$ is equal to $1.32 \times 10^{-4}$. Despite the high load in this case, the expected variability in fitness among individuals in the population is unobservably small, reflecting the fact that the load is caused mainly by sites that are fixed for deleterious variants, and not by sites that are segregating and hence contributing to variation in fitness within the population. For a species like humans, with a much smaller effective population size and genetic diversity than Drosophila, plus a much larger potential number of sites under weak purifying selection, it might be thought that there would be a large range of fitness differences under this model. However, because $\sigma_{\ln(w)\mathrm{pur}}$ depends only on the square root of mπ, and a low effective population size implies a low π value, this is not in practice a difficulty. Using values of π = 0.001 and $N_e = 20{,}000$, and a very generous estimate of $5 \times 10^{8}$ sites under selection, we obtain $\sigma_{\ln(w)\mathrm{pur}}$ of 0.0088, which is again very small, although potentially observable.
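Equation (A3) appears in the Appendix and is not reproduced in this section. The sketch below therefore rests on an assumption: that the standard deviation of log fitness under purifying selection takes the form s·√(mπ), with s = γ/(4Ne). This assumed form reproduces both values quoted above, but it should be checked against the Appendix before being relied on. The function name is hypothetical.

```python
import math


def sd_log_fitness_purifying(gamma, n_e, m, pi):
    """Assumed form of eq. (A3): s * sqrt(m * pi), with s = gamma / (4 * Ne).

    This expression is an assumption made for illustration; the Appendix
    equation itself is not shown in this section.
    """
    s = gamma / (4.0 * n_e)
    return s * math.sqrt(m * pi)


# Drosophila values from the text: gamma = 1, Ne = 1e6, m = 14e6, pi = 0.02.
sd_drosophila = sd_log_fitness_purifying(1.0, 1e6, 14e6, 0.02)

# Human values from the text: gamma = 1, Ne = 20,000, m = 5e8, pi = 0.001.
sd_human = sd_log_fitness_purifying(1.0, 2e4, 5e8, 0.001)

print(f"Drosophila: {sd_drosophila:.3e}")
print(f"Human:      {sd_human:.3e}")
```

Under this assumed form, the smaller human Ne raises s but lowers π, and the square root damps the effect of the larger number of sites, so the standard deviation stays small.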

Discussion

The results described here offer two different resolutions of the problem of the very high genetic load that is generated by the possible existence of tens or hundreds of millions of sites that are subject to such weak purifying selection that drift causes fixations of slightly deleterious mutations at numerous sites in the genome. First, selection may be stabilizing rather than purifying. Theory shows that, even if mutation is strongly biased and selection is weak, the population mean of a trait is held close to the optimal value (Charlesworth 2013). In this case, the load is generated primarily by the variance in the trait among individuals within a population. The high proportion of sites fixed for one or other variant, which represents the equilibrium state under weak selection, mutation, and drift (Kimura 1981), means that the genetic variance of the trait created by variants at segregating sites is relatively small, and so the load is relatively small. Second, purifying selection may be acting, but selection is competitive in nature (i.e., it is soft rather than hard). The most extreme form of this is truncation selection, which allows an almost limitless number of weakly selected sites to exist with a tolerable genetic load (Kondrashov 1995a). But even if fitnesses are multiplicative, the high levels of fixation at individual sites with sufficiently weak selection mean that tens or hundreds of millions of selected sites may plausibly coexist with a relatively minor amount of variation in fitness among individuals.

This raises the question of how to distinguish among these possibilities. Given the severe difficulties in experimentally detecting the very minute selection coefficients discussed here, the most hopeful way forward is probably to devise tests of site frequency spectra for variants at the sites in question that can discriminate between purifying and stabilizing selection from population genetic data alone. Because stabilizing selection with mutational bias can produce effects that are essentially identical to those of purifying selection (Charlesworth 2013), this may be no easy task, if we rely simply on population genetic data. In any event, we can rest assured that there is no difficulty in principle in reconciling the


continued existence of a population of higher organisms with the operation of weak selection on very large numbers of sites in the genome, when these are evolving independently of each other. The situation is, however, less clear when the rate of recombination is sufficiently low that linkage disequilibrium effects become important. Under stabilizing selection without mutational bias in very large populations, it is known that low rates of recombination enhance population mean fitness because they reduce the variance in fitness (Charlesworth 1993), so that the population is less challenged under both hard and soft selection, but the effect of mutational bias has not been investigated. The potential dangers resulting from the reduced efficacy of selection in finite populations with reduced recombination are well known (Otto and Lenormand 2002; Charlesworth and Charlesworth 2010). ACKNOWLEDGMENTS The author thank N. Barton, R. B¨urger, D. Charlesworth, S. Gl´emin, P. Keightley, A. Kondrashov, R. Lande, X.-S. Zhang, and an anonymous reviewer for their comments on this article. This work was supported by research grant BB/H006028/1 from the Biotechnology and Biological Sciences Research Council of the United Kingdom. LITERATURE CITED Bulmer, M. G. 1991. The selection-mutation-drift theory of synonomous codon usage. Genetics 129:897–907. B¨urger, R. 2000. The mathematical theory of selection, recombination and mutation. John Wiley, Chichester, U.K. B¨urger, R., and R. Lande. 1994. On the distribution of the mean and variance of a quantitative trait under mutation-selection-drift balance. Genetics 138:901–912. Charlesworth, B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet. Res. 55:199–221. ———. 1993. Directional selection and the evolution of sex and recombination. Genet. Res. 61:205–224. ———. 2013. Stabilizing selection, purifying selection and mutational bias in finite populations. Genetics 194. 
doi: 10.1534/genetics.113.151555. Charlesworth, B., and D. Charlesworth. 2010. Elements of evolutionary genetics. Roberts and Company, Greenwood Village, CO. Charlesworth, B., and K. A. Hughes. 2000. The maintenance of genetic variation in life-history traits. Pp. 369–392 in R. S. Singh and C. B. Krimbas, eds. Evolutionary genetics from molecules to morphology. Cambridge Univ. Press, Cambridge, U.K. Crow, J. F. 1958. Some possibilities for measuring selection intensities in man. Hum. Biol. 30:1–13. Eory, L., D. L. Halligan, and P. D. Keightley. 2010. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol. Biol. Evol. 27:177–192. Falconer, D. S., and T. F. C. Mackay. 1996. An introduction to quantitative genetics. 4th ed. Longman, Lond. Haag-Liautard, C., M. Dorris, X. Maside, S. Macaskill, D. L. Halligan, D. Houle, B. Charlesworth, and P. D. Keightley. 2007. Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature 445:82–85. Haldane, J. B. S. 1937. The effect of variation on fitness. Am. Nat. 71:337–349. Halligan, D. L., and P. D. Keightley. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide sequence comparison. Genome Res. 16:875–884.


Kimura, M. 1981. Possibility of extensive neutral evolution under stabilizing selection with special reference to non-random usage of synonymous codons. Proc. Natl. Acad. Sci. USA 78:454–458.
Kimura, M., and T. Maruyama. 1966. The mutational load with epistatic gene interactions in fitness. Genetics 54:1303–1312.
Kimura, M., T. Maruyama, and J. F. Crow. 1963. The mutation load in small populations. Evolution 48:1303–1312.
King, J. L. 1966. The gene interaction component of the genetic load. Genetics 53:403–413.
Kondrashov, A. S. 1995a. Contamination of the genome by very slightly deleterious mutations: why have we not died 100 times over? J. Theor. Biol. 175:583–594.
———. 1995b. Dynamics of unconditionally deleterious mutations: Gaussian approximation and soft selection. Genet. Res. 65:113–122.
Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem, G. Magnussen, S. Gukonsson, A. Sigurdson, A. Jonasdottir, A. Jonasdottir, et al. 2012. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488:471–475.
Lande, R. S. 1976. Natural selection and random genetic drift in phenotypic evolution. Evolution 30:314–334.
Lande, R. 1980. Genetic variation and phenotypic evolution during allopatric speciation. Am. Nat. 116:463–479.
Lesecque, Y., P. D. Keightley, and A. Eyre-Walker. 2012. A resolution of the mutation load paradox in humans. Genetics 191:1321–1330.
Lewontin, R. C. 1974. The genetic basis of evolutionary change. Columbia Univ. Press, New York.
Li, W.-H. 1987. Models of nearly neutral mutations with particular implications for non-random usage of synonymous codons. J. Mol. Evol. 24:337–345.
Maynard Smith, J. 1978. The evolution of sex. Cambridge Univ. Press, Cambridge, U.K.
McVean, G. A. T., and B. Charlesworth. 1999. A population genetic model for the evolution of synonymous codon usage: patterns and predictions. Genet. Res. 74:145–158.
Milkman, R. D. 1966. Heterosis as a major cause of heterozygosity in natural populations. Genetics 55:493–495.
Misra, S., M. A. Crosby, C. J. Mungall, B. B. Matthews, K. S. Campbell, P. Hradecky, Y. Huang, J. S. Kaminker, G. H. Millburn, S. E. Prochnik, et al. 2002. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol. 3:Research0083.
Otto, S. P., and T. Lenormand. 2002. Resolving the paradox of sex and recombination. Nat. Rev. Genet. 3:256–261.
Peck, J. R., and D. Waxman. 2000. Mutation and sex in a competitive world. Nature 406:399–404.
Shapiro, J. A., W. Huang, C. Zhang, M. Hubisz, J. Lu, D. A. Turissini, S. Fang, H.-Y. Wang, R. R. Hudson, R. Nielsen, et al. 2007. Adaptive genic evolution in the Drosophila genome. Proc. Natl. Acad. Sci. USA 104:2271–2276.
Shnol, E. E., and A. S. Kondrashov. 1994. Some relations between different characteristics of selection. J. Math. Biol. 32:835–840.
Sved, J. A., T. E. Reed, and W. F. Bodmer. 1967. The number of balanced polymorphisms that can be maintained in a population. Genetics 55:469–481.
The 1000 Genomes Project Consortium. 2010. A map of human genome variation from population-scale sequencing. Nature 467:1061–1073.
Wallace, B. 1968. Topics in population genetics. W.W. Norton, New York.
Ward, L. D., and M. Kellis. 2012. Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science 337:1675–1678.
Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97–159.

———. 1935a. The analysis of variance and the correlation between relatives with respect to deviations from an optimum. J. Genet. 30:243–256.
———. 1935b. Evolution in populations in approximate equilibrium. J. Genet. 30:257–266.
Zeng, K., and B. Charlesworth. 2010. Studying patterns of recent evolution at synonymous sites and intronic sites in Drosophila melanogaster. J. Mol. Evol. 70:116–128.

Associate Editor: S. Glemin

Appendix

VARIATION IN FITNESS UNDER STABILIZING SELECTION

Under the quadratic deviations model, there is a nonadditive contribution to the genetic variance in fitness, even with additivity on the scale of the trait itself, which arises from terms contributed by the fourth moment of $z$ (Wright 1935a). To determine the fitness variance, it is therefore necessary to obtain the mean square of the deviations of the fitnesses from 1 (given by $S^2 z^4$) directly from equation (1), by using the mean value of $z^4 = [(z - \bar{z}) + \bar{z}]^4$ over the distribution of genotypes in the population. Expanding this expression, and neglecting terms involving squares and higher powers of $\bar{z}$, because these are small even with mutational bias (Charlesworth 2013), the residual term is the mean of $(z - \bar{z})^4 + 4(z - \bar{z})^3 \bar{z}$. Assuming normality of the distribution of $z$, the second term in this expression can be neglected, and the first term is equal to $3V_g^2$. From the argument following equation (2b), the load is approximately $SV_g$, so that the variance in fitness deviations is approximated by $3S^2V_g^2 - S^2V_g^2 = 2S^2V_g^2$, and the approximate standard deviation of fitness is

$$\sigma_{w\,\mathrm{stab}} \approx \sqrt{2}\, S V_g. \tag{A1a}$$

This can be further approximated by

$$\sigma_{w\,\mathrm{stab}} \approx \sqrt{2}\, L_{\mathrm{stab}}, \tag{A1b}$$

where $L_{\mathrm{stab}}$ is given by equation (2a). The genetic coefficient of variation of fitness is thus approximated by $\sqrt{2}\, L_{\mathrm{stab}}/(1 - L_{\mathrm{stab}})$. For purposes of numerical illustrations, the expected value of $L_{\mathrm{stab}}$ given by equation (2b) can be used.

VARIATION IN FITNESS UNDER PURIFYING SELECTION

Using equation (4), with $V_g < \bar{n}$ as expected for a finite population (Charlesworth 2013), the equilibrium within-population variance in the natural logarithm of fitness is given approximately by the mean of the square of $\alpha(n - \bar{n}) + 0.5\beta(n^2 - \bar{n}^2)$, which can be rewritten as $\alpha(n - \bar{n}) + 0.5\beta[(n - \bar{n})^2 + 2\bar{n}(n - \bar{n})]$. After some algebra, and assuming normality of the distribution of $n$ (Charlesworth 1990), so that the third moment of $n$ about $\bar{n}$ can be neglected and its fourth moment about $\bar{n}$ equated to $3V_g^2$, we find that

$$V_{\ln(w)} \approx (\alpha + \beta\bar{n})^2 V_g + \tfrac{3}{4}(\beta V_g)^2, \tag{A2a}$$

where the first term on the right-hand side represents the additive genetic variance in log fitness (the multiplicand of $V_g$ is simply $s^2$, from equation (5)), and the second term is the nonadditive genetic variance. Because $V_g < \bar{n}$, this expression can be rewritten as

$$V_{\ln(w)} < s^2 V_g + \tfrac{3}{4}(\beta\bar{n})^2. \tag{A2b}$$
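The normal-moment approximations used in this Appendix can be checked numerically. The following is a minimal sketch, not part of the original analysis, that compares Monte Carlo estimates with expressions (A1a) and (A2a) and evaluates the bound (A2b). It assumes that equation (1) has the form $w = 1 - Sz^2$ (inferred from the statement that the squared fitness deviation is $S^2 z^4$), that $\bar{z} = 0$, and that $z$ and $n$ are normally distributed. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def sd_fitness_a1a(S, Vg):
    """Approximate SD of fitness under stabilizing selection, expression (A1a)."""
    return (2.0 ** 0.5) * S * Vg


def fitness_sd_stabilizing(S, Vg, n_draws=100_000, seed=1):
    """Monte Carlo SD of w = 1 - S*z**2 (assumed form of equation (1)),
    with z ~ Normal(0, Vg), i.e. the trait mean is taken to be at the optimum."""
    rng = random.Random(seed)
    sd_z = Vg ** 0.5
    w = [1.0 - S * rng.gauss(0.0, sd_z) ** 2 for _ in range(n_draws)]
    return statistics.pstdev(w)


def var_log_fitness_a2a(alpha, beta, n_bar, Vg):
    """Approximate variance in log fitness, expression (A2a)."""
    return (alpha + beta * n_bar) ** 2 * Vg + 0.75 * (beta * Vg) ** 2


def var_log_fitness_bound_a2b(alpha, beta, n_bar, Vg):
    """Upper bound (A2b), valid when Vg < n_bar; here s = alpha + beta*n_bar."""
    s = alpha + beta * n_bar
    return s ** 2 * Vg + 0.75 * (beta * n_bar) ** 2


def var_log_fitness_mc(alpha, beta, n_bar, Vg, n_draws=100_000, seed=2):
    """Monte Carlo mean of the square of
    alpha*(n - n_bar) + 0.5*beta*(n**2 - n_bar**2), with n ~ Normal(n_bar, Vg)."""
    rng = random.Random(seed)
    sd_n = Vg ** 0.5
    total = 0.0
    for _ in range(n_draws):
        n = rng.gauss(n_bar, sd_n)
        dev = alpha * (n - n_bar) + 0.5 * beta * (n * n - n_bar * n_bar)
        total += dev * dev
    return total / n_draws


if __name__ == "__main__":
    # Illustrative (hypothetical) parameter values only.
    print("stabilizing: simulated SD", fitness_sd_stabilizing(0.01, 1.0),
          "vs (A1a)", sd_fitness_a1a(0.01, 1.0))
    print("purifying: simulated variance", var_log_fitness_mc(0.0, 0.01, 10.0, 5.0),
          "vs (A2a)", var_log_fitness_a2a(0.0, 0.01, 10.0, 5.0),
          "and bound (A2b)", var_log_fitness_bound_a2b(0.0, 0.01, 10.0, 5.0))
```

Under these assumptions the simulated values should agree with (A1a) and (A2a) up to sampling error, because both expressions follow from the fourth central moment of a normal variable being three times the squared variance. Departures from normality in the distribution of $z$ or $n$ would change the nonadditive terms.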

Writing the scaled equilibrium selection coefficient as $\gamma = 4N_e s$, we have $s = \gamma/(4N_e)$ and $\beta\bar{n} \le \gamma/(4N_e)$ (the equality holds only if $\alpha = 0$), so that expression (A2b) becomes $V_{\ln(w)}$
