MUTATION-SELECTION BALANCE WITH STOCHASTIC SELECTION*

DANIEL L. HARTL

Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907

Manuscript received June 10, 1976
Revised copy received March 14, 1977

ABSTRACT

Diffusion theory has been used to analyze a model of mutation-selection balance in which the selection process is assumed to be stochastic in time. The limiting outcome of the mutation-stochastic selection process is determined qualitatively by the geometric mean fitnesses of the genotypes, and the conditions for fixation or polymorphism are similar to those that determine the outcome of the mutation-selection process when selection is constant. However, in the case of a completely recessive allele, detailed numerical study of the polymorphism associated with stochastic selection has shown that the average allele frequency maintained is greater than the equilibrium frequency expected when selection is constant, even when the geometric mean fitness of the recessive homozygotes is identical in the stochastic and deterministic models. Thus, allele frequencies in natural populations that are too high to be plausibly explained by a balance between mutation and constant selection can be accounted for if selection is stochastic.

A classical result of the mathematical theory of natural selection is that a recessive allele held in a population by a balance between mutation and selection will maintain an equilibrium frequency of $\sqrt{\mu/s}$, where $\mu$ is the mutation rate to the recessive allele and $s$ is the selection coefficient against recessive homozygotes (CROW and KIMURA 1970). This paper concerns the characteristics of mutation-selection balance when the selection coefficient varies temporally and randomly. Many authors have studied stochastic selection (HALDANE and JAYAKAR 1963; JENSEN and POLLAK 1969; HARTL and COOK 1973, 1974, 1976; GILLESPIE 1973a,b; GILLESPIE and LANGLEY 1974; LEVIKSON 1974; KARLIN and LEVIKSON 1974; KARLIN and LIEBERMAN 1974, 1975; COOK and HARTL 1974, 1975; LEVIKSON and KARLIN 1975; NORMAN 1975); these studies have shown the cardinal importance of the geometric mean fitness in determining the qualitative outcome of selection. Indeed, theorems for the qualitative outcome of zygotic and prezygotic selection based on geometric mean fitnesses are completely analogous to the corresponding theorems in the case of deterministic selection (HARTL 1975).

* Work supported by Public Health Service grants GM21732 and GM21623 and by National Science Foundation grant PCM74-19708. The author is recipient of Research Career Award GM0002301.

Genetics 86: 687-696 July, 1977.


Some results of the present paper confirm the importance of the geometric mean fitness in determining the qualitative outcome of the mutation-stochastic selection process. From a quantitative point of view, however, the important question concerns the case in which the mutation-selection balance generates a polymorphic stationary distribution of gene frequency: how does the average gene frequency in this stationary distribution compare with the equilibrium gene frequency in the corresponding model with deterministic selection? Specifically, if the geometric mean fitnesses are identical in the stochastic and deterministic models, are the average equilibrium gene frequencies the same? The present inquiry shows that they are not. In particular, the average equilibrium frequency of a deleterious recessive allele in the stochastic model is greater than the equilibrium frequency in the deterministic model; that is to say, the average equilibrium frequency is greater than $\sqrt{\mu/\langle s\rangle}$, where $1 - \langle s\rangle$ denotes the geometric mean fitness of the recessive homozygotes. The methods used in the analysis are the diffusion methods of LEVIKSON (1974) and LEVIKSON and KARLIN (1975) (see also FELLER 1954). Although the theory is applicable to all levels of dominance, the special case studied in detail is that of a complete recessive.

THE MODEL AND ITS DIFFUSION APPROXIMATION

Let A and a be two alleles at a locus in an infinite, diploid, random-mating population. In the deterministic model the genotypes AA, Aa and aa are assumed to have constant fitnesses $w_{11}$, $w_{12}$ and $w_{22}$, respectively, and mutation from A to a is assumed to occur at the rate $\mu$ per generation. Then

$$\Delta p = \frac{pq\,[\,p(w_{11}-w_{12}) + q(w_{12}-w_{22})\,] - \mu p\,(p\,w_{11} + q\,w_{12})}{p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}} \qquad (1)$$

where $p$ is the frequency of A and $q$ ($=1-p$) is the frequency of a. In the diffusion model, we let the fitnesses of AA, Aa and aa at time $t$ be the random variables $w_{11}^{(t)}$, $w_{12}^{(t)}$ and $w_{22}^{(t)}$, respectively. The $w^{(t)}$ values may be correlated with each other within a generation, but they are assumed to be independent from generation to generation. For convenience, define $1+S_1^{(t)} = w_{11}^{(t)}$, $1+S_2^{(t)} = w_{12}^{(t)}$ and $1+S_3^{(t)} = w_{22}^{(t)}$. Let time be measured in units of $\Delta t$, each $\Delta t$ equal to one generation. Assume that $E S_i = M_i\,\Delta t$, $E S_i^2 = V_i\,\Delta t$ and $E S_i S_j = R_{ij}\,\Delta t$, with all higher moments $o(\Delta t)$.
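The deterministic recursion (1) can be iterated directly. The sketch below uses two hypothetical helper functions of our own, `delta_p` and `iterate`; it applies one generation of zygotic selection followed by mutation A to a, exactly as written in (1), and checks that with fitnesses 1, 1 and $1-s$ the frequency of a settles at the classical value $\sqrt{\mu/s}$. The values $\mu = 0.0001$ and $s = 0.01$ are illustrative choices.

```python
import math


def delta_p(p, w11, w12, w22, mu):
    """One-generation change in p (frequency of A) from equation (1):
    zygotic selection with fitnesses w11, w12, w22, then mutation A -> a at rate mu."""
    q = 1.0 - p
    wbar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    num = p * q * (p * (w11 - w12) + q * (w12 - w22)) - mu * p * (p * w11 + q * w12)
    return num / wbar


def iterate(p0, w11, w12, w22, mu, generations):
    """Apply equation (1) repeatedly, starting from p0."""
    p = p0
    for _ in range(generations):
        p += delta_p(p, w11, w12, w22, mu)
    return p


if __name__ == "__main__":
    mu, s = 1e-4, 0.01
    p_final = iterate(0.5, 1.0, 1.0, 1.0 - s, mu, 20_000)
    # Completely recessive case: q should settle at sqrt(mu / s).
    print(f"q = {1.0 - p_final:.6f}, sqrt(mu/s) = {math.sqrt(mu / s):.6f}")
```

With $w_{11} = w_{12} = 1$ and $w_{22} = 1-s$, setting the numerator of (1) to zero gives $pq^2 s = \mu p$, so $\hat q^2 = \mu/s$ exactly, and the two printed values should coincide.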

Assume now that the mutation rate from A to a is given by $\mu(\Delta t)$, where $\lim_{\Delta t\to 0}\mu(\Delta t)/\Delta t = \mu$, a constant, and $\lim_{\Delta t\to 0}\mu(\Delta t) = 0$ and $\lim_{\Delta t\to 0}[\mu(\Delta t)]^2/\Delta t = 0$. Let $p(t)$ and $q(t)$ denote the frequencies of A and a at time $t$. Then, by arguments similar to those that lead to (1),

$$p(t+\Delta t) - p(t) = \frac{pq\,[\,p(S_1 - S_2) + q(S_2 - S_3)\,] - \mu(\Delta t)\,p\,(1 + pS_1 + qS_2)}{1 + p^2 S_1 + 2pq\,S_2 + q^2 S_3} \qquad (2)$$


where, on the right-hand side, $p$ and $q$ mean $p(t)$ and $q(t)$, and the superscripts $(t)$ on the $S_i$'s have been suppressed. For the diffusion approximation, the drift function $v(p)$ is

$$v(p) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\,E[\,p(t+\Delta t) - p(t) \mid p(t) = p\,]$$

and the diffusion function $\sigma^2(p)$ is

$$\sigma^2(p) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\,E\{[\,p(t+\Delta t) - p(t)\,]^2 \mid p(t) = p\}.$$

The drift function is obtained from (2) by expanding the denominator in a Taylor series and evaluating the limit (see APPENDIX). The result is

$$v(p) = -\mu p + pq\,[\,pM_1 + (q-p)M_2 - qM_3 - p^3V_1 - 2pq(q-p)V_2 + q^3V_3 + p^2(1-4q)R_{12} + pq(p-q)R_{13} + q^2(4p-1)R_{23}\,].$$

From the square of (2) and another Taylor series expansion (see APPENDIX) one obtains

$$\sigma^2(p) = p^2q^2\,[\,p^2V_1 + (p-q)^2V_2 + q^2V_3 - 2p(p-q)R_{12} - 2pqR_{13} + 2q(p-q)R_{23}\,].$$
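The drift and diffusion functions can be checked numerically against the exact increment (2). The sketch below is our own construction, not part of the original analysis: it uses a hypothetical two-point distribution, $S_i = m_i\,\Delta t \pm d_i\sqrt{\Delta t}$ with probability 1/2 each, whose moments satisfy the model's assumptions ($E S_i = m_i\Delta t$, $E S_iS_j = (m_im_j\Delta t + d_id_j)\Delta t$, higher moments $o(\Delta t)$). For small $\Delta t$, the exact conditional mean and mean square of the increment, divided by $\Delta t$, should agree with $v(p)$ and $\sigma^2(p)$ up to terms of order $\Delta t$. The parameter values are arbitrary illustrations.

```python
import math


def drift(p, mu, M, V, R):
    """v(p) as given in the text. M = (M1, M2, M3), V = (V1, V2, V3), R = (R12, R13, R23)."""
    q = 1.0 - p
    M1, M2, M3 = M
    V1, V2, V3 = V
    R12, R13, R23 = R
    bracket = (p * M1 + (q - p) * M2 - q * M3
               - p**3 * V1 - 2 * p * q * (q - p) * V2 + q**3 * V3
               + p**2 * (1 - 4 * q) * R12 + p * q * (p - q) * R13 + q**2 * (4 * p - 1) * R23)
    return -mu * p + p * q * bracket


def diffusion(p, V, R):
    """sigma^2(p) as given in the text."""
    q = 1.0 - p
    V1, V2, V3 = V
    R12, R13, R23 = R
    bracket = (p**2 * V1 + (p - q)**2 * V2 + q**2 * V3
               - 2 * p * (p - q) * R12 - 2 * p * q * R13 + 2 * q * (p - q) * R23)
    return p**2 * q**2 * bracket


def increment(p, S, mu_dt):
    """Exact change p(t + dt) - p(t) from equation (2) for one realization S = (S1, S2, S3)."""
    q = 1.0 - p
    S1, S2, S3 = S
    num = p * q * (p * (S1 - S2) + q * (S2 - S3)) - mu_dt * p * (1 + p * S1 + q * S2)
    return num / (1 + p * p * S1 + 2 * p * q * S2 + q * q * S3)


def two_point_check(p, mu, m, d, dt=1e-6):
    """Return (E[dp]/dt, v(p), E[dp^2]/dt, sigma^2(p)) under the two-point law."""
    root = math.sqrt(dt)
    draws = [[mi * dt + sign * di * root for mi, di in zip(m, d)] for sign in (-1.0, 1.0)]
    incs = [increment(p, S, mu * dt) for S in draws]
    mean_rate = sum(incs) / 2.0 / dt
    square_rate = sum(x * x for x in incs) / 2.0 / dt

    # Moments per unit time implied by the two-point law.
    def pair(i, j):
        return m[i] * m[j] * dt + d[i] * d[j]

    V = (pair(0, 0), pair(1, 1), pair(2, 2))
    R = (pair(0, 1), pair(0, 2), pair(1, 2))
    return mean_rate, drift(p, mu, tuple(m), V, R), square_rate, diffusion(p, V, R)


if __name__ == "__main__":
    print(two_point_check(0.3, 1e-3, (0.02, -0.01, 0.05), (0.1, 0.2, -0.15)))
```

A sign error in any single term of $v(p)$ or $\sigma^2(p)$ would show up as a discrepancy far larger than the $O(\Delta t)$ residual.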

Both of the above expansions require that $|p^2S_1 + 2pqS_2 + q^2S_3| < 1$, so, in this sense, $S_1$, $S_2$ and $S_3$ must be small. Biologically, this assumption means that the fitnesses fluctuate randomly but remain close to neutrality.

LIMITING RESULTS

The function $s(p) = \exp\left[-2\int_{p_0}^{p} v(x)/\sigma^2(x)\,dx\right]$ can be used to evaluate the local stochastic stability of the mutation-stochastic selection process (PROHOROV and ROZANOV 1969; see FELDMAN and ROUGHGARDEN 1975, and LEVIKSON 1976, for applications in different contexts). A boundary is attracting (locally stochastically stable) if $s(p)$ is integrable near the boundary; otherwise the boundary is repelling (stochastically unstable). Partial fraction expansion of $v(x)/\sigma^2(x)$ and integration yields

$$s(p) = p^{-2A}(1-p)^{-2B}\exp\left[-\frac{2C}{1-p}\right]g(p),$$

where $A$ and $B$ are constants determined by $\mu$ and the moments $M_i$, $V_i$ and $R_{ij}$, $g(p)$ is continuous and bounded away from zero on $[0,1]$, and

$$C = \frac{-\mu}{V_1 + V_2 - 2R_{12}}.$$
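Where the exponent $A$ and the constant $C$ come from can be seen from the behavior of $v/\sigma^2$ near each boundary. The following is a sketch of our own expansion of the drift and diffusion functions given above; it assumes $V_2 + V_3 - 2R_{23} > 0$ and $V_1 + V_2 - 2R_{12} > 0$, which are the second moments per unit time of $S_2 - S_3$ and $S_1 - S_2$.

```latex
\begin{aligned}
&\text{Near } p = 0:\quad v(p) = p\,[-\mu + M_2 - M_3 + V_3 - R_{23}] + O(p^2),\qquad
 \sigma^2(p) = p^2\,[V_2 + V_3 - 2R_{23}] + O(p^3),\\
&\qquad \frac{v(x)}{\sigma^2(x)} = \frac{A}{x} + O(1),\qquad
 A = \frac{M_2 - M_3 + V_3 - R_{23} - \mu}{V_2 + V_3 - 2R_{23}},\qquad
 s(p) \sim \text{const}\cdot p^{-2A},\\
&\qquad -2A > -1 \iff \mu > \bigl(M_2 - \tfrac12 V_2\bigr) - \bigl(M_3 - \tfrac12 V_3\bigr).\\[4pt]
&\text{Near } p = 1:\quad v(p) = -\mu + O(1-p),\qquad
 \sigma^2(p) = (1-p)^2\,[V_1 + V_2 - 2R_{12}] + O\bigl((1-p)^3\bigr),\\
&\qquad \frac{v(x)}{\sigma^2(x)} = \frac{C}{(1-x)^2} + O\!\left(\frac{1}{1-x}\right),\qquad
 s(p) \sim \exp\!\left[-\frac{2C}{1-p}\right]\times(\text{a power of } 1-p).
\end{aligned}
```

Because $C < 0$ whenever $\mu > 0$, the exponential factor grows without bound as $p \to 1$, which is why $s(p)$ can never be integrable near $p = 1$.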

Since $\mu > 0$, the boundary $p = 1$ is always stochastically unstable because $C < 0$. The condition for local stochastic stability of $p = 0$ is $-2A > -1$, which is to say $\mu > (M_2 - V_2/2) - (M_3 - V_3/2)$ or, to the order of the diffusion approximation, $\mu > E\ln(w_{12}/w_{22})$.
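The step from $(M_2 - V_2/2) - (M_3 - V_3/2)$ to $E\ln(w_{12}/w_{22})$ rests on the approximation $E\ln(1+S) \approx M - V/2$, the logarithm of a geometric mean fitness. A quick Monte Carlo sketch of that approximation follows; the normal law for $S$, the sample size and the seed are assumptions made only for illustration, and the values $M = 0$, $V = 0.020100672$ are those of the Figure 1 example, for which $e^{M - V/2} = 0.99$.

```python
import math
import random


def sample_log_geometric_mean(M, V, n, seed=1977):
    """(1/n) * sum_j ln(1 + S_j) over n independent generations, with S_j drawn
    from a normal law with mean M and E[S^2] = V (the normal law is an assumption)."""
    rng = random.Random(seed)
    sd = math.sqrt(V - M * M)
    total = 0.0
    for _ in range(n):
        total += math.log1p(rng.gauss(M, sd))
    return total / n


if __name__ == "__main__":
    M, V = 0.0, 0.020100672
    est = sample_log_geometric_mean(M, V, 200_000)
    print("sample geometric mean:", math.exp(est))
    print("exp(M - V/2):         ", math.exp(M - V / 2))
```

The two values should agree to roughly three decimal places. The small remaining gap reflects higher moments of $S$ (for a normal law with $M = 0$, $E\ln(1+S) = -V/2 - 3V^2/4 - \cdots$), which the diffusion approximation neglects.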


Consider the function $S(p) = \int_{p_0}^{p} s(z)\,dz$. Given $p(0) = p_0$ and two constants $\epsilon$ and $\delta$ with $0 < \epsilon \le p_0 \le 1-\delta < 1$,

$$\Pr\{p(t) \text{ reaches } 1-\delta \text{ before reaching } \epsilon \mid p_0\} = \frac{S(p_0) - S(\epsilon)}{S(1-\delta) - S(\epsilon)} = L(\epsilon,\delta)$$

(FELLER 1954; LEVIKSON 1974). In the case $\mu > (M_2 - V_2/2) - (M_3 - V_3/2)$, $S(1-\delta) \to \infty$ as $\delta \to 0$, whereas $S(\epsilon)$ remains bounded as $\epsilon \to 0$. Therefore $L(\epsilon,\delta) \to 0$ as $\epsilon, \delta \to 0$. Consequently, $p(t) \to 0$ almost surely (see LEVIKSON 1974 and LEVIKSON and KARLIN 1975). In the case $\mu < (M_2 - V_2/2) - (M_3 - V_3/2)$, $S(1-\delta) \to \infty$ as $\delta \to 0$ and $S(\epsilon) \to -\infty$ as $\epsilon \to 0$. Both boundaries are repelling, and a globally stable stationary distribution of gene frequency exists if and only if $1/\sigma^2(p)s(p)$ is integrable. Now,

$$\frac{1}{\sigma^2(p)\,s(p)} = p^{2A-2}(1-p)^{2B-2}\exp\left[\frac{2C}{1-p}\right]g^*(p),$$

where $g^*(p)$ is continuous on $[0,1]$ and bounded away from zero. This function is integrable around 0 if and only if $-2A < -1$, that is, if and only if $\mu < (M_2 - V_2/2) - (M_3 - V_3/2)$; near 1 the factor $\exp[2C/(1-p)]$, with $C < 0$, ensures integrability. To the order of the diffusion approximation, the limiting results may be summarized as follows.

Case 1. If $\mu > E\ln(w_{12}/w_{22})$, then $p \to 0$ almost surely.
Case 2. If $\mu < E\ln(w_{12}/w_{22})$, then $p$ converges to a stationary distribution.
Case 3. If $\mu = E\ln(w_{12}/w_{22})$, then the process is recurrent but no stationary distribution exists.
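The hitting-probability formula $L(\epsilon,\delta) = [S(p_0) - S(\epsilon)]/[S(1-\delta) - S(\epsilon)]$ can be evaluated by quadrature for any drift and diffusion functions. In the sketch below, `hit_upper_first` is a hypothetical helper of our own; the caller supplies the ratio $v(x)/\sigma^2(x)$, both integrals use the trapezoid rule on a uniform grid, and the base point of the integrals is moved from $p_0$ to $\epsilon$. That move multiplies $s$ by a constant and shifts $S$ by a constant, so it leaves $L$ unchanged.

```python
import math


def hit_upper_first(ratio, p0, eps, delta, n=4000):
    """Pr{p(t) reaches 1 - delta before eps | p(0) = p0} = [S(p0) - S(eps)] / [S(1 - delta) - S(eps)],
    where s(x) = exp(-2 * integral of ratio) and S is an antiderivative of s.
    `ratio` is a user-supplied function returning v(x) / sigma^2(x)."""
    a, b = eps, 1.0 - delta
    if not a <= p0 <= b:
        raise ValueError("need eps <= p0 <= 1 - delta")
    h = (b - a) / n
    xs = [a + k * h for k in range(n + 1)]
    r = [ratio(x) for x in xs]
    # log s(x_k) = -2 * integral_a^{x_k} ratio, by the cumulative trapezoid rule
    log_s = [0.0]
    for k in range(1, n + 1):
        log_s.append(log_s[-1] - h * (r[k - 1] + r[k]))
    s = [math.exp(v) for v in log_s]
    # S(x_k) = integral_a^{x_k} s, again by the cumulative trapezoid rule
    S = [0.0]
    for k in range(1, n + 1):
        S.append(S[-1] + 0.5 * h * (s[k - 1] + s[k]))
    # S(p0) by linear interpolation between neighbouring grid points
    k = min(int((p0 - a) / h), n - 1)
    frac = (p0 - xs[k]) / h
    S_p0 = S[k] + frac * (S[k + 1] - S[k])
    return S_p0 / S[n]  # S(eps) = 0 because the base point is eps


if __name__ == "__main__":
    print(hit_upper_first(lambda x: 0.0, 0.3, 0.1, 0.1))  # neutral process
    print(hit_upper_first(lambda x: 2.0, 0.3, 0.1, 0.1))  # constant v/sigma^2 = 2
```

Two convenient checks: for a neutral process ($v = 0$) the formula reduces to $(p_0 - \epsilon)/(1 - \delta - \epsilon)$, and when $v/\sigma^2$ is a constant $c$ it becomes $(1 - e^{-2c(p_0-\epsilon)})/(1 - e^{-2c(1-\delta-\epsilon)})$. For ratios that make $s$ very large, the log-scale accumulation above would need rescaling before exponentiating.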


The geometric mean fitness of any genotype is simply $e^{E\ln(1+S_i)}$. This can be shown as follows. Since the geometric mean (GM) is by definition $\lim_{n\to\infty}\prod_{j=1}^{n}[1+S^{(j)}]^{1/n}$, we have $\ln(\mathrm{GM}) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\ln[1+S^{(j)}]$; because the $S^{(j)}$ are independent and identically distributed from generation to generation, the right-hand side equals $E\ln(1+S_i)$ by the law of large numbers. To the order of the diffusion approximation, $E\ln(1+S_i) = M_i - V_i/2$, so $\mathrm{GM} = e^{M_i - V_i/2}$ to the same order.

SPECIAL CASES


When it exists, the stationary distribution has the form

$$\phi(p) = \frac{\text{const}}{\sigma^2(p)}\exp\left[2\int\frac{v(p)}{\sigma^2(p)}\,dp\right]$$

(STRATONOVICH 1963), where the constant is chosen so that $\int_0^1\phi(p)\,dp = 1$.

Case when a is completely recessive: Let the fitnesses of AA, Aa and aa be given by 1, 1 and $1+S_3$, respectively, where $S_3$ is a random variable with mean $M_3$ and variance $V_3$. We have $M_1 = M_2 = V_1 = V_2 = R_{12} = R_{13} = R_{23} = 0$, and

$$v(p) = -\mu p + pq^2(-M_3 + q^2V_3), \qquad \sigma^2(p) = p^2q^4V_3.$$

A stationary distribution exists if and only if $\mu < -M_3 + V_3/2$. Here

$$\phi(p) = \text{const}\cdot p^{-2(\mu+M_3)/V_3}\,q^{2(\mu+M_3)/V_3 - 4}\exp\left\{-\frac{2}{V_3}\left[\frac{\mu+M_3}{q} + \frac{\mu}{2q^2} + \frac{\mu}{3q^3}\right]\right\}. \qquad (3)$$
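The recessive-case results can be reproduced numerically. The sketch below integrates, by composite Simpson's rule (one of the two procedures named in the legend of Figure 2), the unnormalized stationary density obtained by substituting this case's $v(p)$ and $\sigma^2(p)$ into the stationary-distribution formula; the partial-fraction integration behind it is our own and is written out in the code comment. The truncation points $q = 0.01$ and $q = 1 - 10^{-6}$ and the grid size are assumptions; the density vanishes extremely fast as $q \to 0$, and the legend of Figure 1 indicates that the region near $p = 0$ contributes negligibly. As an independent, rough check, the sketch also simulates recursion (2) with $\Delta t$ equal to one generation and $S_3$ drawn from a normal law with mean 0 and variance $V_3$; the normal law, seed and run length are our assumptions.

```python
import math
import random


def log_density(q, mu, m3, v3):
    """Unnormalized log stationary density of q = 1 - p (completely recessive case):
    phi = p^(-c) * q^(c - 4) * exp{-(2/V3)[(mu + M3)/q + mu/(2q^2) + mu/(3q^3)]},
    with c = 2(mu + M3)/V3."""
    p = 1.0 - q
    c = 2.0 * (mu + m3) / v3
    return (-c * math.log(p) + (c - 4.0) * math.log(q)
            - (2.0 / v3) * ((mu + m3) / q + mu / (2.0 * q * q) + mu / (3.0 * q ** 3)))


def stationary_mean_and_mode(mu, m3, v3, q_lo=0.01, q_hi=1.0 - 1e-6, n=20_000):
    """Mean and grid mode of q by composite Simpson's rule on [q_lo, q_hi]; n must be even."""
    h = (q_hi - q_lo) / n
    qs = [q_lo + k * h for k in range(n + 1)]
    logs = [log_density(q, mu, m3, v3) for q in qs]
    top = max(logs)
    dens = [math.exp(x - top) for x in logs]  # rescale before exponentiating
    weights = [1 if k in (0, n) else (4 if k % 2 else 2) for k in range(n + 1)]
    area = sum(w * f for w, f in zip(weights, dens))
    first_moment = sum(w * q * f for w, q, f in zip(weights, qs, dens))
    return first_moment / area, qs[logs.index(top)]


def simulated_mean_q(mu, v3, generations, burn_in, seed=1):
    """Time average of q under recursion (2) with dt = 1, fitnesses 1, 1, 1 + S3 and
    S3 ~ Normal(0, v3); with S1 = S2 = 0, (2) reduces to p' = (1 - mu) p / (1 + q^2 S3)."""
    rng = random.Random(seed)
    sd = math.sqrt(v3)
    p, total = 0.8, 0.0
    for t in range(burn_in + generations):
        q = 1.0 - p
        p = (1.0 - mu) * p / (1.0 + q * q * rng.gauss(0.0, sd))
        if t >= burn_in:
            total += 1.0 - p
    return total / generations


if __name__ == "__main__":
    mean, mode = stationary_mean_and_mode(1e-4, 0.0, 0.020100672)
    print(f"mean {mean:.3f}, mode {mode:.3f}, deterministic {math.sqrt(1e-4 / 0.01):.3f}")
    print(f"simulated time average {simulated_mean_q(1e-4, 0.020100672, 300_000, 10_000):.3f}")
```

For the Figure 1 parameters ($\mu = 0.0001$, $M_3 = 0$, $V_3 = 0.020100672$) the computed mean and mode should come out close to the values 0.21 and 0.14 reported in the text, well above $\sqrt{\mu/s} = 0.10$. The simulated time average is noisier but should fall in the same neighborhood.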

The geometric mean fitness of aa homozygotes is $e^{E\ln(1+S_3)} = e^{M_3 - V_3/2}$ to the order of the diffusion approximation. Consider, for comparison, a deterministic model with the fitnesses of AA, Aa and aa given by 1, 1 and $1-s$, respectively. The equilibrium frequency of a in the deterministic model is $\hat q = \sqrt{\mu/s}$, and the geometric mean fitness of aa homozygotes is $1-s$. To make the geometric mean fitness of aa the same in the stochastic and deterministic models, set $1-s = e^{M_3 - V_3/2}$, or $\ln(1-s) = M_3 - V_3/2$. An extreme case to consider is $M_3 = 0$: that is, a case in which selection acts against aa homozygotes only through the variance in fitness. Figure 1 shows the stationary distribution of $q$, the frequency of the a allele, for the case $\mu = 0.0001$, $M_3 = 0$ and $V_3 = 0.020100672$, as calculated numerically by the methods described in the legend of Figure 2. The maximum ordinate of the distribution in Figure 1 is about 7.2. The mean of $q$ (arrow) is 0.21; the mode is about 0.14. Note the extremely stretched-out right-hand tail of the distribution. The deterministic selection coefficient corresponding to this stochastic model is the value of $s$ in $\ln(1-s) = M_3 - V_3/2$; in particular, $s = 0.01$. In the deterministic model the equilibrium frequency of $q$ is $\sqrt{\mu/s} = 0.10$. Thus, the average frequency of the recessive allele in the stationary distribution of the stochastic model is considerably higher than the equilibrium frequency in the corresponding deterministic model, despite the fact that the geometric mean fitnesses are the same. Indeed, even the modal gene frequency in the stochastic model is higher than the equilibrium frequency in the deterministic model.

[Figure 1: plot not reproduced; horizontal axis, Frequency of Recessive Allele (0 to 1.0).]

FIGURE 1. Stationary distribution of the frequency, $q$, of a recessive allele subject to mutation and stochastic selection. In this example, $\mu = 0.0001$, $M_3 = 0$ and $V_3 = 0.020100672$, giving a geometric mean fitness of the recessive homozygote of 0.99. The maximum ordinate of the distribution is 7.2. The mean (arrow) is 0.21; the mode is about 0.14. The equilibrium gene frequency in the corresponding deterministic model is 0.10. A tiny part of the distribution near $q = 1$ has been truncated. Note from equation (3) that as $p \to 0$ (i.e., $q \to 1$), $\phi(p) \to \infty$. The rate of growth of $\phi$ is extremely slow, however. The ordinate of $\phi$ for $p = 0.86$ is $\phi = 7.174$; for $p = 10^{-5}$, $\phi = 0.014$; for $p = 10^{-10}$, $\phi = 0.016$; for $p = 10^{-25}$, $\phi = 0.023$; for $p = 10^{-50}$, $\phi = 0.040$; for $p = 10^{-99}$, $\phi = 0.1234$. The part of $\phi$ near $p = 0$ contributes negligibly to the total area, as shown by the fact that estimates of the area are virtually identical whether integrated numerically from $p = 10^{-5}$ or $p = 10^{-25}$.

Figure 2 shows a more detailed comparison between the stochastic and deterministic models. Here the mutation rate is taken to be $\mu = 0.00001$. The curve marked “constant selection” gives the values of $\sqrt{\mu/s}$, where $1-s$ is constant and equal to the geometric mean fitness of aa homozygotes. The curve marked “varying selection” presents the average frequency of the a allele in the stationary distribution of the stochastic model where $M_3 = 0$ and where $V_3$ is such that the geometric mean fitness of aa homozygotes is $e^{-V_3/2}$. As can readily be seen in Figure 2, over the whole range of selection intensities shown, the average allele frequency of the stationary distribution in the stochastic case is greater than the equilibrium frequency in the corresponding deterministic case. Indeed, for strong selection intensities, the percentage by which the equilibrium average frequency is increased by stochastic selection is very large. In the deterministic case $s = 0.01$, for example, the equilibrium frequency is about 0.032; in the corresponding stochastic case the average frequency at equilibrium is about 0.095, an increase of nearly 200%.

[Figure 2: plot not reproduced; horizontal axis, Geometric Mean Fitness of Homozygote (0.9999 to 0.90).]

FIGURE 2. Equilibrium allele frequencies of a recessive allele (“constant selection”) and average allele frequencies of the stationary distribution of a recessive allele (“varying selection”) for the case $\mu = 0.00001$ when the geometric mean fitnesses of the recessive homozygotes are as shown on the abscissa. Here $M_3 = 0$, so selection acts against recessive homozygotes only through the variance in fitness. Note that the average frequency maintained by stochastic selection is greater than the equilibrium frequency maintained by constant selection. Numerical integrations required to obtain the data for the figure were carried out using Simpson's rule and also using a standard program (DCADRE) for cautious Romberg extrapolation. Integrals were calculated over the range $q_0 = 10^{-25}$ to $q_1 = 1 - q_0$, and the results from both numerical procedures agreed in detail.

Case of semidominance: Let the fitnesses of AA, Aa and aa be 1, $1 + S_3/2$ and $1 + S_3$, respectively, where $S_3$ has mean $M_3$ and variance $V_3$; the fitness of the heterozygote is then exactly intermediate between the fitnesses of the homozygotes. In this case, $M_1 = V_1 = R_{12} = R_{13} = 0$, $M_2 = M_3/2$, $V_2 = V_3/4$ and $R_{23} = V_3/2$. A stationary distribution exists if and only if $\mu < -M_3/2 + 3V_3/8$. The drift and diffusion functions are

$$v(p) = p\left[-\mu - \frac{M_3 - V_3}{2} + \frac{M_3 - 2V_3}{2}\,p + \frac{V_3}{2}\,p^2\right], \qquad \sigma^2(p) = \frac{V_3}{4}\,p^2(1-p)^2.$$

Provided it exists, the stationary distribution has the form

$$\phi(p) = \text{const}\cdot p^{2 - 4(2\mu + M_3)/V_3}\,(1-p)^{4(2\mu + M_3)/V_3 - 2}\exp\left[-\frac{8\mu}{V_3(1-p)}\right].$$

Case of complete dominance: Let the fitnesses of AA, Aa and aa be 1, $1+S_3$ and $1+S_3$, respectively, where $S_3$ has mean $M_3$ and variance $V_3$; the a allele is therefore dominant to A. Then $M_2 = M_3$ and $V_2 = V_3$, so $\mu > (M_2 - V_2/2) - (M_3 - V_3/2) = 0$. That is to say, $p \to 0$ ($q \to 1$) almost surely. This answer seems at first to be silly: it says that a deleterious allele subject to stochastic selection but favored by mutation will eventually become fixed. However, consideration of the deterministic model provides a rationalization for the behavior. The deterministic model with fitnesses of AA, Aa and aa given by 1, $1-s$ and $1-s$, respectively, has four equilibria (when $s > 4\mu$): $\hat q_1 = 0$, $\hat q_2 = [s(1+\mu) - \sqrt{(1+\mu)^2s^2 - 4\mu s}\,]/2s$, $\hat q_3 = [s(1+\mu) + \sqrt{(1+\mu)^2s^2 - 4\mu s}\,]/2s$, $\hat q_4 = 1$. If $s \ge 50\mu$, then these roots are approximately $\hat q_1 = 0$, $\hat q_2 = \mu/s$, $\hat q_3 = 1 - \mu/s$, $\hat q_4 = 1$. The equilibrium $\hat q_1$ is unstable, $\hat q_2$ is locally stable, $\hat q_3$ is unstable, and $\hat q_4$ is locally stable. The domain of attraction of $\hat q_2$ is $(0, 1 - \mu/s)$; the domain of attraction of $\hat q_4$ is $(1 - \mu/s, 1)$. Thus, if $q \in (1 - \mu/s, 1)$, then $q \to 1$. In the corresponding stochastic model, there is a neighborhood $U$ around 1 such that, if $q \in U$, then $q \to 1$ almost surely. The peculiar limiting result in the stochastic case merely says that, with stochastic selection, the allele frequency $q$ will eventually enter $U$ and, having done so, will almost surely converge to 1.
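The deterministic equilibria quoted for the dominance case can be checked directly. The sketch below (hypothetical helper names of our own) solves $sq^2 - s(1+\mu)q + \mu = 0$, which is what the numerator of (1) reduces to for fitnesses 1, $1-s$, $1-s$ once the factor $p$ is removed, and then iterates (1) from two starting points on either side of $\hat q_3$. The values $s = 0.01$ and $\mu = 0.00001$ are illustrative.

```python
import math


def interior_equilibria(s, mu):
    """Roots q2 < q3 of s q^2 - s(1 + mu) q + mu = 0 (fitnesses 1, 1 - s, 1 - s)."""
    disc = (1.0 + mu) ** 2 * s * s - 4.0 * mu * s
    if disc < 0:
        raise ValueError("no interior equilibria: need s * (1 + mu)**2 >= 4 * mu")
    root = math.sqrt(disc)
    return (s * (1.0 + mu) - root) / (2.0 * s), (s * (1.0 + mu) + root) / (2.0 * s)


def next_q(q, s, mu):
    """One generation of equation (1), returned as the new frequency of a."""
    p = 1.0 - q
    w11, w12, w22 = 1.0, 1.0 - s, 1.0 - s
    wbar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    dp = (p * q * (p * (w11 - w12) + q * (w12 - w22)) - mu * p * (p * w11 + q * w12)) / wbar
    return q - dp


def run(q, s, mu, generations):
    """Iterate next_q for the given number of generations."""
    for _ in range(generations):
        q = next_q(q, s, mu)
    return q


if __name__ == "__main__":
    s, mu = 0.01, 1e-5
    q2, q3 = interior_equilibria(s, mu)
    print(q2, mu / s, q3, 1 - mu / s)
    print(run(0.5, s, mu, 5_000))      # settles at q2
    print(run(0.9995, s, mu, 20_000))  # creeps upward toward 1
```

Starting at $q = 0.5$ the frequency settles at $\hat q_2 \approx \mu/s$, whereas starting just above $\hat q_3 \approx 1 - \mu/s$ it moves toward 1, very slowly, because the net force there is of order $\mu$.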

DISCUSSION

It is generally assumed that a balance between mutation and deterministic selection is unlikely to maintain an equilibrium frequency of an allele of more than about 1%; thus 1% is conventionally taken as the frequency above which an allele is considered to be “polymorphic” (LEWONTIN 1974; FORD 1975). However, as shown by the theory in the present paper, stochastic selection can maintain average allele frequencies that are substantially higher than those sustained by deterministic selection, even when the geometric mean fitnesses of the genotypes are identical in both models. It seems to follow that a balance between mutation and stochastic selection ought to be considered as a plausible mechanism for the maintenance of some polymorphisms in natural populations, especially of polymorphisms having alleles with frequencies in the range of, say, 1-10%. SCHWARTZ (1969) has presented one mechanism of stochastic selection in corn. He found that seeds homozygous for an alcohol dehydrogenase null allele were unable to germinate if immersed in water for several days, which he interpreted as meaning that such seeds are incapable of carrying on the degree of anaerobic respiration necessary for viability under excessively wet conditions, such as occur in nature after flooding or following a heavy rainfall. The mutation studied had been induced by the mutagen ethyl methanesulfonate, but SCHWARTZ correctly emphasizes the relevance of these results in the context of allozyme polymorphisms and temporally varying selection.

LITERATURE CITED

COOK, R. D. and D. L. HARTL, 1974 Uncorrelated random environments and their effects on gene frequency. Evolution 28: 265-274.
COOK, R. D. and D. L. HARTL, 1975 Stochastic selection in large and small populations. Theor. Pop. Biol. 7: 55-63.
CROW, J. F. and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper and Row, New York.
FELDMAN, M. W. and J. ROUGHGARDEN, 1975 A population's stationary distribution and chance of extinction in a stochastic environment, with remarks on the theory of species packing. Theor. Pop. Biol. 7: 197-207.
FELLER, W., 1954 Diffusion processes in one dimension. Trans. Amer. Math. Soc. 77: 1-31.
FORD, E. B., 1975 Ecological Genetics. Wiley, New York.
GILLESPIE, J., 1973a Polymorphism in random environments. Theor. Pop. Biol. 4: 193-195.
GILLESPIE, J., 1973b Natural selection with varying selection coefficients: a haploid model. Genet. Res. 21: 115-120.
GILLESPIE, J. H. and C. H. LANGLEY, 1974 A general model to account for enzyme variation in natural populations. Genetics 76: 837-848.
HALDANE, J. B. S. and S. D. JAYAKAR, 1963 Polymorphism due to selection of varying direction. J. Genet. 58: 237-242.
HARTL, D. L., 1975 Stochastic selection of gametes and zygotes. Pp. 233-242. In: Gamete Competition in Plants and Animals. Edited by D. L. MULCAHY. North-Holland, Amsterdam.
HARTL, D. L. and R. D. COOK, 1973 Balanced polymorphisms of quasi-neutral alleles. Theor. Pop. Biol. 4: 163-172.
HARTL, D. L. and R. D. COOK, 1974 Autocorrelated random environments and their effects on gene frequency. Evolution 28: 275-280.
HARTL, D. L. and R. D. COOK, 1976 Stochastic selection and the maintenance of genetic variation. Pp. 593-615. In: Population Genetics and Ecology. Edited by S. KARLIN and E. NEVO. Academic Press, New York.


JENSEN, L. and E. POLLAK, 1969 Random selective advantages of a gene in a finite population. J. Appl. Prob. 6: 19-37.

KARLIN, S. and B. LEVIKSON, 1974 Temporal variation in selection intensities: Case of small population size. Theor. Pop. Biol. 6: 383-412.
KARLIN, S. and U. LIEBERMAN, 1974 Random temporal variation in selection intensities: Case of large population size. Theor. Pop. Biol. 6: 355-382.
KARLIN, S. and U. LIEBERMAN, 1975 Random temporal variation in selection intensities: one-locus two-allele model. J. Math. Biol. 2: 1-17.
LEVIKSON, B., 1974 The effects of random environments on the evolutionary process of gene frequencies: A mathematical analysis. Ph.D. Thesis, Tel-Aviv University, Israel.
LEVIKSON, B., 1976 Regulated growth in random environments. J. Math. Biol. 3: 19-26.
LEVIKSON, B. and S. KARLIN, 1975 Random temporal variation in selection intensities acting on infinite diploid populations: Diffusion method analysis. Theor. Pop. Biol. 8: 292-300.
LEWONTIN, R. C., 1974 The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
NORMAN, M. F., 1975 An ergodic theorem for evolution in a random environment. J. Appl. Prob. 12: 661-672.
PROHOROV, Yu. V. and Yu. A. ROZANOV, 1969 Probability Theory. Springer-Verlag, New York.
SCHWARTZ, D., 1969 An example of gene fixation resulting from selective advantage in suboptimal conditions. Am. Naturalist 103: 479-481.
STRATONOVICH, R. L., 1963 Topics in the Theory of Random Noise. Vol. 1. Gordon and Breach, London.

Corresponding editor: R. C. LEWONTIN

APPENDIX

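Derivation of the drift function $v(p)$ and the diffusion function $\sigma^2(p)$:

$$v(p) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\,E[\,p(t+\Delta t) - p(t) \mid p(t) = p\,],$$

$$\sigma^2(p) = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\,E\{[\,p(t+\Delta t) - p(t)\,]^2 \mid p(t) = p\}.$$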


The process converges weakly to a diffusion because

$$\lim_{\Delta t\to 0}\frac{1}{\Delta t}\,E\{[\,p(t+\Delta t) - p(t)\,]^i \mid p(t) = p\} = 0 \quad \text{for } i \ge 3.$$

Note that the Taylor series expansions carried out above require that $|p^2S_1 + 2pqS_2 + q^2S_3| < 1$.
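A sketch of the expansion behind $v(p)$ and $\sigma^2(p)$, assuming only the moment conditions of the model section. For convenience we introduce the symbols $X = pS_1 + (q-p)S_2 - qS_3$ and $D = p^2S_1 + 2pqS_2 + q^2S_3$, so that (2) reads $\Delta p = [pqX - \mu(\Delta t)\,p(1 + pS_1 + qS_2)]/(1 + D)$, and expand $1/(1+D)$ as a geometric series:

```latex
\begin{aligned}
\Delta p &= \bigl[pqX - \mu(\Delta t)\,p\,(1 + pS_1 + qS_2)\bigr]\,(1 - D + D^2 - \cdots),\\
E[\Delta p] &= pq\,\bigl(E X - E[XD]\bigr) - \mu(\Delta t)\,p + o(\Delta t),\\
E X &= [\,pM_1 + (q-p)M_2 - qM_3\,]\,\Delta t,\\
E[XD] &= [\,p^3V_1 + 2pq(q-p)V_2 - q^3V_3 + p^2(4q-1)R_{12} + pq(q-p)R_{13} + q^2(1-4p)R_{23}\,]\,\Delta t,\\
E[(\Delta p)^2] &= p^2q^2\,E[X^2] + o(\Delta t),\\
E[X^2] &= [\,p^2V_1 + (q-p)^2V_2 + q^2V_3 + 2p(q-p)R_{12} - 2pqR_{13} - 2q(q-p)R_{23}\,]\,\Delta t.
\end{aligned}
```

Dividing by $\Delta t$ and letting $\Delta t \to 0$ reproduces $v(p)$ and $\sigma^2(p)$ as stated in the text. The terms dropped as $o(\Delta t)$ involve third and higher moments of the $S_i$, products $\mu(\Delta t)S_i$, or $[\mu(\Delta t)]^2$.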
