THEORETICAL POPULATION BIOLOGY 37, 235-253 (1990)

The Variance of Sample Heterozygosity^1

B. S. WEIR, JOHN REYNOLDS,^2 AND K. G. DODDS^3

Department of Statistics, North Carolina State University, Box 8203, Raleigh, North Carolina 27695-8203

Received July 5, 1989

The variance of sample heterozygosity, averaged over several loci, is studied in a variety of situations. The variance depends on the sampling implicit in the mating system as well as on that explicit in the loci scored and individuals sampled. There are also effects of allelic distributions over loci and of linkage or linkage disequilibrium between pairs of loci. Results are obtained for populations in drift and mutation balance, for infinite populations undergoing mixed self and random mating, and for finite monoecious populations with or without selfing. For unlinked loci in drift/mutation balance, variances appear to be lessened more by increasing the number of loci scored than by increasing the number of individuals sampled. For infinite populations under the mixed self and random mating system, however, the reverse is true. Methods for estimating the variance of sample heterozygosity are discussed, with attention being paid to unbalanced data where not all loci are scored in all individuals. © 1990 Academic Press, Inc.

INTRODUCTION

It is convenient to be able to characterize the amount of genetic variability in a population with a single summary statistic. Data sets are routinely summarized by means of heterozygosity, either on a single-locus basis or as an average over several loci. In this paper we are concerned with the sample variances of such statistics, with attention being paid to both the sampling implicit in the mating system (“genetic sampling”) and the sampling explicit in the choice of individuals and loci sampled (“statistical sampling”). Two perspectives will be adopted: expected variances will be formulated as an aid to designing sampling schemes, while estimation methods will aid in the interpretation of data already collected. The present treatment of heterozygosity complements earlier studies of inbreeding (Weir et al., 1980), linkage disequilibrium (Weir and Hill, 1980) and gene diversity (Weir, 1989). This paper extends some previous results of Lessard (1981). In the special case where all alleles in an initial population are unique, subsequent homozygosity is equivalent to inbreeding and the treatment of variance of actual inbreeding (Cockerham and Weir, 1983) is appropriate.

^1 Paper Number 12270 of the Journal Series of the North Carolina Agricultural Research Service, Raleigh, North Carolina 27695-7601. This investigation was supported in part by NIH Grant GM11546.
^2 Present address: Biometric Services, DARF, P.O. Box 500, East Melbourne, Victoria 3002, Australia.
^3 Present address: Statistics and Computing Section, Invermay Agricultural Center, Private Bag, Mosgiel, New Zealand.

0040-5809/90 $3.00. Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

PREDICTING VARIANCES

The frequency of heterozygotes, or heterozygosity, at some locus $l$ is written as $H_l$, while the value observed in a sample is $\tilde{H}_l$. If $m$ loci are scored, the average heterozygosity is

\[ \tilde{H} = \frac{1}{m} \sum_{l} \tilde{H}_l . \]

The corresponding population value is written as $H$.

Within Populations

When a sample of size $n$ individuals is taken from a population, the count of heterozygotes is binomially distributed over repeated samples from the same population. Within populations, then, the mean and variance of sample heterozygosity are

\[ \mathcal{E}_W \tilde{H}_l = H_l, \qquad \mathrm{Var}_W(\tilde{H}_l) = \frac{1}{n} H_l (1 - H_l). \]

Properties of average heterozygosity are found most easily with indicator variables, $x_{jl}$, defined for individual $j$ and locus $l$ as

\[ x_{jl} = \begin{cases} 1 & \text{if individual is heterozygous} \\ 0 & \text{otherwise.} \end{cases} \]

Taking expectations over all samples from the same population gives

\[
\begin{aligned}
\mathcal{E}_W x_{jl} &= H_l \\
\mathcal{E}_W x_{jl}^2 &= H_l \\
\mathcal{E}_W x_{jl} x_{j'l} &= H_l^2, & j' &\ne j \\
\mathcal{E}_W x_{jl} x_{jl'} &= H_{ll'}, & l' &\ne l \\
\mathcal{E}_W x_{jl} x_{j'l'} &= H_l H_{l'}, & j' &\ne j,\ l' \ne l.
\end{aligned}
\]

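These expectations reflect the independence of individuals within a sample, and $H_l$, $H_{ll'}$ are one- and two-locus heterozygosities for the population sampled. Expressing average heterozygosity in terms of the indicator variables

\[ \tilde{H} = \frac{1}{nm} \sum_{j} \sum_{l} x_{jl} \]

allows the variance within populations to be found as

\[
\begin{aligned}
\mathrm{Var}_W(\tilde{H}) &= \frac{1}{m^2} \Big[ \sum_{l} \mathrm{Var}_W(\tilde{H}_l) + \sum_{l} \sum_{l' \ne l} \mathrm{Cov}_W(\tilde{H}_l, \tilde{H}_{l'}) \Big] \\
&= \frac{1}{nm^2} \sum_{l} H_l (1 - H_l) + \frac{1}{nm^2} \sum_{l} \sum_{l' \ne l} (H_{ll'} - H_l H_{l'}). \qquad (1)
\end{aligned}
\]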
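As a minimal sketch of how Eq. (1) might be evaluated, the following function takes the one-locus heterozygosities $H_l$ and a table of two-locus heterozygosities $H_{ll'}$ and returns $[\sum_l H_l(1-H_l) + \sum_l \sum_{l' \ne l}(H_{ll'} - H_l H_{l'})]/(nm^2)$. The function name, the argument layout, and the numerical values are illustrative assumptions, not part of the original treatment.

```python
def var_within(n, H, H2):
    """Within-population variance of average heterozygosity, Eq. (1).

    n  : number of individuals sampled
    H  : list of one-locus heterozygosities H_l
    H2 : square table with H2[l][k] = H_{lk}, the frequency of individuals
         heterozygous at both loci l and k (diagonal entries are ignored)
    """
    m = len(H)
    binomial = sum(h * (1.0 - h) for h in H)
    covariance = sum(H2[l][k] - H[l] * H[k]
                     for l in range(m) for k in range(m) if k != l)
    return (binomial + covariance) / (n * m * m)


# Hypothetical values: two loci scored in ten individuals.
H = [0.5, 0.5]
independent = [[0.0, 0.25], [0.25, 0.0]]   # H_{ll'} = H_l * H_l'
associated = [[0.0, 0.30], [0.30, 0.0]]    # excess of double heterozygotes
v_indep = var_within(10, H, independent)
v_assoc = var_within(10, H, associated)
```

With these assumed inputs the variance is larger when double heterozygotes are in excess, which is the covariance term of Eq. (1) at work.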

Note that Eq. (1) applies to repeated samples from a single population.

Between Populations

It is more likely that a sample heterozygosity will be wanted for making inferences for all populations maintained under the same conditions as that sampled. In other words a random, rather than a fixed, model is likely to have more biological relevance (Cockerham and Weir, 1986). Expectations now refer to repeated samples and to repeated populations, with genetic sampling contributing to the latter aspect. We will employ the same symbols, $H$ and $H_l$, to refer to average and single-locus heterozygosities, but they now refer to expected values over replicate populations as well as over samples. The same indicator variable can be introduced, but there is a difference in the (total) expectations:

\[
\begin{aligned}
\mathcal{E}_T x_{jl} &= H_l \\
\mathcal{E}_T x_{jl}^2 &= H_l \\
\mathcal{E}_T x_{jl} x_{j'l} &= M_l, & j' &\ne j \\
\mathcal{E}_T x_{jl} x_{jl'} &= H_{ll'}, & l' &\ne l \\
\mathcal{E}_T x_{jl} x_{j'l'} &= M_{ll'}, & j' &\ne j,\ l' \ne l.
\end{aligned}
\]


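The genetic sampling causing replicate populations to differ is reflected in the covariance between individuals within populations, and necessitates the quantity $M_l$ in place of $H_l^2$. At a single locus, the sample heterozygosity is still unbiased, but the total variance is not simply a binomial variance and does not disappear as the sample size $n$ becomes very large:

\[ \mathcal{E}_T \tilde{H}_l = H_l, \qquad \mathrm{Var}_T(\tilde{H}_l) = (M_l - H_l^2) + \frac{1}{n} (H_l - M_l). \qquad (2) \]

For an average over $m$ loci, the total variance becomes

\[
\mathrm{Var}_T(\tilde{H}) = \frac{1}{m^2} \Big[ \sum_{l} (M_l - H_l^2) + \sum_{l} \sum_{l' \ne l} (M_{ll'} - H_l H_{l'}) \Big] + \frac{1}{nm^2} \Big[ \sum_{l} (H_l - M_l) + \sum_{l} \sum_{l' \ne l} (H_{ll'} - M_{ll'}) \Big]. \qquad (3)
\]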

The first term in square brackets was identified as the between-populations component of variance by Lessard (1981). We partition components a little differently below. Lessard did not explicitly take into account the effects of finite sample size $n$. It is convenient to introduce components of variance for each of the recognizable sources of variation that contribute to the total variance of heterozygosity. Populations and individuals are assumed to be random effects, but loci are fixed since the same set of loci will generally be scored in subsequent samples. The variance components, for $m > 1$, can be identified with terms in a linear model for the indicator variable $x_{ijl}$, defined for locus $l$ in the $j$th individual sampled from the $i$th population. This variable takes the value 1 if that individual is heterozygous at that locus, and 0 otherwise, and can be expressed as

\[ x_{ijl} = H + \alpha_i + \beta_{ij} + \gamma_l + (\alpha\gamma)_{il} + (\beta\gamma)_{ijl} \]

with total variances for this particular set of loci

\[
\mathrm{Var}_T(\alpha_i) = \sigma^2_{p}, \qquad
\mathrm{Var}_T(\beta_{ij}) = \sigma^2_{i/p}, \qquad
\mathrm{Var}_T[(\alpha\gamma)_{il}] = \sigma^2_{lp}, \qquad
\mathrm{Var}_T[(\beta\gamma)_{ijl}] = \sigma^2_{li/p}.
\]


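The locus term $\gamma_l$ has zero variance. In terms of total expectations, note that

\[
\begin{aligned}
\mathcal{E}_T x_{ijl}^2 &= H_l \\
\mathcal{E}_T x_{ijl} x_{ijl'} &= H_{ll'}, & l' &\ne l \\
\mathcal{E}_T x_{ijl} x_{ij'l} &= M_l, & j' &\ne j \\
\mathcal{E}_T x_{ijl} x_{ij'l'} &= M_{ll'}, & j' &\ne j,\ l' \ne l \\
\mathcal{E}_T x_{ijl} x_{i'j'l} &= H_l^2, & i' &\ne i \\
\mathcal{E}_T x_{ijl} x_{i'j'l'} &= H_l H_{l'}, & i' &\ne i,\ l' \ne l,
\end{aligned}
\]

since different populations are assumed to be independent but to have the same expected heterozygosity. When there is more than one locus, $m > 1$, the variance components are

populations

\[ \sigma^2_{p} = \frac{1}{m(m-1)} \sum_{l} \sum_{l' \ne l} (M_{ll'} - H_l H_{l'}) \]

individuals within populations

\[ \sigma^2_{i/p} = \frac{1}{m(m-1)} \sum_{l} \sum_{l' \ne l} (H_{ll'} - M_{ll'}) \]

loci by populations

\[ \sigma^2_{lp} = \frac{1}{m} \sum_{l} (M_l - H_l^2) - \frac{1}{m(m-1)} \sum_{l} \sum_{l' \ne l} (M_{ll'} - H_l H_{l'}) \]

loci by individuals within populations

\[ \sigma^2_{li/p} = \frac{1}{m} \sum_{l} (H_l - M_l) - \frac{1}{m(m-1)} \sum_{l} \sum_{l' \ne l} (H_{ll'} - M_{ll'}). \]

When only one locus, $l$, is being considered, there are only two components:

populations

\[ \sigma^2_{p(l)} = \sigma^2_{p} + \sigma^2_{lp} = M_l - H_l^2 \]

individuals within populations

\[ \sigma^2_{i/p(l)} = \sigma^2_{i/p} + \sigma^2_{li/p} = H_l - M_l. \]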


While this formulation allows the total variance of the indicator variable to be written as

\[ \mathrm{Var}_T(x_{ijl}) = \sigma^2_{p} + \sigma^2_{i/p} + \sigma^2_{lp} + \sigma^2_{li/p}, \]

the main advantage is the ease with which variances of averages may now be written down. The average heterozygosity for population $i$ is

\[ \tilde{H}_i = \frac{1}{nm} \sum_{j} \sum_{l} x_{ijl} \]

so that

\[ \mathrm{Var}_T(\tilde{H}_i) = \sigma^2_{p} + \frac{1}{m} \sigma^2_{lp} + \frac{1}{n} \sigma^2_{i/p} + \frac{1}{nm} \sigma^2_{li/p}, \]

which is the same as Eq. (3). Note that this total variance includes the variation among populations. Note also that, in general, the variance can be decreased by increasing both the number of individuals sampled and the number of loci scored. Similarly, the sample heterozygosity at locus $l$ in population $i$ is

\[ \tilde{H}_{il} = \frac{1}{n} \sum_{j} x_{ijl} \]

and so has total variance

\[ \mathrm{Var}_T(\tilde{H}_{il}) = \sigma^2_{p(l)} + \frac{1}{n} \sigma^2_{i/p(l)}, \]

which is the same as Eq. (2). Such expressions clarify the role that each source plays in determining the total variance of heterozygosity. They allow sample sizes $n$, $m$ to be chosen so that future samples will have desired precision. Of course such statements about future samples must necessarily be phrased in terms of the random model, as opposed to the fixed model. Specific cases will be considered in the next section.
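The bookkeeping can be checked numerically. The sketch below computes the four variance components from assumed values of $H_l$, $M_l$, $H_{ll'}$ and $M_{ll'}$, forms $\sigma^2_p + \sigma^2_{lp}/m + \sigma^2_{i/p}/n + \sigma^2_{li/p}/(nm)$, and compares it with the total variance written directly as $\frac{1}{m^2}[\sum_l (M_l - H_l^2) + \sum\sum (M_{ll'} - H_l H_{l'})] + \frac{1}{nm^2}[\sum_l (H_l - M_l) + \sum\sum (H_{ll'} - M_{ll'})]$. Function names and input values are hypothetical, chosen only for illustration.

```python
def variance_components(H, M, H2, M2):
    """Return (pop, ind_within_pop, loci_x_pop, loci_x_ind) for m > 1 loci.

    H[l], M[l]     : one-locus quantities H_l and M_l
    H2[l][k], M2[l][k] : two-locus quantities H_{lk} and M_{lk} (l != k)
    """
    m = len(H)
    pairs = [(l, k) for l in range(m) for k in range(m) if k != l]
    a = sum(M2[l][k] - H[l] * H[k] for l, k in pairs) / (m * (m - 1))
    b = sum(H2[l][k] - M2[l][k] for l, k in pairs) / (m * (m - 1))
    s_p = a
    s_ip = b
    s_lp = sum(M[l] - H[l] ** 2 for l in range(m)) / m - a
    s_lip = sum(H[l] - M[l] for l in range(m)) / m - b
    return s_p, s_ip, s_lp, s_lip


def total_variance_from_components(n, m, comps):
    s_p, s_ip, s_lp, s_lip = comps
    return s_p + s_lp / m + s_ip / n + s_lip / (n * m)


def total_variance_direct(n, H, M, H2, M2):
    """Total variance written directly in terms of H and M quantities."""
    m = len(H)
    pairs = [(l, k) for l in range(m) for k in range(m) if k != l]
    between = (sum(M[l] - H[l] ** 2 for l in range(m))
               + sum(M2[l][k] - H[l] * H[k] for l, k in pairs))
    within = (sum(H[l] - M[l] for l in range(m))
              + sum(H2[l][k] - M2[l][k] for l, k in pairs))
    return between / m ** 2 + within / (n * m ** 2)


# Hypothetical inputs for two loci and n = 10 individuals.
H, M = [0.4, 0.6], [0.2, 0.4]
H2 = [[0.0, 0.30], [0.30, 0.0]]
M2 = [[0.0, 0.25], [0.25, 0.0]]
comps = variance_components(H, M, H2, M2)
v_comp = total_variance_from_components(10, 2, comps)
v_direct = total_variance_direct(10, H, M, H2, M2)
```

The two routes give the same number for these inputs, which is the sense in which the component form and the direct form agree.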

SPECIAL CASES

The effects of specific mating systems or allelic distributions can be found most readily by expressing heterozygosities in terms of descent measures and allelic frequencies. We use the descent measures, shown in Table I,


TABLE I
Descent Measures

Measure^a        Identity Relation^b

^a The one-locus notation follows Cockerham (1971) and the two-locus notation follows Weir, Avery and Hill (1980) (except that in that paper the measures were for double non-identity). Distinct individuals are denoted by X and Y.
^b An equivalence sign denotes identity by descent. Nothing is implied about genes not shown to be identical. Lower-case letters denote genes at different loci, and primes denote distinct genes. Upper-case subscripts indicate the individuals containing the genes.

which measure identity relative to an infinite reference population mating at random and in Hardy-Weinberg and linkage equilibrium. As examples of the notation introduced in that table, $\Delta_2$ is the probability that random individual X has two identical genes $a$, $a'$ at one locus and so does individual Y, while $\Delta_2^*$ is the probability that X has two identical genes $a$, $a'$ at one locus and individual Y has two identical genes $b$, $b'$ at another locus. If $p_{lu}$ is the frequency of allele $u$ at locus $l$, then the gene diversity at that locus is written as $d_l$ and defined by

\[ d_l = 1 - \sum_{u} p_{lu}^2 \]

and there is need for the complement of the sum of cubes

\[ g_l = 1 - \sum_{u} p_{lu}^3 . \]

The various heterozygosities are derived in the Appendix, and can be written as


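\[
\begin{aligned}
H_l &= (1 - \theta_1)\, d_l \\
M_l &= 2(4\theta_2 - 8\gamma_1 - 3\Delta_1 + 7\delta_1)\, d_l \\
&\quad + (1 - 2\theta_1 - 4\theta_2 + 8\gamma_1 + 2\Delta_1 + \Delta_2 - 6\delta_1)\, d_l^2 \\
&\quad - 4(\theta_2 - 2\gamma_1 - \Delta_1 + 2\delta_1)\, g_l \\
H_{ll'} &= (1 - 2\theta_1 + \Theta_1)\, d_l d_{l'}, & l' &\ne l \\
M_{ll'} &= (1 - 2\theta_1 + \Delta_2^*)\, d_l d_{l'}, & l' &\ne l.
\end{aligned}
\qquad (4)
\]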
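A short sketch of Eqs. (4) follows, under the assumption that the descent measures are supplied in a dictionary with the hypothetical keys shown. It evaluates $H_l$ and $M_l$ from the one-locus measures with $d_l$ and $g_l$, and $H_{ll'}$ and $M_{ll'}$ from the two-locus measures. The numerical descent measures are arbitrary illustrative values, not results from any mating system.

```python
def one_locus(dm, d, g):
    """H_l and M_l of Eqs. (4) for gene diversity d and cube complement g."""
    t1, t2 = dm["theta1"], dm["theta2"]
    g1, d1 = dm["gamma1"], dm["delta1"]
    D1, D2 = dm["Delta1"], dm["Delta2"]
    no_identity = 1 - 2 * t1 - 4 * t2 + 8 * g1 + 2 * D1 + D2 - 6 * d1
    H = (1 - t1) * d
    M = (2 * (4 * t2 - 8 * g1 - 3 * D1 + 7 * d1) * d
         + no_identity * d * d
         - 4 * (t2 - 2 * g1 - D1 + 2 * d1) * g)
    return H, M


def two_locus(dm, d_l, d_k):
    """H_{ll'} and M_{ll'} of Eqs. (4) for a pair of loci."""
    H2 = (1 - 2 * dm["theta1"] + dm["Theta1"]) * d_l * d_k
    M2 = (1 - 2 * dm["theta1"] + dm["Delta2_star"]) * d_l * d_k
    return H2, M2


# Arbitrary illustrative descent measures (assumed, not derived).
dm = {"theta1": 0.2, "theta2": 0.1, "gamma1": 0.05, "delta1": 0.02,
      "Delta1": 0.04, "Delta2": 0.06, "Theta1": 0.07, "Delta2_star": 0.05}

# All alleles unique: d = g = 1, so M_l reduces to 1 - 2*theta1 + Delta2.
H_unique, M_unique = one_locus(dm, 1.0, 1.0)

# Two equally frequent alleles: d = 0.5, g = 0.75.
H_two, M_two = one_locus(dm, 0.5, 0.75)
```

For two equally frequent alleles the same function gives $M_l = (1 - 2\theta_1 + 2\Delta_1 + \Delta_2 - 2\delta_1)/4$, which can be confirmed by substituting $d_l = 0.5$, $g_l = 0.75$ by hand.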

The two-locus frequencies should have the descent measures subscripted to indicate that they pertain to the particular pair of loci, $l$ and $l'$. Averaging over loci is greatly simplified, however, if linkage between loci is assumed to be independent of allelic frequencies at the set of $m$ loci being considered. There is no problem when all loci are unlinked since then all values of a two-locus measure are the same. Otherwise we work with averages of the descent measures over all pairs of loci, but retain the same symbols for those average measures. Likewise we work with average gene diversities.

Special Allelic Distributions

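If all alleles in the reference population are unique, homozygosity becomes the same as inbreeding. Setting the sums of squares and higher powers of allelic frequencies to zero (i.e., $d_l = g_l = 1$) in Eqs. (4) provides the variance

\[
\mathrm{Var}_T(\tilde{H}_i) = (\Delta_2^* - \theta_1^2) + \frac{1}{m}(\Delta_2 - \Delta_2^*) + \frac{1}{n}(\Theta_1 - \Delta_2^*) + \frac{1}{nm}(\theta_1 - \Delta_2 - \Theta_1 + \Delta_2^*), \qquad m \ge 1.
\]

This result was found previously (Cockerham and Weir, 1983) for the variance of inbreeding. By way of contrast, consider the case where every locus has two equally frequent alleles. In that case $d_l = 0.5$, $g_l = 0.75$, and the variances become

\[
\mathrm{Var}_T(\tilde{H}_i) = \frac{1}{4}(\Delta_2^* - \theta_1^2) + \frac{1}{4m}(2\Delta_1 + \Delta_2 - 2\delta_1 - \Delta_2^*) + \frac{1}{4n}(\Theta_1 - \Delta_2^*) + \frac{1}{4nm}(1 - 2\Delta_1 - \Delta_2 + 2\delta_1 - \Theta_1 + \Delta_2^*), \qquad m \ge 1.
\]

Finite Random Mating Populations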

For monoecious populations of size $N$ mating at random, we have previously given transition equations for the descent measures when selfing

TABLE II
Standard Deviation^a of Average Heterozygosity for a Monoecious Population Mating at Random

                              Infinite number of alleles      Two equally frequent alleles
                                      Generation                       Generation
Population    Sample
size (N)      size (n)         N        5N       10N            N        5N       10N

1^b           (H^c)         (0.500)  (0.031)  (0.001)        (0.250)  (0.016)  (0.001)
              1              0.500    0.174    0.031          0.433    0.124    0.022
              10             0.158    0.127    0.023          0.214    0.091    0.016
              100            0.050    0.122    0.022          0.253    0.087    0.016

100           (H)           (0.606)  (0.082)  (0.007)        (0.303)  (0.041)  (0.003)
              1              0.489    0.274    0.081          0.460    0.198    0.058
              10             0.205    0.177    0.055          0.223    0.131    0.039
              100            0.149    0.164    0.052          0.184    0.122    0.037

10,000        (H)           (0.607)  (0.082)  (0.007)        (0.303)  (0.041)  (0.003)
              1              0.489    0.275    0.082          0.460    0.198    0.058
              10             0.205    0.177    0.055          0.223    0.131    0.039
              100            0.150    0.164    0.052          0.183    0.122    0.037

^a As a multiple of $1/\sqrt{m}$, where $m$ unlinked loci are used.
^b I.e., self mating.
^c Expected heterozygosity at any locus.

TABLE III
Standard Deviation^a of Average Heterozygosity for a Monoecious Population Mating at Random with Selfing Excluded

                              Infinite number of alleles      Two equally frequent alleles
                                      Generation                       Generation
Population    Sample
size (N)      size (n)         N        5N       10N            N        5N       10N

2^b           (H^c)         (0.750)  (0.141)  (0.017)        (0.375)  (0.070)  (0.008)
              1              0.433    0.348    0.129          0.484    0.256    0.091
              10             0.217    0.255    0.099          0.282    0.193    0.071
              100            0.181    0.244    0.096          0.253    0.185    0.068

100           (H)           (0.610)  (0.083)  (0.007)        (0.305)  (0.042)  (0.003)
              1              0.488    0.276    0.083          0.461    0.200    0.059
              10             0.205    0.178    0.056          0.224    0.132    0.040
              100            0.150    0.166    0.053          0.184    0.123    0.037

10,000        (H)           (0.607)  (0.082)  (0.007)        (0.303)  (0.041)  (0.003)
              1              0.486    0.275    0.082          0.460    0.198    0.058
              10             0.205    0.177    0.055          0.223    0.131    0.039
              100            0.150    0.164    0.052          0.183    0.122    0.037

^a As a multiple of $1/\sqrt{m}$, where $m$ unlinked loci are used.
^b I.e., sib mating.
^c Expected heterozygosity at any locus.


is either allowed, in random amounts, or not allowed (Cockerham and Weir, 1983; Weir et al., 1980). Our previous work suggests that results for monoecy with selfing excluded give a good guide to those for dioecious populations of the same size. The numerical results found from such equations lead to the results shown in Tables II and III. The two-allele results assume that the alleles are equally frequent. Note that samples now consist of $n$ related and inbred individuals. The variance of heterozygosity in finite populations was also considered by del Castillo et al. (1986). These authors did take into account the variation between populations, but ignored the effects of finite samples. In other words, they ignored the within-population sampling. Tables II and III show that we confirm their observation that the variance may increase over time and then decrease, but this effect is swamped by within-population sampling for small sample sizes.

Drift/Mutation Balance

If mutation to new alleles, at rate $\mu$ per gene per generation, is added to the drift model (i.e., random mating in a finite monoecious population), the descent measures have non-zero equilibrium values. These values were derived by Lessard (1981). With random mating, including a random amount of selfing, it is not necessary to subscript the various descent measures since then they do not depend on the arrangement of genes within individuals. If we write $\phi = 4N\mu$, then the one-locus descent measures at drift/mutation equilibrium are

\[
\theta = \frac{1}{1 + \phi}, \qquad
\gamma = \frac{2}{(1 + \phi)(2 + \phi)}, \qquad
\delta = \frac{6}{(1 + \phi)(2 + \phi)(3 + \phi)}, \qquad
\Delta = \frac{6 + \phi}{(1 + \phi)(2 + \phi)(3 + \phi)}.
\]

For unlinked loci, the two-locus measures at equilibrium are

\[
\Theta = \frac{1 + \phi^2/3N}{(1 + \phi)^2}, \qquad
\Delta^* = \frac{1 + \cdots + \phi^2/2N^2}{(1 + \phi)^2}.
\]


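For large $N$, the last two equations show that $\Theta = \Delta^* = \theta^2$ so that $H_{ll'} = M_{ll'} = H_l H_{l'}$, and the components of variance of heterozygosity for populations and individuals within populations are zero when $m > 1$. When $N$ is this large

\[ \mathrm{Var}_T(\tilde{H}_i) = \frac{1}{m^2} \sum_{l} \Big[ (M_l - H_l^2) + \frac{1}{n} (H_l - M_l) \Big], \qquad m \ge 1. \qquad (5) \]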
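The relative effect of $n$ and $m$ in Eq. (5) can be illustrated with a small sketch. It assumes the infinite-alleles model ($d_l = 1$ at every locus), a common value of $\phi = 4N\mu$ across loci, and the large-$N$ simplification just described, so that $H_l = 1 - \theta$ and $M_l = 1 - 2\theta + \Delta$ with $\theta = 1/(1+\phi)$ and $\Delta = (6+\phi)/[(1+\phi)(2+\phi)(3+\phi)]$. The function name and the choice $\phi = 1$ are illustrative assumptions.

```python
def drift_mutation_variance(phi, n, m):
    """Eq. (5) for m unlinked loci with a common phi = 4*N*mu (large N)."""
    theta = 1.0 / (1.0 + phi)
    Delta = (6.0 + phi) / ((1.0 + phi) * (2.0 + phi) * (3.0 + phi))
    H = 1.0 - theta                  # expected heterozygosity per locus
    M = 1.0 - 2.0 * theta + Delta    # both of two individuals heterozygous
    per_locus = (M - H * H) + (H - M) / n
    return per_locus / m


base = drift_mutation_variance(1.0, n=10, m=5)
more_individuals = drift_mutation_variance(1.0, n=100, m=5)
more_loci = drift_mutation_variance(1.0, n=10, m=50)
```

With these assumed values, a tenfold increase in loci cuts the variance tenfold, while a tenfold increase in individuals removes only the part of the variance that is divided by $n$.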

Evidently the total variance is going to be reduced primarily by increasing the number of loci scored. Increasing the number of individuals sampled will have relatively little effect. This also holds true even when $N$ cannot be ignored in the two-locus values. The same conclusion was reached by Nei (1978) in his work on the variance of gene diversity in the drift and mutation model. Since the infinite-alleles model is appropriate here, Eq. (5) reduces to

\[ \mathrm{Var}_T(\tilde{H}_i) = \frac{1}{m}\, \frac{\phi}{(1 + \phi)(2 + \phi)(3 + \phi)} \Big[ \frac{2}{1 + \phi} + \frac{4 + \phi}{n} \Big] \]

and for large $n$ this expression has been given many times previously (e.g., Li and Nei, 1975). For large $n$, it is similar to the expression given by Nei (1978) for the variance of gene diversity.

Self and Random Mating

Another simple situation is provided by the case of infinite populations mating at random except for a constant proportion, $s$, of selfing each generation. The only non-zero descent measures are now $\theta_1$, $\Theta_1$, and $\Delta_2 = \Delta_2^* = \theta_1^2$, so that $M_l = H_l^2$ and $M_{ll'} = H_l H_{l'}$. The components of variance for populations and loci by populations are both zero, and the total variance of heterozygosity becomes

\[ \mathrm{Var}_T(\tilde{H}_i) = \frac{1}{n} \Big( \sigma^2_{i/p} + \frac{1}{m} \sigma^2_{li/p} \Big). \qquad (6) \]

There is an obvious contrast between Eqs. (5) and (6). Equation (6) shows that the variance of heterozygosity for mixed self and random mating is decreased most by increasing the numbers of individuals sampled. Increasing the number of loci scored has relatively little effect. Although individuals sampled are expected to be inbred, they are unrelated. We have previously (Weir and Cockerham, 1973) evaluated the descent

TABLE IV
Standard Deviation of Average Heterozygosity for Unlinked Loci in a Mixed Self and Random Mating Population at Equilibrium

                            Infinite number of alleles             Two equally frequent alleles
                                 Selfing rate (s)                        Selfing rate (s)
Sample      No. of
size (n)    loci (m)     0.25   0.50   0.75   0.90   0.95        0.25   0.50   0.75   0.90   0.95

            H^a          0.86   0.67   0.40   0.18   0.10        0.43   0.33   0.20   0.09   0.05

1           1            0.35   0.47   0.49   0.39   0.29        0.49   0.47   0.40   0.29   0.21
            10           0.27   0.37   0.40   0.32   0.24        0.20   0.22   0.22   0.17   0.13
            100          0.26   0.36   0.39   0.31   0.24        0.14   0.18   0.20   0.16   0.12

10          1            0.11   0.15   0.15   0.12   0.09        0.16   0.15   0.13   0.09   0.07
            10           0.08   0.12   0.13   0.10   0.08        0.06   0.07   0.07   0.05   0.04
            100          0.08   0.11   0.12   0.10   0.08        0.04   0.06   0.06   0.05   0.04

100         1            0.04   0.05   0.05   0.04   0.03        0.05   0.05   0.04   0.03   0.02
            10           0.03   0.04   0.04   0.03   0.02        0.02   0.02   0.02   0.02   0.01
            100          0.03   0.04   0.04   0.03   0.02        0.01   0.02   0.02   0.02   0.01

^a Expected heterozygosity at any locus.

measures for this mating scheme. At equilibrium, unlinked loci have the descent measures

\[ \theta_1 = \frac{s}{2 - s}, \qquad \Theta_1 = \frac{s(2 + s)}{(2 - s)(4 - s)}. \]

Some numerical values are given in Table IV for the cases of an infinite number of alleles, or two equally frequent alleles.
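Entries of this kind can be recomputed with a short sketch. It assumes equilibrium under mixed selfing with $\theta_1 = s/(2-s)$, $\Theta_1 = s(2+s)/[(2-s)(4-s)]$ and $\Delta_2 = \Delta_2^* = \theta_1^2$, so that $\sigma^2_{i/p} = H_{ll'} - H_l H_{l'}$, $\sigma^2_{li/p} = H_l(1 - H_l) - \sigma^2_{i/p}$, and the variance is $(\sigma^2_{i/p} + \sigma^2_{li/p}/m)/n$ when every locus has the same gene diversity $d$. The function name and argument conventions are hypothetical.

```python
import math


def mixed_selfing_sd(s, n, m, two_alleles=False):
    """Standard deviation of average heterozygosity for selfing rate s.

    n individuals, m unlinked loci; d = 1 for an infinite number of
    alleles, d = 0.5 for two equally frequent alleles.
    """
    theta1 = s / (2.0 - s)
    Theta1 = s * (2.0 + s) / ((2.0 - s) * (4.0 - s))
    d = 0.5 if two_alleles else 1.0
    H = (1.0 - theta1) * d                            # one-locus heterozygosity
    H_pair = (1.0 - 2.0 * theta1 + Theta1) * d * d    # double heterozygotes
    var_ip = H_pair - H * H                           # individuals within populations
    var_lip = H * (1.0 - H) - var_ip                  # loci by individuals
    return math.sqrt((var_ip + var_lip / m) / n)


sd_a = mixed_selfing_sd(0.50, n=1, m=10)                       # infinite alleles
sd_b = mixed_selfing_sd(0.25, n=1, m=100, two_alleles=True)
sd_c = mixed_selfing_sd(0.25, n=10, m=1)
```

Because the whole variance is divided by $n$, each standard deviation for $n = 10$ is the corresponding $n = 1$ value divided by $\sqrt{10}$, whereas raising $m$ only shrinks the loci by individuals term.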

ESTIMATING VARIANCES

Within Populations

When inference is to be restricted to the one population sampled, the observed heterozygosities at one and two loci can be substituted into Eq. (1) to provide an estimate of the sampling variance of average heterozygosity.

Between Populations

While the variance components allow prediction of the total sampling variances of heterozygosity, estimated variance components allow estimation of the total variance. This does require information on at least two


populations, and for equal sample sizes from $r$ populations and the same number of loci scored in every individual, the calculations may be set out as a split-plot analysis of variance shown in Table V. (Populations correspond to whole plot treatments and loci to split plot treatments.) The sums of squares are calculated for the indicator variables $x_{ijl}$, and are just the standard expressions. The sum, $L$, in the expected mean square for the fixed locus effects is

\[ L = \frac{rn}{m - 1} \sum_{l} (H_l - H)^2 . \]

In practice, it may be convenient to use a statistical computer package. The data on genotypes for every individual are transformed to a series of “1”s or “0”s, depending on whether a locus is heterozygous or not. The package can generate the sums of squares, and can usually provide estimates of the variance components. Ratios of mean squares should not be considered to have F distributions since the $x$'s are either zero or one.
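For balanced data the calculations are simple enough to sketch without a package. The code below is an illustration under stated assumptions, not the authors' procedure: genotypes are assumed to be pairs of allele labels, `x[i][j][l]` is the resulting 0/1 indicator for population `i`, individual `j`, locus `l`, and the variance components are obtained by equating mean squares to expected mean squares of the form $\sigma^2_{li/p} + m\sigma^2_{i/p} + n\sigma^2_{lp} + mn\sigma^2_p$ (populations), $\sigma^2_{li/p} + m\sigma^2_{i/p}$ (individuals), $\sigma^2_{li/p} + n\sigma^2_{lp}$ (loci by populations) and $\sigma^2_{li/p}$ (loci by individuals). All function names and the toy data are hypothetical.

```python
def to_indicator(genotype):
    """1 if the two alleles at a locus differ (heterozygote), else 0."""
    a, b = genotype
    return 1 if a != b else 0


def split_plot_anova(x):
    """Sums of squares, d.f. and mean squares for balanced x[i][j][l]."""
    r, n, m = len(x), len(x[0]), len(x[0][0])
    I, J, L = range(r), range(n), range(m)
    grand = sum(x[i][j][l] for i in I for j in J for l in L) / (r * n * m)
    pop = [sum(x[i][j][l] for j in J for l in L) / (n * m) for i in I]
    ind = [[sum(x[i][j]) / m for j in J] for i in I]
    loc = [sum(x[i][j][l] for i in I for j in J) / (r * n) for l in L]
    lp = [[sum(x[i][j][l] for j in J) / n for l in L] for i in I]
    ss = {
        "populations": n * m * sum((p - grand) ** 2 for p in pop),
        "individuals": m * sum((ind[i][j] - pop[i]) ** 2 for i in I for j in J),
        "loci": r * n * sum((q - grand) ** 2 for q in loc),
        "loci_x_pop": n * sum((lp[i][l] - pop[i] - loc[l] + grand) ** 2
                              for i in I for l in L),
        "loci_x_ind": sum((x[i][j][l] - ind[i][j] - lp[i][l] + pop[i]) ** 2
                          for i in I for j in J for l in L),
    }
    df = {"populations": r - 1, "individuals": r * (n - 1), "loci": m - 1,
          "loci_x_pop": (r - 1) * (m - 1), "loci_x_ind": r * (n - 1) * (m - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    return ss, df, ms


def estimate_components(ms, n, m):
    """Moment estimates from the expected mean squares assumed above."""
    s_lip = ms["loci_x_ind"]
    s_lp = (ms["loci_x_pop"] - s_lip) / n
    s_ip = (ms["individuals"] - s_lip) / m
    s_p = (ms["populations"] - ms["individuals"] - ms["loci_x_pop"] + s_lip) / (m * n)
    return {"pop": s_p, "ind": s_ip, "loci_x_pop": s_lp, "loci_x_ind": s_lip}


# Toy data: 2 populations, 2 individuals each, 2 loci (already 0/1).
x_demo = [[[1, 0], [0, 0]], [[1, 1], [0, 1]]]
ss, df, ms = split_plot_anova(x_demo)
components = estimate_components(ms, n=2, m=2)
```

Moment estimates obtained this way can be negative in small data sets, and, as noted above, the mean squares should not be referred to F distributions.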

TABLE V
Analysis of Variance Format for Heterozygosity

Source                                    d.f.            Sum of squares                            Expected mean square

Populations                               r-1             SS_P - C                                  σ²_{li/p} + mσ²_{i/p} + nσ²_{lp} + mnσ²_{p}
Individuals within populations            r(n-1)          SS_I - SS_P                               σ²_{li/p} + mσ²_{i/p}
Loci                                      m-1             SS_L - C                                  σ²_{li/p} + nσ²_{lp} + L
Loci by populations                       (r-1)(m-1)      SS_LP - SS_P - SS_L + C                   σ²_{li/p} + nσ²_{lp}
Loci by individuals within populations    r(n-1)(m-1)     SS_LI - SS_I - SS_LP + SS_P               σ²_{li/p}


Unbalanced Data

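It is uncommon to be able to score every locus in every individual sampled, and to accommodate this unbalance we introduce a second indicator variable, $\epsilon_{ijl}$, for the $l$th locus in the $j$th individual sampled from the $i$th population. This variable takes the value 1 if that locus is scored, and the value 0 otherwise. The average heterozygosity in the $i$th sample is then

\[ \tilde{H}_i = \frac{1}{\epsilon_{i\cdot\cdot}} \sum_{j} \sum_{l} \epsilon_{ijl}\, x_{ijl} \]

and this has total variance

\[
\mathrm{Var}_T(\tilde{H}_i) = \frac{1}{\epsilon_{i\cdot\cdot}^2} \Big[ \sum_{l} \epsilon_{i\cdot l}^2 (M_l - H_l^2) + \sum_{l} \sum_{l' \ne l} \epsilon_{i\cdot l}\, \epsilon_{i\cdot l'} (M_{ll'} - H_l H_{l'}) + \sum_{l} \epsilon_{i\cdot l} (H_l - M_l) + \sum_{l} \sum_{l' \ne l} \Big( \sum_{j} \epsilon_{ijl}\, \epsilon_{ijl'} \Big) (H_{ll'} - M_{ll'}) \Big],
\]

where a dot indicates summation over that subscript. With unbalanced data, the usual problems with analyses of variance arise. The expected mean squares given in Table V are no longer as simple and estimation of the variance components is difficult unless a computer package is used.

Variation over Loci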

From data taken from a single population, it is logically impossible to estimate the between-populations component of variance and hence get at the total variance of heterozygosity. A possible way around this problem is to regard the replication afforded by different loci as mimicking the variation between replicate populations. This would be appropriate if all loci were completely independent, meaning that dependencies imposed by linkage, population size, and mating scheme could all be ignored. It also assumes that loci are random effects. One approach is simply to use the variance among the single-locus heterozygosities. However, if this between-locus sample variance is written as $s_L^2$, where

\[ s_L^2 = \frac{1}{m - 1} \sum_{l} (\tilde{H}_l - \tilde{H})^2, \]

then the quantity $s_L^2/m$, instead of providing an estimate of $\mathrm{Var}_T(\tilde{H})$, has a total expectation of

\[
\mathcal{E}_T(s_L^2/m) = \mathrm{Var}_T(\tilde{H}) + \frac{1}{m(m-1)} \sum_{l} (H_l - H)^2 - \frac{1}{m(m-1)} \sum_{l} \sum_{l' \ne l} \mathrm{Cov}_T(\tilde{H}_l, \tilde{H}_{l'}). \qquad (7)
\]

Use of $s_L^2$ to estimate total variance therefore requires equal and independent heterozygosities across loci. Taking expectations within populations produces the same result as in Eq. (7), but with $W$ replacing $T$ throughout. Of course the values for the total variance $\mathrm{Var}_T(\tilde{H})$ and the total covariances $\mathrm{Cov}_T(\tilde{H}_l, \tilde{H}_{l'})$ are different than for within populations.

Another approach is to jackknife over loci. Each of the $m$ loci can be omitted in turn, and the average heterozygosity estimated from the remaining $m - 1$ loci. The variance of the original estimate is estimated in the usual way for the jackknife procedure (e.g., Reynolds et al., 1983). To accommodate unbalanced data, a weighted jackknife would seem preferable, with the numbers of observations for each locus serving as weights. If $\tilde{H}_{i(L)}$ is the estimated average heterozygosity in population $i$ obtained when locus $L$ is omitted, then the weighted jackknife estimate of average heterozygosity is

\[ \tilde{H}_i^{J} = m \tilde{H}_i - (m - 1)\, \frac{\sum_{L=1}^{m} N_{(L)} \tilde{H}_{i(L)}}{\sum_{L} N_{(L)}} \]

where $N_{(L)} = \sum_{l \ne L} \sum_{j} \epsilon_{ijl}$. The jackknife estimate of the variance of average heterozygosity is

In this expression $N_{\cdot} = \sum_{L} N_{(L)}$.

Note that jackknifing over loci, when loci are regarded as random effects, falls within the guidelines for resampling suggested by Dodds (1986).
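The leave-one-locus-out calculation can be sketched as follows for a single population. The point estimate uses the weighting described above, $m\tilde{H} - (m-1)\sum_L N_{(L)}\tilde{H}_{(L)} / \sum_L N_{(L)}$, with $N_{(L)}$ the number of scored observations left after omitting locus $L$. The variance returned is the ordinary unweighted jackknife variance, $\frac{m-1}{m}\sum_L (\tilde{H}_{(L)} - \bar{H}_{(\cdot)})^2$, which is swapped in here as an assumption because it is the standard form; it is not claimed to be the weighted expression used by the authors. Names and data are hypothetical, and every leave-one-out subset is assumed to contain at least one scored locus.

```python
def average_heterozygosity(x, eps, skip=None):
    """Mean of x[j][l] over scored entries (eps[j][l] == 1), omitting locus skip.

    Returns the average and the number of observations it is based on.
    """
    cells = [(j, l) for j in range(len(x)) for l in range(len(x[0]))
             if l != skip and eps[j][l] == 1]
    count = len(cells)
    return sum(x[j][l] for j, l in cells) / count, count


def jackknife_over_loci(x, eps):
    m = len(x[0])
    h_full, _ = average_heterozygosity(x, eps)
    leave_out = [average_heterozygosity(x, eps, skip=L) for L in range(m)]
    total_weight = sum(N for _, N in leave_out)
    weighted_mean = sum(N * h for h, N in leave_out) / total_weight
    estimate = m * h_full - (m - 1) * weighted_mean
    # Ordinary (unweighted) jackknife variance, used here as a stand-in.
    plain_mean = sum(h for h, _ in leave_out) / m
    variance = (m - 1) / m * sum((h - plain_mean) ** 2 for h, _ in leave_out)
    return estimate, variance


# Toy data: 2 individuals, 3 loci, every locus scored.
x_demo = [[1, 0, 1], [0, 0, 1]]
eps_demo = [[1, 1, 1], [1, 1, 1]]
h_jack, v_jack = jackknife_over_loci(x_demo, eps_demo)
```

With fully scored data the weights are equal, the jackknife estimate coincides with the ordinary average, and the unweighted jackknife variance equals the between-locus sample variance divided by $m$.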


DISCUSSION

This paper contains a method for predicting the sampling variance of heterozygosity for any regular mating system, and in this respect complements our earlier work on inbreeding (Cockerham and Weir, 1983; Weir et al., 1980) and extends the treatment of Lessard (1981). Expressions can be evaluated in specific situations to allow the effects of different allocations of sampling resources to be determined. Much of the complexity in the development is a result of allowing for dependencies between loci and quantifying these dependencies with two-locus descent measures. We can often assume unlinked loci, however, and then to a high order of accuracy it is sufficient to work with one-locus measures only (e.g., Li and Nei, 1975). In these cases the variance of average heterozygosity is inversely proportional to the number of loci scored and there is no component of variance between populations. Our numerical results confirm that variances are reduced more quickly from increasing the number of loci than from increasing the number of individuals. The two-locus measures need to be retained in the mixed self and random mating case, even though the infinite population size eliminates variation between populations. Numerical work shows that there is an intermediate level of selfing at which the variance is maximized. In a common situation of 10 loci scored in each of 100 individuals, the standard deviation is an appreciable fraction of the expected heterozygosity unless selfing levels are small. Caution is therefore called for if observed heterozygosity levels are to be used to infer the existence of selection in such cases. With limited resources, it is preferable to increase the number of individuals sampled rather than the number of loci scored in order to increase precision. Selfing causes a dependence among frequencies at different loci, even when they are unlinked.
In neither the self and random case nor the finite monoecious population case does the allelic frequency distribution have a great effect. In practice, we may as well use the infinite alleles model and equate homozygosity to inbreeding. The difference between the two extremes of an infinite number of alleles and two equally frequent alleles increases with the number of loci. The other issue addressed in this paper was that of estimating sampling variances of heterozygosity. For data from a single population, a fixed model is appropriate and variances can be estimated from the observed one- and two-locus heterozygosities. Different loci may be used as surrogates for different populations, and jackknifing over loci used to estimate the total variance that takes into account between-population variation. With data from several populations, the between-population variation can be estimated directly. The most direct approach is by estimation of components of variation from an analysis of variance for a split-plot design. The variance components clarify the relative contributions of the four sources of variation affecting average heterozygosity.

The variance of heterozygosity has been used by Brown et al. (1980) to summarize linkage disequilibria within populations between several pairs of loci. They worked with random pairs of gametes from an infinite gamete pool, which is the same as working with genotypes when there is random union of gametes. In that case genotypic frequencies within populations can be replaced by products of gametic frequencies, and two-locus gametic frequencies in turn can be written in terms of allelic frequencies and gametic linkage disequilibria. For population $i$, at one locus, heterozygosity is the same as gene diversity, $H_{il} = d_{il}$. The frequency of two-locus gametes carrying alleles $u$, $v$ at loci $l$, $l'$ is written as $p^{uv}_{ill'}$, and the associated linkage disequilibrium as $D^{uv}_{ill'}$. Then

\[ H_{ill'} = 1 - \sum_{u} p_{ilu}^2 - \sum_{v} p_{il'v}^2 + \sum_{u} \sum_{v} \big( p_{ilu}\, p_{il'v} + D^{uv}_{ill'} \big)^2 \]

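so that

\[ H_{ill'} - H_{il} H_{il'} = \sum_{u} \sum_{v} \Big[ 2 p_{ilu}\, p_{il'v}\, D^{uv}_{ill'} + \big( D^{uv}_{ill'} \big)^2 \Big] \]

and Eq. (1) leads to

\[
\mathrm{Var}_W(\tilde{H}_i) = \frac{1}{nm^2} \Big\{ \sum_{l} d_{il} (1 - d_{il}) + \sum_{l} \sum_{l' \ne l} \sum_{u} \sum_{v} \Big[ 2 p_{ilu}\, p_{il'v}\, D^{uv}_{ill'} + \big( D^{uv}_{ill'} \big)^2 \Big] \Big\} \qquad (8)
\]

as given by Brown et al. (1980), except for the divisor of $nm^2$. These authors used the statistic $K$ defined in our notation by

\[ K = \sum_{l} x_{ijl}, \]

where only one value of $i$ and $j$ is involved. This is why the divisor of $nm^2$ is not needed. Although Eq. (8) does summarize all the pairwise disequilibria, it does not allow general inferences to be drawn, since the between-population variation is not taken into account. Brown and Feldman (1981) subsequently extended their work to include data from several populations. They partitioned the total variance in a way different from that presented here.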
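The link between double heterozygosity and linkage disequilibrium under random union of gametes can be checked numerically. The sketch below assumes a table `p[u][v]` of two-locus gametic frequencies, computes the excess of double heterozygotes $H_{ll'} - H_l H_{l'}$ directly from genotype frequencies formed as products of gametic frequencies, and compares it with $\sum_u \sum_v (2 p_u p_v D_{uv} + D_{uv}^2)$, where $D_{uv} = p_{uv} - p_u p_v$. The function name and the frequencies are illustrative assumptions.

```python
def double_heterozygote_excess(p):
    """Return (direct, via_disequilibria) for gametic frequencies p[u][v]."""
    n_u, n_v = len(p), len(p[0])
    p_u = [sum(p[u]) for u in range(n_u)]                        # alleles at locus l
    p_v = [sum(p[u][v] for u in range(n_u)) for v in range(n_v)]  # alleles at locus l'
    hom_l = sum(q * q for q in p_u)
    hom_k = sum(q * q for q in p_v)
    hom_both = sum(p[u][v] ** 2 for u in range(n_u) for v in range(n_v))
    H_l, H_k = 1.0 - hom_l, 1.0 - hom_k
    H_lk = 1.0 - hom_l - hom_k + hom_both    # heterozygous at both loci
    direct = H_lk - H_l * H_k
    via_D = 0.0
    for u in range(n_u):
        for v in range(n_v):
            D = p[u][v] - p_u[u] * p_v[v]
            via_D += 2.0 * p_u[u] * p_v[v] * D + D * D
    return direct, via_D


# Hypothetical gametic frequencies with positive association.
gametes = [[0.4, 0.1], [0.1, 0.4]]
direct, via_D = double_heterozygote_excess(gametes)
```

For these frequencies the cross-product terms cancel and only the squared disequilibria contribute, so the excess is four times $0.15^2$.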


APPENDIX

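Heterozygosities are functions of genotypic frequencies. If $u$ is an allele at locus $l$, then $P^{l_u}_{l_u}$ denotes the frequency of individuals homozygous for that allele. Genes in different individuals are separated by a vertical rule, $|$, so that $P^{l_u | l_{u'}}_{l_u | l_{u'}}$ denotes the frequency with which two individuals are both homozygous at locus $l$, although not necessarily for the same allele. Then heterozygosities for one and two individuals are

\[
\begin{aligned}
H_l &= 1 - \sum_{u} P^{l_u}_{l_u} \\
M_l &= 1 - 2 \sum_{u} P^{l_u}_{l_u} + \sum_{u} \sum_{u'} P^{l_u | l_{u'}}_{l_u | l_{u'}}.
\end{aligned}
\]

Similarly, if $v$ is an allele at locus $l'$, then $P^{l_u l'_v}_{l_u l'_v}$ denotes the frequency of individuals homozygous for alleles $u$ and $v$. The frequency with which one individual is homozygous for allele $u$ and another is homozygous for allele $v$ is written as $P^{l_u | l'_v}_{l_u | l'_v}$. Then the two-locus heterozygosities are

\[
\begin{aligned}
H_{ll'} &= 1 - \sum_{u} P^{l_u}_{l_u} - \sum_{v} P^{l'_v}_{l'_v} + \sum_{u} \sum_{v} P^{l_u l'_v}_{l_u l'_v} \\
M_{ll'} &= 1 - \sum_{u} P^{l_u}_{l_u} - \sum_{v} P^{l'_v}_{l'_v} + \sum_{u} \sum_{v} P^{l_u | l'_v}_{l_u | l'_v}.
\end{aligned}
\]

Genotypic frequencies can be written in terms of descent measures and allelic frequencies (Cockerham, 1971; Weir and Cockerham, 1974). Assuming the absence of linkage disequilibrium,

\[
\begin{aligned}
P^{l_u}_{l_u} &= \theta_1 p_{lu} + (1 - \theta_1) p_{lu}^2 \\
P^{l_u | l_u}_{l_u | l_u} &= \delta_1 p_{lu} + (4\gamma_1 + 2\Delta_1 + \Delta_2 - 7\delta_1) p_{lu}^2 \\
&\quad + 2(\theta_1 + 2\theta_2 - 6\gamma_1 - 2\Delta_1 - \Delta_2 + 6\delta_1) p_{lu}^3 \\
&\quad + (1 - 2\theta_1 - 4\theta_2 + 8\gamma_1 + 2\Delta_1 + \Delta_2 - 6\delta_1) p_{lu}^4 \\
P^{l_u | l_{u'}}_{l_u | l_{u'}} &= (\Delta_2 - \delta_1) p_{lu} p_{lu'} + (\theta_1 - 2\gamma_1 - \Delta_2 + 2\delta_1) p_{lu} p_{lu'} (p_{lu} + p_{lu'}) \\
&\quad + (1 - 2\theta_1 - 4\theta_2 + 8\gamma_1 + 2\Delta_1 + \Delta_2 - 6\delta_1) p_{lu}^2 p_{lu'}^2, \qquad u' \ne u \\
P^{l_u l'_v}_{l_u l'_v} &= \Theta_1 p_{lu} p_{l'v} + (\theta_1 - \Theta_1) p_{lu} p_{l'v} (p_{lu} + p_{l'v}) + (1 - 2\theta_1 + \Theta_1) p_{lu}^2 p_{l'v}^2 \\
P^{l_u | l'_v}_{l_u | l'_v} &= \Delta_2^* p_{lu} p_{l'v} + (\theta_1 - \Delta_2^*) p_{lu} p_{l'v} (p_{lu} + p_{l'v}) + (1 - 2\theta_1 + \Delta_2^*) p_{lu}^2 p_{l'v}^2 .
\end{aligned}
\]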


ACKNOWLEDGMENTS

Work on this paper began while BSW was a visitor at the Center for Population and Demographic Genetics, University of Texas Health Science Center at Houston. Appreciation is extended to Dr. M. Nei and his colleagues for their hospitality. Helpful comments on the manuscript were provided by Drs. C. C. Cockerham and W. G. Hill.

REFERENCES

BROWN, A. H. D., FELDMAN, M. W., AND NEVO, E. 1980. Multilocus structure of natural populations of Hordeum spontaneum, Genetics 96, 523-530.
BROWN, A. H. D., AND FELDMAN, M. W. 1981. Population structure of multilocus associations, Proc. Natl. Acad. Sci. U.S.A. 78, 5913-5916.
COCKERHAM, C. C. 1971. Group inbreeding and coancestry, Genetics 58, 89-104.
COCKERHAM, C. C. 1971. Higher order probability functions of identity of alleles by descent, Genetics 69, 235-246.
COCKERHAM, C. C., AND WEIR, B. S. 1977. Digenic descent measures for finite populations, Genet. Res. 30, 121-147.
COCKERHAM, C. C., AND WEIR, B. S. 1983. Variance in actual inbreeding, Theor. Pop. Biol. 23, 85-109.
COCKERHAM, C. C., AND WEIR, B. S. 1986. Estimation of inbreeding parameters in a stratified population, Ann. Hum. Genet. 50, 271-281.
DEL CASTILLO, F., JIMENEZ, J., AND MEDINA, J. R. 1986. Some comments on the variance of heterozygosity in finite populations, J. Theor. Biol. 119, 103-106.
DODDS, K. G. 1986. “Resampling Methods in Genetics and the Effect of Family Structure in Genetic Data,” Ph.D. Thesis, Department of Statistics, North Carolina State University.
LESSARD, S. 1981. Is the between-population variance negligible in the total variance of heterozygosity? Case of a finite number of loci subject to the infinite-allele model in finite monoecious populations, Theor. Pop. Biol. 20, 394-410.
LI, W.-H., AND NEI, M. 1975. Drift variances of heterozygosity and genetic distance in transient states, Genet. Res. 25, 229-248.
NEI, M. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals, Genetics 89, 583-590.
REYNOLDS, J., WEIR, B. S., AND COCKERHAM, C. C. 1983. Estimation of the coancestry coefficient: Basis for a short-term genetic distance, Genetics 105, 767-779.
WEIR, B. S. 1989. Sampling properties of gene diversity, in “Plant Population Genetics, Breeding and Genetic Resources” (A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir, Eds.), Sinauer, Sunderland, MA, pp. 23-42.
WEIR, B. S., AVERY, P. J., AND HILL, W. G. 1980. Effect of mating structure on variation in inbreeding, Theor. Pop. Biol. 18, 396-429.
WEIR, B. S., AND COCKERHAM, C. C. 1969. Group inbreeding with two linked loci, Genetics 63, 711-742.
WEIR, B. S., AND COCKERHAM, C. C. 1973. Mixed self and random mating at two loci, Genet. Res. 21, 247-262.
WEIR, B. S., AND COCKERHAM, C. C. 1974. Behavior of pairs of loci in finite monoecious populations, Theor. Pop. Biol. 6, 323-354.
WEIR, B. S., AND HILL, W. G. 1980. Effect of mating structure on variation in linkage disequilibrium, Genetics 96, 477-488.
