THEORETICAL POPULATION BIOLOGY 8, 166-183 (1975)

The Covariance of Relatives Derived from a Random Mating Population*

I. M. R. VAN AARDE†

Statistical Laboratory, Iowa State University, Ames, Iowa 50010

Received December 1, 1974

Attention is drawn to errors common in the derivation of forms for the genotypic covariance of noninbred relatives from a Hardy-Weinberg population of diploids. A synthesis of Fisher's least-squares method of partitioning the genotypic variance and Malécot's probability method of expressing kinship yields a general form. For one locus, the form is $(P_{ss} + P_{sd} + P_{ds} + P_{dd})\,\tfrac{1}{2}\sigma_a^2 + (P_{ss}P_{dd} + P_{sd}P_{ds})\,\sigma_d^2$, where $\sigma_a^2$ is the additive genetic variance, $\sigma_d^2$ is the variance of dominance deviations, $P_{ij}$ is the probability that parental gamete i is identical by descent to parental gamete j, i = s, d indexes the parents of one relative, and j = s, d indexes those of the other. The form provides a framework for obtaining the covariance of relatives from an equilibrium population with linkage.

INTRODUCTION

Numerous workers have contributed toward the development of general forms expressing the covariance of noninbred relatives from a Hardy-Weinberg population in terms of measures of their kinship and the genotypic variability. Fisher (1918) demonstrated (for simple kinships) the possibility of expressing the covariances in terms of general measures of variability not based on any assumptions about gene effects or frequencies. On the other hand, Wright (1921) demonstrated (for simple gene effects) the possibility of expressing the covariances in terms of a general measure of kinship. He measured kinship and inbreeding (which is the kinship of parents) as correlations between additive scores, a device that (though useful for his immediate purposes) obscured the exact meaning of the measures. Malécot (1948) gave an exact meaning to Wright's measure of inbreeding and to an appropriately modified measure of kinship. He defined each measure as the probability that a pair of homologous genes are identical by descent. To measure the inbreeding of an individual, the pair is its genotype. To measure the kinship of two individuals, the pair is the genotype of their offspring should they be mated.

The two lines of development have gradually merged into a single theory. This, however, has not been accomplished by deliberate synthesis. Consequently, certain fallacies have become common belief. The fallacies arise from a form given by Malécot (1948, 1968) wherein the covariance of certain kinds of relatives is expressed as

$$(\phi + \phi')\,\tfrac{1}{2}\sigma_a^2 + (\phi\phi')\,\sigma_d^2, \tag{1}$$

$\sigma_a^2$ being the additive genetic variance, $\sigma_d^2$ the variance of dominance deviations, and $\phi$ and $\phi'$ constants determined by the particular relationship. Contrary to common belief, the form is not a general form for "collateral" relatives. Furthermore, Malécot did not identify $\phi$ and $\phi'$ as measures of kinship; he identified them as correlation coefficients.

The purpose of this paper is to exhibit the correct synthesis of Fisher's measures of variability (as modified and extended by Cockerham, 1952, 1954, and Kempthorne, 1954) and Wright's measure of kinship (as modified and explicated by Malécot). The synthesis provides a framework for obtaining the covariances with linkage. The population is assumed to be in linkage equilibrium. It is assumed that there are no position effects on genotypic values (these being arbitrary otherwise) and that gene frequencies are constant.

* Journal Paper No. J-8046 of the Agriculture and Home Economics Experiment Station, Ames, Iowa. Project No. 1669. The research reported here was supported by the National Institutes of Health, Grant No. 13827.
† Present address: Department of Biometry, University of Stellenbosch, Stellenbosch 7600, Republic of South Africa.
Copyright © 1975 by Academic Press, Inc. All rights of reproduction in any form reserved.

DERIVATION OF A SYNTHETIC FORM

Let random individuals be drawn from a Hardy-Weinberg population of ancestors, and let matings (and clones) be made so as to derive (without selection of any sort) a pair of noninbred relatives X and Y. Although only monoecious diploid species will be considered, it is convenient to identify pollen-parents called "sires" and seed-parents called "dams." The identification is purely nominal. No sex-linked transmission is implied. There may be any number of alleles, with any frequencies, at each of any number of loci. For the moment, however, it is sufficient to consider a single locus. Let $(x_s)(x_d)$ be the genotype of X, $(x_s)$ being the "sire" gene and $(x_d)$ the "dam" gene. Similarly, $(y_s)(y_d)$ denotes the genotype of Y. Then, by assumption, $(x_s)$ and $(x_d)$ are copies of different and independently random ancestral genes, and so are $(y_s)$ and $(y_d)$. Consequently (from well-known results) the genotypic covariance of X and Y may be expressed as a sum of two terms,

$$\operatorname{cov}[((x_s)) + ((x_d)),\ ((y_s)) + ((y_d))] + \operatorname{cov}[((x_s)(x_d)),\ ((y_s)(y_d))],$$

wherein a gene is bracketed to denote its additive effect and two genes are bracketed to denote their dominance deviation. The first term is

$$(P_{ss} + P_{sd} + P_{ds} + P_{dd})\,\tfrac{1}{2}\sigma_a^2,$$

where $P_{ss}$, $P_{sd}$, $P_{ds}$, and $P_{dd}$ denote the probabilities of the four events

$$[(x_s) = (y_s)],\quad [(x_s) = (y_d)],\quad [(x_d) = (y_s)],\quad\text{and}\quad [(x_d) = (y_d)],$$

respectively (identity being by descent). The second term is

$$P[(x_s)(x_d) = (y_s)(y_d)]\,\sigma_d^2,$$

where the probability will now be shown to depend on the four probabilities previously introduced. Consider Fig. 1, where each of the four lines connects two genes that may be identical by descent, either because one arose as a copy of the other, or because both arose as copies of a prior gene. Copies may arise through meiosis or mitosis (as in the case of a clone).

[Figure 1: diagram of the four genes $(x_s)$, $(x_d)$, $(y_s)$, and $(y_d)$ joined by four lines.]

FIG. 1. Each of the four lines connects two genes which may be copies of the same gene.

If we let any one line connect genes that are identical by descent, the perpendiculars must connect genes that differ by descent because X and Y are noninbred. Hence, the possible ways in which the event $[(x_s)(x_d) = (y_s)(y_d)]$ can take place are exhausted by two mutually exclusive cases,

$$[(x_s) = (y_s),\ (x_d) = (y_d)] \quad\text{and}\quad [(x_s) = (y_d),\ (x_d) = (y_s)].$$

We shall now prove that each of the two cases consists (or may formally be regarded to consist) of a pair of independent events. Hence, we shall prove that the genotypic covariance of X and Y takes the synthetic form

$$(P_{ss} + P_{sd} + P_{ds} + P_{dd})\,\tfrac{1}{2}\sigma_a^2 + (P_{ss}P_{dd} + P_{sd}P_{ds})\,\sigma_d^2. \tag{2}$$


The proof is in three parts as follows:

(i) Possibly the relatives are monozygotic (members of a clone). Then,

$$P_{ss} = 1,\quad P_{sd} = 0,\quad P_{ds} = 0,\quad\text{and}\quad P_{dd} = 1,$$

where we have chosen to let the indices "s" and "d" refer to the original zygote. Hence, (2) is formally applicable.

(ii) Possibly the relatives are not monozygotic, and one (X, say) is an ancestor of the other. Since Y is noninbred, $P_{sd}$ and $P_{dd}$ vanish if X is a "paternal" ancestor, and $P_{ss}$ and $P_{ds}$ vanish if X is a "maternal" ancestor. In both cases $P_{ss}P_{dd} + P_{sd}P_{ds}$ vanishes. Hence, (2) is again formally applicable.

(iii) The only remaining possibility is that the relatives are not monozygotic but collateral (meaning that neither is an ancestor of the other). This does not exclude the possibility that they may have monozygotic ancestors. Furthermore, X and one or more of its ancestors may be monozygotic, in which case we may interpret "X" as the original zygote. Similarly, "Y" may be interpreted as an original zygote. Hence, the two mutually exclusive and exhaustive cases,

$$[(x_s) = (y_s),\ (x_d) = (y_d)] \quad\text{and}\quad [(x_s) = (y_d),\ (x_d) = (y_s)],$$

are of the nature indicated in Fig. 2, where $X_s$ and $Y_s$ are the "sires," and $X_d$ and $Y_d$ are the "dams." The double-headed arrows represent chains of related individuals that lead from common ancestors to related parents (or simply connect parental symbols that represent one individual). The single-headed arrows represent the terminal gametic transmissions.

The nature of the event $[(x_s) = (y_s),\ (x_d) = (y_d)]$ is indicated by the left-hand diagram in Fig. 2. There are two pools of common ancestors, and no individual can be a member of both pools (because X and Y are noninbred). The copyings of genes in different pools are independent, and the copyings terminate in two independent pairs of gametic transmissions. Hence, the event $[(x_s) = (y_s),\ (x_d) = (y_d)]$ consists of two independent events,

$$[(x_s) = (y_s)] \quad\text{and}\quad [(x_d) = (y_d)].$$

Similarly, the event $[(x_s) = (y_d),\ (x_d) = (y_s)]$ consists of two independent events, as indicated by the right-hand diagram in Fig. 2. Hence, (2) is applicable.
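The single-locus form (2) can be checked numerically against familiar relationships. The following is a minimal sketch, not part of the original derivation: the function name `form2_coefficients` and the probability values assigned to each relationship are our own illustrative assumptions, taken from standard pedigree reasoning for noninbred relatives.

```python
from fractions import Fraction as F

def form2_coefficients(p_ss, p_sd, p_ds, p_dd):
    """Return the coefficients of sigma_a^2 and sigma_d^2 given by form (2).

    p_ij is the probability that gene i of X is identical by descent
    to gene j of Y (i, j = s, d).
    """
    additive = (p_ss + p_sd + p_ds + p_dd) * F(1, 2)
    dominance = p_ss * p_dd + p_sd * p_ds
    return additive, dominance

# Assumed identity probabilities (P_ss, P_sd, P_ds, P_dd) for some
# noninbred relationships; these values are illustrative assumptions.
examples = {
    "monozygotic twins": (1, 0, 0, 1),
    "full sibs": (F(1, 2), 0, 0, F(1, 2)),
    "half sibs (common sire)": (F(1, 2), 0, 0, 0),
    "parent X, offspring Y (X is the sire)": (F(1, 2), 0, F(1, 2), 0),
}

for name, probs in examples.items():
    a, d = form2_coefficients(*probs)
    print(f"{name}: {a} * var_a + {d} * var_d")
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the coefficients can be compared directly with textbook values such as 1/2 and 1/4 for full sibs.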

[Figure 2: two pedigree diagrams linking the parents $X_s$, $X_d$, $Y_s$, $Y_d$ and the genes $(x_s)$, $(x_d)$, $(y_s)$, $(y_d)$.]

FIG. 2. The two ways in which the genes of a noninbred individual can both be identical by descent to those of a noninbred relative. Single-headed arrows connect genes to parental sources. Double-headed arrows indicate common sources.

Henceforth, when we describe two individuals as "ancestral relatives" or "collateral relatives," we tacitly assume that they are not monozygotic. In other words, we imply that they arose from separate gametic unions. If X and Y are collateral relatives, $P_{ss}$, $P_{sd}$, $P_{ds}$, and $P_{dd}$ can be interpreted as coefficients of "kinship." (Some writers prefer the term "coancestry." Others prefer "parentage.") The covariance is then

$$(r_{ss} + r_{sd} + r_{ds} + r_{dd})\,\tfrac{1}{2}\sigma_a^2 + (r_{ss}r_{dd} + r_{sd}r_{ds})\,\sigma_d^2, \tag{3}$$

where $r_{ss}$, $r_{sd}$, $r_{ds}$, and $r_{dd}$ are the coefficients of kinship of

the "sire" of X and "sire" of Y,
the "sire" of X and "dam" of Y,
the "dam" of X and "sire" of Y, and
the "dam" of X and "dam" of Y,

respectively. If the relationship between X and Y is ancestral, however, form (3) is not applicable. The most convenient form for ancestral covariances is simply

$$(4r_{xy})\,\tfrac{1}{2}\sigma_a^2, \tag{4}$$

where $r_{xy}$ (the coefficient of kinship of X and Y) is given by $\tfrac{1}{4}(P_{ss} + P_{sd} + P_{ds} + P_{dd})$.

RELATION TO FORMS IN THE LITERATURE

Malécot (1948) obtained (4) as a form for the covariance of noninbred relatives with uncorrelated dominance deviations. This is correct because (4) is just the first term of (2). Consequently, either (3) or (4) may be used to obtain the covariance of certain kinds of collateral relatives (such as half-sibs or uncle and nephew). With regard to other kinds of collateral relatives, however, Malécot (1948) erroneously concluded that the additive effect of a gene of one relative cannot be correlated with the additive effects of both genes of the other relative. For example, he ruled out the possibility of $((x_s))$ being correlated with both $((y_s))$ and $((y_d))$, apparently because he believed that this implies Y is inbred. Consequently, he obtained a special case of (3), given by

$$(r_{ss} + r_{dd})\,\tfrac{1}{2}\sigma_a^2 + (r_{ss}r_{dd})\,\sigma_d^2. \tag{5}$$

Form (5) arises when "sires" may be related, "dams" may be related, but "sires" are unrelated to "dams." For example, the covariance of full sibs can be obtained from either (3) or (5), but the covariance of quadruple half-first cousins (see Fig. 3) cannot be obtained from (5). Clearly (5) is just a redundant special case of (3). Malécot (1948, 1968) expresses (5) as in (1), where $\phi = r_{ss}$ and $\phi' = r_{dd}$. The expression has been a source of confusion: the general form was obtained, somewhat less explicitly than in (2), by Kempthorne (1955a, b), but apparently Kempthorne (1955a) at first believed that he had merely confirmed Malécot's formulae. Malécot (1948, 1968) expresses (4) as

$$\phi\,\tfrac{1}{2}\sigma_a^2, \tag{6}$$

where it is not obvious that "$\phi$" may have different meanings in (1) and (6).
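The difference between the general collateral form (3) and the special case with unrelated "sires" and "dams" can be made concrete. The sketch below is illustrative only: the function names are ours, and the kinship values for quadruple half-first cousins assume that every "sire" or "dam" of X is a noninbred half-sib of every "sire" or "dam" of Y (kinship 1/8 for each of the four parental pairs), which is our reading of the pedigree rather than a value stated in the text.

```python
from fractions import Fraction as F

def form3(r_ss, r_sd, r_ds, r_dd):
    """Coefficients of sigma_a^2 and sigma_d^2 from the general collateral form (3)."""
    return (r_ss + r_sd + r_ds + r_dd) * F(1, 2), r_ss * r_dd + r_sd * r_ds

def special_case(r_ss, r_dd):
    """Same coefficients when "sires" are unrelated to "dams" (r_sd = r_ds = 0)."""
    return (r_ss + r_dd) * F(1, 2), r_ss * r_dd

# Full sibs: common sire and common dam, each noninbred (kinship with itself 1/2).
print(form3(F(1, 2), 0, 0, F(1, 2)), special_case(F(1, 2), F(1, 2)))

# Quadruple half-first cousins (assumed kinships: 1/8 for all four parental pairs).
eighth = F(1, 8)
print(form3(eighth, eighth, eighth, eighth), special_case(eighth, eighth))
```

For full sibs the two functions agree; for the assumed quadruple half-first cousin kinships the special case drops the cross terms $r_{sd}$ and $r_{ds}$ and therefore understates both coefficients.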

[Figure 3: pedigree diagram leading to $X_s$, $X_d$, $Y_s$, $Y_d$ and to X and Y.]

FIG. 3. The pedigree of quadruple half-first cousins, X and Y. Arrows connect parent to offspring.


Kempthorne (1957) later showed how (1) arises as a special case of his form corresponding to (2). Nevertheless, Gallais (1970) erroneously presents (1) as a general form for noninbred relatives. The same error is committed by LeRoy (1960) and Pirchner (1969), who misapply the form to obtain (by incorrect calculations) the correct results for the covariance of monozygotic twins, of parent and offspring, and of uncle and nephew. Schnell (1961b) and Turner and Young (1969) erroneously present (1) as a general form for noninbred collateral relatives.

Expressing the covariances in terms of coefficients of kinship is a logical development. Forms (2), (3), and (4) show that this is easily accomplished, and they are the only forms required. The forms were obtained by this writer in an unpublished manuscript (van Aarde, 1963) and are implicit in the presentation given by Crow and Kimura (1970).

The discussion so far has been limited to the case of one locus. Fisher (1918) indicated that the theory could be extended to account for contributions from epistacy, but suitable measures for this purpose were only developed much later by Cockerham (1952, 1954) and Kempthorne (1954). Cockerham introduced the concepts additive × additive, additive × dominance, dominance × dominance, and so on, to measure such contributions. He confirmed Fisher's results in the case of two alleles per locus for the covariance of certain relatives and obtained some new results. Kempthorne devised an ingenious formal algebra to deal with an arbitrary number of alleles per locus and obtained a generalization of (5). The methods yield generalizations of (2), (3), and (4). Detailed descriptions are given by Kempthorne (1955b, 1957). Kempthorne works in terms of $r_{xy}$ and a quantity "$u_{xy}$" which in our case is equal to $P_{ss}P_{dd} + P_{sd}P_{ds}$.

The generalized form of (2) for the covariance of any pair of noninbred relatives is

$$\sum_{u,v}{}'\,(P_{ss} + P_{sd} + P_{ds} + P_{dd})^u\,(P_{ss}P_{dd} + P_{sd}P_{ds})^v\,(\tfrac{1}{2})^u\,\sigma^2_{a^u d^v}, \tag{7}$$

where the prime indicates summation over all pairs of nonnegative integers (u, v) such that $1 \le u + v \le n$ (the number of loci) and where $\sigma^2_{a^u d^v}$ is the sum of the variances of mixed "additive × dominance" deviations that involve disjoint sets of u loci and v loci, respectively. The generalized form of (3) for the covariance of noninbred collateral relatives is

$$\sum_{u,v}{}'\,(r_{ss} + r_{sd} + r_{ds} + r_{dd})^u\,(r_{ss}r_{dd} + r_{sd}r_{ds})^v\,(\tfrac{1}{2})^u\,\sigma^2_{a^u d^v}, \tag{8}$$

and the generalized form of (4) for ancestral covariances is

$$\sum_{u=1}^{n}\,(4r_{xy})^u\,(\tfrac{1}{2})^u\,\sigma^2_{a^u}, \tag{9}$$

which contains no terms involving dominance deviations.
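With independent assortment, the coefficient of each epistatic component follows from the single-locus quantities alone. The sketch below assumes the usual Kempthorne coefficient $(2r_{xy})^u (u_{xy})^v$ for the variance component involving u additive and v dominance loci, with $2r_{xy} = \tfrac{1}{2}(P_{ss} + P_{sd} + P_{ds} + P_{dd})$ and $u_{xy} = P_{ss}P_{dd} + P_{sd}P_{ds}$; the function name and the example probabilities are our own illustrative choices.

```python
from fractions import Fraction as F

def epistatic_coefficient(u, v, p_ss, p_sd, p_ds, p_dd):
    """Coefficient of the 'additive^u x dominance^v' variance component,
    assuming independent assortment and the coefficient (2 r_xy)^u (u_xy)^v."""
    two_r = F(1, 2) * (p_ss + p_sd + p_ds + p_dd)
    u_xy = F(p_ss) * p_dd + F(p_sd) * p_ds
    return two_r ** u * u_xy ** v

full_sibs = (F(1, 2), 0, 0, F(1, 2))          # assumed identity probabilities
parent_offspring = (F(1, 2), 0, F(1, 2), 0)   # X is the sire of Y (assumed)

for u, v in [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]:
    print((u, v),
          epistatic_coefficient(u, v, *full_sibs),
          epistatic_coefficient(u, v, *parent_offspring))
```

For the ancestral example every coefficient with v > 0 is zero, in line with the statement that ancestral covariances contain no terms involving dominance deviations.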


The discussion so far has been limited to the case of independent assortment. In subsequent sections generalizations of (7) and (9) are derived for equilibrium populations with linkage. Schnell (1961b) derived such a generalization of the form

$$\sum_{u,v}{}'\,(r_{ss} + r_{dd})^u\,(r_{ss}r_{dd})^v\,(\tfrac{1}{2})^u\,\sigma^2_{a^u d^v}, \tag{10}$$

which arises (with independent assortment) from Malécot's form corresponding to (5). Form (10) is of course redundant, being superseded by (8). By a curious coincidence, however, Schnell's corresponding form for linked loci is well worth retaining because generalization of (8) is by no means as straightforward as generalization of (10).

DERIVATION OF FORMS FOR LINKED LOCI

In subsequent sections forms for equilibrium populations with linkage are derived by means of a method based on an algebraic representation of events. We shall now explain the method, for which purpose it is sufficient to consider the simple case of a pair of noninbred collateral relatives that are related through only two parents. We may identify the related parents as the "sires." Then we wish to find the covariance of relatives that satisfy the following condition:

Condition (I'). "Sires" may be related, but parents are unrelated otherwise.

With no linkage, the covariance is given by

$$\sum_{u=1}^{n}\,(P_{ss})^u\,(\tfrac{1}{2})^u\,\sigma^2_{a^u}.$$

Recall that $\sigma^2_{a^u}$ is the sum of the variances of pure "additive × additive × additive × ⋯" deviations that involve u loci. For example,

$$\sigma^2_{a^3} = \sum_{h<i<j}\sigma^2_{a_h a_i a_j},$$

where $\sigma^2_{a_h a_i a_j}$ is the variance of the "additive × additive × additive" deviation that involves loci h, i, and j. The coefficient of $(\tfrac{1}{2})^3\sigma^2_{a_h a_i a_j}$ is given by the probability $P_{ss} \times P_{ss} \times P_{ss} = (P_{ss})^3$ because of independent assortment. To account for linkage, we must replace the probability with

$$P[(x_s^h) = (y_s^h),\ (x_s^i) = (y_s^i),\ (x_s^j) = (y_s^j)],$$


where the superscripts index the loci. The compound event

$$[(x_s^h) = (y_s^h),\ (x_s^i) = (y_s^i),\ (x_s^j) = (y_s^j)]$$

may not consist of three independent events and (consequently) the product rule for the probability of independent events is no longer applicable. We may, however, represent the event by the formal product

$$\prod_{i\in A} E^i_{ss}, \quad\text{where } A = \{h, i, j\}.$$

$E^i_{ss}$ represents a simple event. Its superscript indicates the locus at which the event may take place; its subscripts indicate the nature of the event. We now follow Schnell (1961a, b), who defined the generalized coefficient of kinship of two individuals with respect to a set of loci, A, as the probability that a random gamete from the one individual is identical by descent to a random gamete from the other, at every locus in A. Hence, the probability of the compound event

$$\prod_{i\in A} E^i_{ss}$$

is the generalized coefficient of kinship of the sires, $r_A^{ss}$ (say). Hence, the covariance of relatives that satisfy Condition (I') is given by

$$\sum_{A\subset N} r_A^{ss}\,(\tfrac{1}{2})^a\,\sigma_A^2, \tag{11}$$

where the summation is over all subsets of the set of all loci, N, the number of loci in A is a, and we make notations of the type

$$\sigma_A^2 = \sigma^2_{a_h a_i a_j a_k} \quad\text{for } A = \{h, i, j, k\}.$$

In more general cases, we will consider form (7) and make notations of the type

$$\sigma^2_{AD} = \sigma^2_{a_h a_i d_j d_k d_m}, \quad\text{where } A = \{h, i\} \text{ and } D = \{j, k, m\}.$$

To account for linkage, the coefficient of $(\tfrac{1}{2})^a\sigma^2_{AD}$ must be replaced by the probability of a compound event that may be represented by a formal product of the type

$$\prod_{i\in A}(E^i_{ss} + E^i_{sd} + E^i_{ds} + E^i_{dd})\prod_{j\in D}(E^j_{ss}E^j_{dd} + E^j_{sd}E^j_{ds}). \tag{12}$$

When the product is formally expanded, we obtain a sum of terms that correspond to various mutually exclusive cases. Impossible events may be replaced by zeros. For example, when Condition (I') is satisfied, (12) becomes

$$\prod_{i\in A}(E^i_{ss} + 0 + 0 + 0)\prod_{j\in D}(0 + 0),$$

which represents an impossible event unless $D = \emptyset$ (the empty set). Hence, we consider the coefficient of $(\tfrac{1}{2})^a\sigma_A^2$, which is given by the probability of

$$\prod_{i\in A} E^i_{ss}.$$

Hence, we again obtain (11).

EVALUATION OF PROBABILITIES

Consider a pair of half-sibs from a noninbred common "sire." Condition (I') is satisfied. Hence, from (11), their covariance is

$$\operatorname{Cov}(HS) = \sum_{A\subset N}\Lambda_A^{(2)}\,(\tfrac{1}{2})^a\,\sigma_A^2,$$

where $\Lambda_A^{(2)}$ denotes the generalized coefficient of kinship of a noninbred individual with itself. Similarly, let a half-uncle and his "paternal" half-nephew arise from a noninbred common ancestor. Then their covariance is

$$\operatorname{Cov}(HU, HN) = \sum_{A\subset N}\Lambda_A^{(1)}\Lambda_A^{(2)}\,(\tfrac{1}{2})^a\,\sigma_A^2,$$

where $\Lambda_A^{(1)}$ denotes the probability that an individual transmits intact to its offspring the entire set of genes that it received from one of its parents with respect to A. The value of $r_A^{ss}$ in this case, $\Lambda_A^{(1)}\Lambda_A^{(2)}$, arises from the product rule for the probability of independent events.

To evaluate probabilities like $\Lambda_A^{(1)}$ and $\Lambda_A^{(2)}$, Schnell (1961a) introduced linkage parameters of the type $\lambda_{ij} = 1 - 2\rho_{ij}$, where $\rho_{ij}$ is the recombination rate between loci i and j. The parameters range from zero (with independent assortment) to unity (with complete linkage). Complementary meiotic products are assumed to occur with equal frequency. Then (from reasoning that will be given subsequently) one obtains

$$\Lambda_{12}^{(k)} = (\tfrac{1}{2})^2\,(1 + \lambda_{12}^k)$$

and

$$\Lambda_{123}^{(k)} = (\tfrac{1}{2})^3\,(1 + \lambda_{12}^k + \lambda_{13}^k + \lambda_{23}^k), \tag{13}$$

for k = 1, 2. From extensions of the reasoning one deduces that

$$\operatorname{Cov}(HS) = (\tfrac{1}{2})^2\sum_i \sigma^2_{a_i} + (\tfrac{1}{2})^4\sum_{i<j}(1 + \lambda_{ij}^2)\,\sigma^2_{a_i a_j} + \cdots,$$

and

$$\operatorname{Cov}(HU, HN) = (\tfrac{1}{2})^3\sum_i \sigma^2_{a_i} + (\tfrac{1}{2})^6\sum_{i<j}(1 + \lambda_{ij}^2)(1 + \lambda_{ij})\,\sigma^2_{a_i a_j} + \cdots.$$
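The two-locus case of (13) can be verified by direct enumeration of gamete types. The sketch below is an illustration under our own assumptions: the four gamete frequencies for recombination rate `rho` are the standard ones with complementary products equally frequent, and the function names are hypothetical.

```python
from fractions import Fraction as F

def gamete_freqs(rho):
    """Frequencies of the four two-locus gamete types of one individual.
    Each key records, per locus, whether the gene came from the individual's
    sire ('s') or dam ('d'); rho is the recombination rate."""
    return {
        ("s", "s"): (1 - rho) / 2,
        ("d", "d"): (1 - rho) / 2,
        ("s", "d"): rho / 2,
        ("d", "s"): rho / 2,
    }

def transmit_intact(rho):
    """Probability that a gamete carries the sire-derived gene at both loci."""
    return gamete_freqs(rho)[("s", "s")]

def self_kinship(rho):
    """Probability that two independent gametes of one noninbred individual
    are identical by descent at both loci."""
    return sum(f * f for f in gamete_freqs(rho).values())

def closed_form(rho, k):
    """(1/2)^2 (1 + lambda^k) with lambda = 1 - 2 rho."""
    lam = 1 - 2 * rho
    return F(1, 4) * (1 + lam ** k)

rho = F(1, 10)  # illustrative recombination rate
print(transmit_intact(rho), closed_form(rho, 1))
print(self_kinship(rho), closed_form(rho, 2))
```

Multiplying `self_kinship(rho)` by $(\tfrac{1}{2})^2$ gives the half-sib coefficient of $\sigma^2_{a_i a_j}$, which reduces to 1/16 with independent assortment (`rho = 1/2`).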

To obtain the extension, Schnell defined general linkage parameters such that forms like (13) arise for any number of loci. We shall now describe a method of obtaining the extension. Subsequently we show how the method can be used to evaluate the various probabilities required to measure kinships or covariances. The method seems to be simpler than corresponding methods used by Schnell (1961a, b), and that is the reason for describing it.

Let $(A_0, A_1)$ be any ordered partition, including the partitions $(\emptyset, A)$ and $(A, \emptyset)$, of the set of loci A, where $A \subset N$. Let the frequency with which an individual, X, produces a gamete of type

$$\prod_{i\in A_0}(x_s^i)\prod_{j\in A_1}(x_d^j) \tag{14}$$

be represented by the formal product

$$\prod_{i\in A_0}\gamma_s^i\prod_{j\in A_1}\gamma_d^j. \tag{15}$$

The frequencies for a subset of the given set, A, may be obtained by operating directly on the representation with the summation rule

$$\gamma_s^k + \gamma_d^k = 1 \text{ (unity)} \quad\text{for any } k,$$

provided we take care to keep distinct the representations of different frequencies. For every set A we now define a value, $\lambda_A$, given by

$$\prod_{i\in A}(\gamma_s^i - \gamma_d^i),$$

where the expression must be expanded completely, and frequencies inserted in place of representations. This reduces to the known $\lambda_{ij}$'s if A = {i, j}. Since complementary meiotic products are assumed to occur with equal frequency, we may interchange the indices "s" and "d" provided this is done for all loci. Hence, $\lambda_A$ is also given by the expansion of

$$\prod_{i\in A}(\gamma_d^i - \gamma_s^i).$$

It follows that $\lambda_A$ is identically zero when the number of loci in A is odd; a parameter arises from every pair of loci, every quadruplet of loci, every sextuplet of loci, and so on. In order to express an arbitrary gametic frequency in terms of Schnell parameters, we note that our rule $(\gamma_s^k + \gamma_d^k) = 1$ implies that representation (15) is algebraically identical to

$$\prod_{i\in A_0}(\tfrac{1}{2})[1 + (\gamma_s^i - \gamma_d^i)]\prod_{j\in A_1}(\tfrac{1}{2})[1 - (\gamma_s^j - \gamma_d^j)].$$

Hence, (15) may be replaced by the representation

$$\prod_{i\in A_0}(\tfrac{1}{2})(1 + L_i)\prod_{j\in A_1}(\tfrac{1}{2})(1 - L_j), \tag{16}$$

where (16) must be expanded completely, and every product of the type $L_iL_jL_k\cdots$ must be replaced by the corresponding value, $\lambda_{ijk\cdots}$ (which is zero if the number of subscripts is odd). In order to express $\Lambda_A^{(1)}$ in terms of Schnell parameters, we note from (15) and (16) that we need simply consider

$$\prod_{i\in A}(\tfrac{1}{2})(1 + L_i), \tag{17}$$

make the formal expansion, and identify the products that are $\lambda$'s and the products that vanish. In order to express $\Lambda_A^{(2)}$ in terms of Schnell parameters, we note that, since complementary meiotic products occur with equal frequency, (16) must be identical to

$$\prod_{i\in A_0}(\tfrac{1}{2})(1 - L_i)\prod_{j\in A_1}(\tfrac{1}{2})(1 + L_j).$$

Hence, the probability that X produces two gametes like (14) is given by

$$\prod_{i\in A_0}(\tfrac{1}{2})^2(1 + J_i)(1 - K_i)\prod_{j\in A_1}(\tfrac{1}{2})^2(1 - J_j)(1 + K_j), \tag{18}$$

where we have taken L = J, K to keep distinct the representations of two different frequencies. To obtain $\Lambda_A^{(2)}$, we must sum (18) over all ordered partitions $(A_0, A_1)$ of A. Then we obtain

$$\prod_{i\in A}(\tfrac{1}{2})^2[(1 + J_i)(1 - K_i) + (1 - J_i)(1 + K_i)] = \prod_{i\in A}(\tfrac{1}{2})(1 - J_iK_i).$$

We may replace $(1 - J_iK_i)$ with $(1 + J_iK_i)$ because any product containing an odd number of J's or an odd number of K's must vanish. Hence, $\Lambda_A^{(2)}$ is given by

$$\prod_{i\in A}(\tfrac{1}{2})(1 + J_iK_i), \tag{19}$$


where we must make the formal expansion, and identify the products that are squares of $\lambda$'s and the products that vanish. From (17) and (19) we find, for example, that

for k = 1, 2. Two more examples of the use of representation (16) will be found in a subsequent section. It can be seen directly from the definition of $\lambda_A$ that $0 \le \lambda_A \le 1$ for all A (except for biologically aberrant possibilities). The parameters with more than two subscripts, however, are difficult to interpret. (See Hayman, 1962.)
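The formal expansions (17) and (19) can be checked against brute-force gamete enumeration for more than two loci. The following sketch rests on assumptions that are ours, not the paper's: loci are arranged in a line, recombination in adjacent intervals is independent (no interference), and the rates in `rho` are arbitrary illustrative values. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations, product

rho = [F(1, 10), F(1, 5), F(3, 10)]  # assumed rates between adjacent loci 0-1, 1-2, 2-3

def gamete_freq(pattern, rho):
    """Frequency of a gamete whose gene at each locus is sire-derived (+1)
    or dam-derived (-1), assuming no interference."""
    f = F(1, 2)
    for i in range(len(pattern) - 1):
        f *= rho[i] if pattern[i] != pattern[i + 1] else 1 - rho[i]
    return f

def marginal(A, rho):
    """Frequencies of origin patterns restricted to the loci in A."""
    out = {}
    for pat in product((1, -1), repeat=len(rho) + 1):
        key = tuple(pat[i] for i in A)
        out[key] = out.get(key, 0) + gamete_freq(pat, rho)
    return out

def schnell(A, rho):
    """lambda_A: expansion of prod over A of (gamma_s - gamma_d)."""
    total = F(0)
    for key, f in marginal(A, rho).items():
        sign = 1
        for s in key:
            sign *= s
        total += sign * f
    return total

def subsets(A):
    for r in range(len(A) + 1):
        yield from combinations(A, r)

def Lambda1(A, rho):
    """Expansion (17): odd-sized subsets contribute zero automatically."""
    return F(1, 2) ** len(A) * sum(schnell(B, rho) for B in subsets(A))

def Lambda2(A, rho):
    """Expansion (19): products J...K... become squares of lambdas."""
    return F(1, 2) ** len(A) * sum(schnell(B, rho) ** 2 for B in subsets(A))

def brute_Lambda1(A, rho):
    return marginal(A, rho)[(1,) * len(A)]

def brute_Lambda2(A, rho):
    return sum(f * f for f in marginal(A, rho).values())

A = (0, 1, 2, 3)
print(schnell((0, 1), rho), schnell((0, 1, 2), rho), schnell(A, rho))
print(Lambda1(A, rho) == brute_Lambda1(A, rho), Lambda2(A, rho) == brute_Lambda2(A, rho))
```

Under the no-interference assumption the parameter for a quadruplet of ordered loci factors into the pairwise parameters of the two outer intervals, which makes the higher-order parameters easy to tabulate in this special case.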

THE MALÉCOT-SCHNELL FORM

With independent assortment, (10) is a general form for the covariance of relatives that satisfy the following condition (which supersedes (I')):

Condition (I). "Sires" may be related, and "dams" may be related, but parents are unrelated otherwise.

By expressing (10) in a form corresponding to (7), the coefficient of $(\tfrac{1}{2})^a\sigma^2_{AD}$ is recognized as the probability of the compound event represented by

$$\prod_{i\in A}(E^i_{ss} + 0 + 0 + E^i_{dd})\prod_{j\in D}(E^j_{ss}E^j_{dd} + 0),$$

where impossible events are represented by zeros. When the representation is formally expanded, we obtain a sum of terms that correspond to various mutually exclusive cases as follows:

$$\sum_{A_0,A_1\subset A}{}''\Big(\prod_{i\in A_0+D}E^i_{ss}\Big)\Big(\prod_{j\in A_1+D}E^j_{dd}\Big),$$

where the double prime indicates summation over all ordered partitions $(A_0, A_1)$ of A, including $(A, \emptyset)$ and $(\emptyset, A)$. Condition (I) implies that any event referring only to "sire" genes is independent of any event referring only to "dam" genes. Hence, the compound events represented by

$$\prod_{i\in A_0+D}E^i_{ss} \quad\text{and}\quad \prod_{j\in A_1+D}E^j_{dd}$$

are independent. Hence, the coefficient of $(\tfrac{1}{2})^a\sigma^2_{AD}$ is

$$\sum_{A_0,A_1\subset A}{}''\,r^{ss}_{A_0+D}\,r^{dd}_{A_1+D},$$

where the $r^{ss}$ symbol represents the generalized coefficient of kinship of the sires with respect to $A_0 + D$, and the $r^{dd}$ symbol has a similar meaning for dams. In the manipulations, formal multiplication over the empty set must be interpreted as unity. Hence, it is convenient to define the coefficient of kinship of any two individuals with respect to $\emptyset$ as unity. In this way Schnell (1961b) generalized Malécot's form for the covariance of relatives which satisfy (I). His result takes the simple form

$$\sum_{A,D\subset N}{}'\ \sum_{A_0,A_1\subset A}{}''\,r^{ss}_{A_0+D}\,r^{dd}_{A_1+D}\,(\tfrac{1}{2})^a\,\sigma^2_{AD}, \tag{20}$$

where the single prime indicates summation over disjoint subsets. (The term in $\sigma^2_{\emptyset\emptyset}$ vanishes.) Consider, for example, a pair of full-sibs from noninbred parents. Condition (I) is satisfied. Hence, their covariance is

$$\operatorname{Cov}(FS) = \sum_{A,D\subset N}{}'\ \sum_{A_0,A_1\subset A}{}''\,\Lambda^{(2)}_{A_0+D}\,\Lambda^{(2)}_{A_1+D}\,(\tfrac{1}{2})^a\,\sigma^2_{AD}.$$
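The full-sib covariance under linkage can be worked out explicitly for two loci. The sketch below is a reading of the summation over ordered partitions described above and should be treated as an assumption rather than the author's calculation: it takes the generalized kinship of a noninbred parent with itself to be 1 for the empty set, 1/2 for one locus, and $(\tfrac{1}{2})^2(1 + \lambda^2)$ for two loci, and sums the product of the sire and dam kinships over all ordered partitions of A. Function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

def self_kinship(loci, lam):
    """Assumed generalized kinship of a noninbred individual with itself
    for a set of 0, 1, or 2 loci; lam is the two-locus linkage parameter."""
    k = len(loci)
    if k == 0:
        return F(1)
    if k == 1:
        return F(1, 2)
    if k == 2:
        return F(1, 4) * (1 + lam ** 2)
    raise ValueError("this sketch handles at most two loci")

def full_sib_coefficient(A, D, lam):
    """Coefficient of the variance component with additive loci A and
    dominance loci D: (1/2)^a times the sum over ordered partitions of A."""
    A, D = tuple(A), tuple(D)
    total = F(0)
    for r in range(len(A) + 1):
        for A0 in combinations(A, r):
            A1 = tuple(i for i in A if i not in A0)
            total += self_kinship(A0 + D, lam) * self_kinship(A1 + D, lam)
    return F(1, 2) ** len(A) * total

lam = F(1, 2)  # illustrative value, i.e. recombination rate 1/4
for A, D in [((1,), ()), ((), (1,)), ((1, 2), ()), ((1,), (2,)), ((), (1, 2))]:
    print(A, D, full_sib_coefficient(A, D, lam))
```

With `lam = 0` the coefficients reduce to the independent-assortment values 1/2, 1/4, 1/4, 1/8, and 1/16, which is a useful sanity check on the partition sum.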

With independent assortment, we have found (10) redundant. With linkage, however, the corresponding form (20) represents a considerable simplification. Schnell (1961b) incorrectly regarded (I) as a necessary and sufficient condition for two relatives to be noninbred and collateral. For this reason, he made (10) the starting point of his development. In the absence of analytical expressions like (7) and (12), however, it is not clear how to proceed systematically. In the event, Schnell (1961b) confined the rest of his development to a derivation of the covariance of an ancestor and its kth generation offspring from random mating. He obtained the form

$$\operatorname{Cov}(A, O_k) = \sum_{A\subset N}\big(\Lambda_A^{(1)}\big)^{k-1}\,(\tfrac{1}{2})^a\,\sigma_A^2.$$

The covariance of parent and first generation offspring is independent of linkage.

A FORM FOR ANCESTRAL AND SOME COLLATERAL RELATIVES

We shall now derive a form corresponding to (9), for equilibrium populations with linkage. The precise condition under which (9) yields the covariance of noninbred relatives is the following:

Condition (II). One parent (say the "dam" of Y) is unrelated to the other parents.

By expressing (9) in a form corresponding to (7) we see that D is empty, and the coefficient of $(\tfrac{1}{2})^a\sigma_A^2$ is the probability of the event

$$\prod_{i\in A}(E^i_{ss} + 0 + E^i_{ds} + 0).$$


Hence, the covariance of noninbred relatives that satisfy (II) takes the form (22) where the coefficient of $(\tfrac{1}{2})^a\sigma_A^2$ is the sum of the probabilities of various mutually exclusive cases. Consider, for example, an uncle (from noninbred parents) and his "paternal" nephew. Condition (I) is not satisfied, but (II) is satisfied. Hence, from (22) the coefficient of $(\tfrac{1}{2})^a\sigma_A^2$ is given by a sum of probabilities like

(23) where the first factor is the probability that the uncle's sire transmits identical genes to the uncle and the nephew's sire for all loci in $A_0$, the second factor is the probability that the uncle's dam transmits identical genes to the uncle and the nephew's sire for all loci in $A_1$, and the third factor is a representation (given by (16)) of the probability that the $A_0$ contribution of the uncle's sire is recombined with the $A_1$ contribution of the uncle's dam and transmitted to the nephew. Hence, the covariance of uncle and nephew is

+ (Q6 1 [ 1 + &(l i
