Ann. Hum. Genet., Lond. (1975), 38, 355. Printed in Great Britain


Bias in estimating the frequency of incest BY VICTOR SISKIND

Statistical Consultant, Boston University Medical Center, 400 Totten Pond Road, Waltham, Mass. 02154

In a recent paper, MacLean & Adams (1973) have shown how, given a large random sample of mother-child pairs with known phenotypes with respect to one or more genetic markers, some effects of undisclosed sibling incest might be studied. An early step in their method is the estimation by maximum likelihood of the proportion of such incestuous matings in the population from which the sample is drawn, a parameter they designate 'v'. To this end one needs two mother-child transition matrices, viz. the matrices of conditional probabilities of the child's phenotype given the mother's and, firstly, that the father is drawn at random from the population at large (the 'T-matrix') and, secondly, that the father is the mother's brother (the 'X-matrix'). One characteristic of the X-matrix is that the entries in the homozygote-homozygote cells therein are larger than in the corresponding cells in the T-matrix.

The estimation procedure may be affected by, amongst others, two factors not discussed by the authors and one which they do mention in passing. In the first place, blood-type incompatibilities, in particular those in the Rh and ABO systems, will increase the frequency of the recessive-recessive combination and may in this way bias the estimate of v. Secondly, it may happen, especially in a country of immigration like the U.S.A., that an apparently homogeneous population is in fact stratified into several ethnic groups, each with its own set of gene frequencies, whose members tend to mate amongst themselves. The resulting mother-child frequency matrix will then be the weighted mean of the component matrices, and the Hölder inequality ensures that the entries in the homozygote-homozygote cells would again be greater than their analogues in the postulated single-population frequency matrix.
MacLean & Adams claim that the presence of milder forms of inbreeding in the population under study will not 'introduce bias into the results'. This remark presumably applies to comparisons between the incest and non-incest portions of the sample, rather than to the estimation of v: as might be expected, the effects of ethnic stratification and of inbreeding on the mother-child matrix are broadly similar. The rather crude approximative approach developed below is designed to give some quantitative insight into the practical importance of these factors. Numerical computations based on a more exact formula provide confirmation of the conclusions so reached.

APPROXIMATING THE ESTIMATE OF INCEST FREQUENCY

Only one genetic marker will be assumed to be present in the analysis that follows; the extension to the more general case will be obvious. Let $D = \{d_{ij}\} = \{x_{ij} - t_{ij}\} = X - T$, where, as mentioned in the previous section, $x_{ij}$ and $t_{ij}$ are the conditional probabilities, in the presence and absence of sibling incest, respectively, that the child is of phenotype $j$, given that the mother is of phenotype $i$; we write $n_{ij}$ for the number of mother-child pairs with this combination of phenotypes, with $n = \sum_{ij} n_{ij}$. As shown by MacLean & Adams, the maximum likelihood estimate of the proportion of sibling incest, $\hat{v}$, is the root in $v$ of

$$\sum_i \sum_j n_{ij}\, d_{ij} / (t_{ij} + v\, d_{ij}) = 0. \tag{1}$$

Now if $v$ is small, and none of the quantities $|d_{ij}|/t_{ij}$ are much greater than unity, (1) may be written

$$\sum_i \sum_j n_{ij} \{ d_{ij}/t_{ij} - v\,(d_{ij}/t_{ij})^2 \} + O(v^2) = 0,$$

whence, to the first order of approximation,

$$\hat{v} \approx \frac{\sum_i \sum_j n_{ij}\, d_{ij}/t_{ij}}{\sum_i \sum_j n_{ij}\, (d_{ij}/t_{ij})^2}. \tag{2}$$

We note in passing that the numerator in (2) may well be negative, as may the root of (1); in such an event, one would of course take $\hat{v}$ to be zero. If the model is correct and $E(n_{ij}) = n f_i (t_{ij} + v\, d_{ij})$, where $f_i$ is the frequency of mother's phenotype $i$, then since $\sum_j d_{ij} = 0$,

$$E(\hat{v}) \approx \frac{\sum_i \sum_j E(n_{ij})\, d_{ij}/t_{ij}}{\sum_i \sum_j E(n_{ij})\, (d_{ij}/t_{ij})^2} = \frac{v \sum_i f_i \sum_j d_{ij}^2/t_{ij}}{\sum_i f_i \sum_j \{ d_{ij}^2/t_{ij} + v\, d_{ij}^3/t_{ij}^2 \}} = v \tag{3}$$

to this order of approximation.
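To make the relation between the exact root of (1) and the first-order estimate (2) concrete, the following Python sketch evaluates both on expected counts for a single codominant locus. It is only an illustration under stated assumptions: all function names are hypothetical, and the sibling-mating matrix is built as X = (T + S)/2, with S the selfing matrix, which is a shortcut assumed here rather than the construction laid out by MacLean & Adams.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def codominant_matrices(p: float) -> Tuple[Matrix, Matrix, List[float]]:
    """Return T, D = X - T and mother frequencies f for phenotypes AA, Aa, aa."""
    q = 1.0 - p
    T = [[p, q, 0.0], [p / 2, 0.5, q / 2], [0.0, p, q]]
    # Assumption (not from the paper): X = (T + S) / 2, S being the selfing matrix.
    S = [[1.0, 0.0, 0.0], [0.25, 0.5, 0.25], [0.0, 0.0, 1.0]]
    D = [[(S[i][j] - T[i][j]) / 2 for j in range(3)] for i in range(3)]
    f = [p * p, 2 * p * q, q * q]
    return T, D, f


def expected_counts(n: float, v: float, T: Matrix, D: Matrix, f: List[float]) -> Matrix:
    """E(n_ij) = n f_i (t_ij + v d_ij) under the incest model."""
    return [[n * f[i] * (T[i][j] + v * D[i][j]) for j in range(3)] for i in range(3)]


def score(v: float, counts: Matrix, T: Matrix, D: Matrix) -> float:
    """Left-hand side of equation (1); cells with d_ij = 0 contribute nothing."""
    return sum(
        counts[i][j] * D[i][j] / (T[i][j] + v * D[i][j])
        for i in range(3)
        for j in range(3)
        if D[i][j] != 0.0
    )


def exact_estimate(counts: Matrix, T: Matrix, D: Matrix, tol: float = 1e-12) -> float:
    """Root of (1) in [0, 1] by bisection; the score is decreasing in v."""
    if score(0.0, counts, T, D) <= 0.0:
        return 0.0  # negative root: take the estimate to be zero
    if score(1.0, counts, T, D) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, counts, T, D) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def first_order_estimate(counts: Matrix, T: Matrix, D: Matrix) -> float:
    """Equation (2), truncated at zero when the numerator is negative."""
    num = den = 0.0
    for i in range(3):
        for j in range(3):
            if D[i][j] != 0.0:
                ratio = D[i][j] / T[i][j]
                num += counts[i][j] * ratio
                den += counts[i][j] * ratio ** 2
    return max(0.0, num / den) if den > 0.0 else 0.0


if __name__ == "__main__":
    T, D, f = codominant_matrices(0.6)
    counts = expected_counts(10_000, 0.05, T, D, f)
    print(exact_estimate(counts, T, D), first_order_estimate(counts, T, D))
```

With counts set to their expectations, the bisection recovers the value of v used to generate them, while (2) differs from it only slightly, which is the sense in which the two agree "to this order of approximation".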

This approach leads in later sections to fairly simple but rather approximate formulae, whose adequacy can be checked by a more sophisticated result. For it can easily be shown that there is at most one real number, $\theta$, in the interval [0, 1] satisfying

$$\sum_i \sum_j E(n_{ij})\, d_{ij} / (t_{ij} + \theta\, d_{ij}) = 0.$$

A standard Taylor-expansion analysis and straightforward algebra prove that if such a $\theta$ does exist,

$$E(\hat{v}) = \theta + O(n^{-2}). \tag{4}$$

INCOMPATIBILITIES

The proponents of the above method of estimation point out that sibling incest is most profitably studied among young primiparae. In such a population Rh incompatibility is of little consequence and in any case its manifestations, if they occur at all, are usually overt and can be allowed for in the analysis. On the other hand, there is evidence that among mothers with type O blood, a deficit (due to foetal loss, stillbirth, etc.) of non-O children occurs, although the exact extent of the shortfall is in dispute (Levene & Rosenfield, 1961). The question is of considerable complexity and other incompatibilities may also be involved (see, for example, Takano & Miller, 1972 and references therein). Fortunately, since orders of magnitude are all that are required, it should be enough for present purposes to consider the consequences of a very simple model involving only the offspring of type O mothers.

The T-matrix for the ABO system is in effect given by Li (1955, p. 50); the X-matrix may be derived following the procedure laid out by MacLean and Adams, suitably modified. For reference, twice the difference between them is given (rows: mother's phenotype; columns: child's phenotype):

$$2D = \begin{array}{c|cccc}
 & A & B & AB & O \\ \hline
A & q + \dfrac{r(r - \frac{1}{2})}{p + 2r} & -\dfrac{qr}{p + 2r} & -\dfrac{q(1 - q)}{p + 2r} & -\dfrac{r(r - \frac{1}{2})}{p + 2r} \\
B & -\dfrac{pr}{q + 2r} & p + \dfrac{r(r - \frac{1}{2})}{q + 2r} & -\dfrac{p(1 - p)}{q + 2r} & -\dfrac{r(r - \frac{1}{2})}{q + 2r} \\
AB & -(1 - 2q)/4 & -(1 - 2p)/4 & r/2 & 0 \\
O & -p & -q & 0 & 1 - r
\end{array}$$

Here $p$, $q$, $r$ are the frequencies of the A, B, O alleles respectively. Assume now that no incest is present; in the absence of incompatibility type O mothers will produce children of types A, B and O in the ratio $p : q : r$. The effect of incompatibility is here represented by altering the ratio to $(p - \epsilon) : (q - \epsilon) : (r + 2\epsilon)$. In other words, if $n_{41}$ ($n_{42}$, $n_{44}$ respectively) is the number of A (B, O) children born to type O mothers in the sample, then $E(n_{41}/n) = r^2 (p - \epsilon)$, etc. Substituting in (3) one obtains the bias at $v = 0$:

$$E(\hat{v}) \approx \epsilon r \Big/ \Big\{ \sum_i f_i \sum_j d_{ij}^2 / t_{ij} + (1 - 2r)\epsilon/2 \Big\}.$$

For example, take p = 0.24, q = 0.08, r = 0.68 (values close to those observed in some North American white communities); if the deficit of non-O children born to O mothers is 5 per cent, i.e. $\epsilon$ = 0.025(p + q) = 0.008, the bias is about 0.061; if it is 10 per cent the bias rises to 0.124. The inclusion of additional markers in the estimation formula would increase the denominator but not the numerator, so that the bias would become smaller.
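The numerical example can be checked with a short Python sketch of the ratio in (3) for the ABO system. This is an illustrative calculation, not the author's program: the function names are hypothetical, the 2D entries are those tabulated above, and the T-matrix entries are derived here from Hardy-Weinberg proportions under random mating (an assumption standing in for the table in Li, 1955).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def abo_matrices(p: float, q: float, r: float) -> Tuple[Matrix, Matrix, List[float]]:
    """T, D and mother frequencies f; rows and columns ordered A, B, AB, O."""
    a, b = p + 2 * r, q + 2 * r
    # Assumption: T derived from Hardy-Weinberg proportions with a random father.
    T = [
        [((p + r) ** 2 + p * r) / a, q * r / a, q * (p + r) / a, r * r / a],
        [p * r / b, ((q + r) ** 2 + q * r) / b, p * (q + r) / b, r * r / b],
        [(p + r) / 2, (q + r) / 2, (p + q) / 2, 0.0],
        [p, q, 0.0, r],
    ]
    h = r * (r - 0.5)
    two_d = [
        [q + h / a, -q * r / a, -q * (1 - q) / a, -h / a],
        [-p * r / b, p + h / b, -p * (1 - p) / b, -h / b],
        [-(1 - 2 * q) / 4, -(1 - 2 * p) / 4, r / 2, 0.0],
        [-p, -q, 0.0, 1 - r],
    ]
    D = [[x / 2 for x in row] for row in two_d]
    f = [p * a, q * b, 2 * p * q, r * r]
    return T, D, f


def abo_bias(p: float, q: float, r: float, eps: float) -> float:
    """Approximate E(v-hat) at v = 0 from the ratio (3) under the simple deficit model."""
    T, D, f = abo_matrices(p, q, r)
    # Expected proportions E(n_ij / n) with no incest present.
    E = [[f[i] * T[i][j] for j in range(4)] for i in range(4)]
    # Type O mothers: child ratio shifted to (p - eps) : (q - eps) : (r + 2 eps).
    E[3] = [r * r * (p - eps), r * r * (q - eps), 0.0, r * r * (r + 2 * eps)]
    num = den = 0.0
    for i in range(4):
        for j in range(4):
            if T[i][j] > 0.0:  # cells with t_ij = 0 also have d_ij = 0
                ratio = D[i][j] / T[i][j]
                num += E[i][j] * ratio
                den += E[i][j] * ratio ** 2
    return num / den


if __name__ == "__main__":
    for deficit in (0.05, 0.10):
        eps = deficit / 2 * (0.24 + 0.08)
        print(deficit, round(abo_bias(0.24, 0.08, 0.68, eps), 3))
```

Under these assumptions the sketch reproduces the two figures quoted in the text to three decimal places.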

ETHNIC STRATIFICATION

Here too a simple model, this time for a single codominant locus, may suffice to illustrate how the existence of several more-or-less endogamous ethnic groups could bias the estimate of the frequency of incest. Thus assume that the population from which the sample is drawn is made up of two equally large subpopulations, closed as regards mating, with gene frequencies $p_1 = p + a$ and $p_2 = p - a$. The sample contains approximately equal numbers from each group. The overall gene frequency is $p = (p_1 + p_2)/2$, and this would be the value used to calculate the matrices T, X and D, given by MacLean & Adams, whereas in fact

$$E(n_{11}/n) = p^3 + 3a^2 p, \qquad E(n_{12}/n) = E(n_{21}/n) = p^2 q + a^2 (q - 2p), \qquad \text{etc.}$$

Assume further that $v = 0$, and that $p$ is close to neither 0 nor 1 (or, equivalently, that no $|d_{ij}|/t_{ij}$ is much larger than unity). Substituting into (3) we have

$$E(\hat{v}) = 4a^2/pq + O(a^4). \tag{5}$$

In a like manner one can compute the relevant quantities in the case of dominance. As a numerical example let us take p = 0.6 and p = 0.4. By symmetry, the bias will be the same for both values in the case of codominance:

E(v-hat)                 a = 0.02    a = 0.05    a = 0.10

Codominance              0.0067      0.041       0.159

Dominance (p = 0.6)      0.0064      0.039       0.145

Dominance (p = 0.4)      0.0069      0.043       0.173
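The codominance row of the table can be reproduced by evaluating the ratio of expected numerator to expected denominator of the first-order estimate directly, with the expected mother-child frequencies averaged over the subpopulations. The Python sketch below does this for any number of closed subpopulations. It is a minimal illustration: the function names are hypothetical, and the ratios d_ij/t_ij are derived on the assumption that the sibling-mating matrix is the average of the random-mating and selfing matrices.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def codominant_ratio(p: float) -> Matrix:
    """d_ij / t_ij for phenotypes AA, Aa, aa (set to 0 where t_ij = 0)."""
    q = 1.0 - p
    # Assumption: D = (S - T) / 2 with S the selfing matrix.
    return [
        [q / (2 * p), -0.5, 0.0],
        [(q - p) / (4 * p), 0.0, (p - q) / (4 * q)],
        [0.0, -0.5, p / (2 * q)],
    ]


def joint_frequencies(P: float) -> Matrix:
    """Mother-child phenotype frequencies under random mating at gene frequency P."""
    Q = 1.0 - P
    return [
        [P ** 3, P * P * Q, 0.0],
        [P * P * Q, P * Q, P * Q * Q],
        [0.0, P * Q * Q, Q ** 3],
    ]


def stratified_bias(p: float, weights: Sequence[float], deviations: Sequence[float]) -> float:
    """Approximate E(v-hat) at v = 0 when subpopulation i has gene frequency p + a_i.

    The weights should sum to 1 and the weighted deviations to 0, so that p is
    the overall gene frequency.
    """
    ratio = codominant_ratio(p)
    num = den = 0.0
    for w, a in zip(weights, deviations):
        freq = joint_frequencies(p + a)
        for i in range(3):
            for j in range(3):
                num += w * freq[i][j] * ratio[i][j]
                den += w * freq[i][j] * ratio[i][j] ** 2
    return num / den


def crude_bias(p: float, a: float) -> float:
    """Leading term 4 a^2 / (p q) for two equal subpopulations at p + a and p - a."""
    return 4 * a * a / (p * (1.0 - p))


if __name__ == "__main__":
    for a in (0.02, 0.05, 0.10):
        full = stratified_bias(0.6, [0.5, 0.5], [a, -a])
        print(a, round(full, 4), round(crude_bias(0.6, a), 4))
```

Comparing the two printed columns shows how far the leading term alone departs from the full ratio as a grows.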


Computations based on equation (4) indicate firstly that the crude formula (5) performs reasonably well over a fairly wide range of values of $p$ and $a$, and secondly that the bias decreases as $v$ increases ($p$ and $a$ being held constant) but not by much; between $v = 0$ and $v = 0.25$ the drop is rarely as much as 25%. The conclusions in the case of dominance are broadly similar.

More generally, suppose that the population is composed of $k$ such closed subpopulations, the $i$th of which contributes a proportion $w_i$ to the whole and has gene frequency $p_i = p + a_i$; writing $s_h$ for $\sum_i w_i a_i^h$, we have firstly that if the overall gene frequency is $p$, $s_1 = 0$ and, in the codominance case, $E(n_{11}/n) = p^3 + 3p s_2 + s_3$, $E(n_{12}/n) = p^2 q + (q - 2p) s_2 - s_3$, etc. The numerator in (3) becomes $\{(4pq + 1) s_2 + (q - p) s_3\}/4pq$ which, since clearly $|s_3| < s_2$, is always positive, as is of course the denominator. Thus under these conditions at least, ethnic stratification will tend to bias the estimate of $v$ in the positive direction.

INBREEDING

If the coefficient of inbreeding in the population is somehow maintained at a fixed value, $F$, and sibling incest is not present to any noticeable extent, the mother-child frequency matrix, i.e. the matrix $\{E(n_{ij}/n)\}$, is for a codominant locus easily derived from the definition of $F$ (Li, 1955, chap. XI), with rows for the mother's phenotype and columns for the child's:

$$\begin{array}{c|ccc}
 & AA & Aa & aa \\ \hline
AA & p(p + Fq)^2 & pq(1 - F)(p + Fq) & 0 \\
Aa & pq(1 - F)(p + Fq) & pq(1 - F)^2 & pq(1 - F)(q + Fp) \\
aa & 0 & pq(1 - F)(q + Fp) & q(q + Fp)^2
\end{array}$$

Once again this is substituted into (3), yielding, for $F$ small and $p$ not far from 0.5, $E(\hat{v}) = F/(pq + \tfrac{1}{4}) + O(F^2)$. Here too the crude result turns out to be of the correct order of magnitude over a rather wider range of parameter values, and for both dominance and codominance.
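As a rough check on the inbreeding result, the sketch below builds the mother-child frequency matrix for a codominant locus at a given F, forms the ratio of expected numerator to expected denominator of the first-order estimate, and sets it beside the leading term F/(pq + 1/4). The function names are hypothetical, and the ratios d_ij/t_ij again rest on the assumption that the sibling-mating matrix is the average of the random-mating and selfing matrices.

```python
from typing import List

Matrix = List[List[float]]


def codominant_ratio(p: float) -> Matrix:
    """d_ij / t_ij for phenotypes AA, Aa, aa (set to 0 where t_ij = 0)."""
    q = 1.0 - p
    # Assumption: D = (S - T) / 2 with S the selfing matrix.
    return [
        [q / (2 * p), -0.5, 0.0],
        [(q - p) / (4 * p), 0.0, (p - q) / (4 * q)],
        [0.0, -0.5, p / (2 * q)],
    ]


def inbred_joint(p: float, F: float) -> Matrix:
    """Mother-child frequencies E(n_ij / n) when inbreeding is held at F."""
    q = 1.0 - p
    u = p + F * q          # P(other allele is A | one allele is A)
    w = q + F * p          # P(other allele is a | one allele is a)
    het = p * q * (1 - F)  # half the heterozygote frequency
    return [
        [p * u * u, het * u, 0.0],
        [het * u, het * (1 - F), het * w],
        [0.0, het * w, q * w * w],
    ]


def inbreeding_bias(p: float, F: float) -> float:
    """Approximate E(v-hat) with no incest present, from the ratio of expectations."""
    ratio = codominant_ratio(p)
    freq = inbred_joint(p, F)
    num = sum(freq[i][j] * ratio[i][j] for i in range(3) for j in range(3))
    den = sum(freq[i][j] * ratio[i][j] ** 2 for i in range(3) for j in range(3))
    return num / den


def leading_term(p: float, F: float) -> float:
    """First-order result F / (pq + 1/4)."""
    return F / (p * (1.0 - p) + 0.25)


if __name__ == "__main__":
    for p in (0.5, 0.4, 0.2):
        print(p, inbreeding_bias(p, 0.01), leading_term(p, 0.01))
```

At p = 0.5 the ratio reduces to 2F, in line with the statement that the bias is roughly proportional to the coefficient of inbreeding.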

DISCUSSION

It should first of all be noted that in all the above examples the samples would have had to be very large before there was much chance of detecting a departure from assumptions by means of the usual tests. Secondly, wherever checked by a more exact formula, the approximative approach has proved to indicate quite adequately the size of the bias even when the parameter values involved were well outside the region in which this approach could be justified; there is no reason to suppose that the unconfirmed cases would be any different.

All three factors examined (ABO incompatibility, ethnic stratification and a slightly inbred population) would tend to cause the proportion of sibling incest in a sample of mother-child pairs to be overestimated. The first of these is likely to be most pertinent, since ABO blood-typing is very widely done. Precisely how and to what extent incompatibility manifests itself is not well understood, but from results obtained above, one can see that it could be a serious source of bias, although if many other genetic markers are available for use in the estimation formula, its importance will decrease. The other two factors may be less frequently encountered in practice. If the population is made up of several ethnic groups the bias will depend on their genetic heterogeneity and on the extent to which their members marry amongst themselves. When the population being studied is to a minor extent inbred, the bias is roughly proportional to the coefficient of inbreeding, F, and may be as much as twice or three times as large even if several genetic markers are used. The careful investigator should ignore none of these three factors.

I wish to thank Drs Olli P. Heinonen and Ernest B. Hook for suggestions and comments. Research partly supported by National Institute of Neurological Diseases and Stroke Contract No. NIH-NINDS-72-2322.

REFERENCES

LEVENE, H. & ROSENFIELD, R. E. (1961). ABO incompatibility. In Progress in Medical Genetics (ed. A. G. Steinberg).

LI, C. C. (1955). Population Genetics. Chicago: University Press.

MACLEAN, C. J. & ADAMS, M. S. (1973). A method for the study of incest. Annals of Human Genetics 36, 323-32.

TAKANO, K. & MILLER, J. R. (1972). ABO incompatibility as a cause of spontaneous abortion. Journal of Medical Genetics 9, 144-60.
