
Ann. Hum. Genet., Lond. (1976), 39, 361

Printed in Great Britain

Bernstein's and gene-counting methods in generalized ABO-like systems

BY JUN-MO NAM AND JOHN J. GART

National Cancer Institute, Bethesda, Maryland 20014

1. INTRODUCTION

There is much current interest in the HL-A system and its relation to histocompatibility (e.g. Cavalli-Sforza & Bodmer, 1971, pp. 242ff.; Elandt-Johnson, 1971, pp. 546ff.) and disease (e.g. Bodmer, 1972; Rogentine, Yankee, Gart, Nam & Trapani, 1972). It is a generalized ABO-like system with upwards of 8 codominant alleles (antigens) at a single locus. Yasuda & Kimura (1968) showed how the two Bernstein's estimators and gene-counting methods (Ceppellini, Siniscalco & Smith, 1955; Smith, 1957, 1967) can be applied to such generalized models. They did not, however, derive formulas for the asymptotic variances of the adjusted Bernstein's estimators, nor did they investigate the relative efficiency of either of Bernstein's estimators.

The statistical properties of the ABO system were extensively studied by Stevens (1938, 1950). In particular these papers showed the adjusted Bernstein's estimators to be efficient for the ABO system. An associated chi-square test of fit of the Hardy-Weinberg law was also given. DeGroot (1956) derived explicit expressions for the asymptotic variances of these estimators and found explicit expressions for the relative efficiency of the simple Bernstein's method when considering the individual point estimators separately. Smith (1967) tabled the relative efficiencies for the various simple estimators. Chakraborty (1970) found expressions for the relative efficiency of the simple Bernstein's method using the concept of generalized variances.

This paper investigates the statistical properties of Bernstein's estimators in the case of any number of alleles. The asymptotic variance of the adjusted Bernstein's method is found for a general number of alleles. This method is shown to be inefficient when more than two codominant alleles are present. The relative efficiencies of both Bernstein's methods are studied for various numbers of alleles. An alternative simple and 'almost' efficient method of estimation is suggested: the adjusted Bernstein method followed by one gene-counting iteration. The chi-square goodness of fit test is extended to the general case.

2. NOTATION

For the most part we shall use the notation of Yasuda & Kimura (1968). We consider throughout the generalized ABO-like system with $m$ alleles which, in the terminology of Cotterman (1953), is the phenogram $m$-$[(m^2-m+2)/2]$-1. This is a system with $m-1$ codominant alleles, $A_i$, $i = 1, 2, \ldots, m-1$, and one recessive allele, $O$, which therefore possesses $(m^2-m+2)/2$ phenotypes. The frequencies of the codominant alleles are denoted by $p_i$, $i = 1, 2, \ldots, m-1$, and the frequency of the recessive allele is denoted by $r$, where $\sum_{i=1}^{m-1} p_i + r = 1$. The Hardy-Weinberg law may be written (following Yasuda & Kimura, p. 417):

Phenotype     Genotype             Multinomial probability          Observed
                                   under Hardy-Weinberg law
A_i           A_i A_i, A_i O       p_i^2 + 2 p_i r                  n_i
A_i A_j       A_i A_j              2 p_i p_j                        n_ij
O             O O                  r^2                              n_0
Totals                             1                                N

(i, j = 1, 2, ..., m-1; i < j)

The sum of all the frequencies involving the allele $A_i$ is called $G_i$, that is,

$$G_i = n_i + \sum_{j \ne i}^{m-1} n_{ij}, \quad i = 1, 2, \ldots, m-1.$$
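As a small illustration of this bookkeeping, the following sketch computes the $G_i$ and $N$ from phenotype counts. The function name and the dictionary layout are our own assumptions; the counts are the Drosophila virilis data of Table 2 as we read them from the printed table, and they reproduce the totals given there.

```python
def allele_totals(n_single, n_pair):
    """Return G_i = n_i + sum over j != i of n_ij for each codominant allele.

    n_single[i] is the count of phenotype A_i (genotypes A_iA_i and A_iO);
    n_pair[(i, j)] is the count of the heterozygote phenotype A_iA_j, i < j.
    Both containers are illustrative choices, not notation from the paper.
    """
    totals = []
    for i, n_i in enumerate(n_single):
        g = n_i
        for (a, b), count in n_pair.items():
            if i in (a, b):
                g += count
        totals.append(g)
    return totals


# Counts as read from Table 2 (alleles indexed from 0).
n_single = [1149, 36, 17]
n_pair = {(0, 1): 336, (0, 2): 203, (1, 2): 25}
n_0 = 20

G = allele_totals(n_single, n_pair)
N = sum(n_single) + sum(n_pair.values()) + n_0
print(G, N)  # [1688, 397, 245] 1786
```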

3. BERNSTEIN'S METHODS

The generalization of Bernstein's simple method to the $m$ allele case is straightforward:

$$\hat{p}_i = 1 - \sqrt{1 - G_i/N}, \quad i = 1, 2, \ldots, m-1, \qquad \hat{r} = \sqrt{n_0/N}. \tag{3.1}$$

The statistic $D$ measures the discrepancy, in some sense, from the Hardy-Weinberg law:

$$D = 1 - \sum_{i=1}^{m-1} \hat{p}_i - \hat{r}.$$

It should be noted that $D$ may be positive, negative or zero. There is an alternative estimator of the recessive frequency (Smith, 1967),

$$\hat{r}_0 = 1 - \sum_{i=1}^{m-1} \hat{p}_i.$$

Note that this estimator may also yield negative values. The adjusted Bernstein's estimators use the value of $D$ to force the sum of the gene frequencies to be closer to one. The formulae as given by Yasuda & Kimura are

$$p_i^* = \hat{p}_i(1 + D/2), \quad i = 1, 2, \ldots, m-1, \qquad r^* = (\hat{r} + D/2)(1 + D/2). \tag{3.2}$$

Note that $\sum_{i=1}^{m-1} p_i^* + r^* = 1 - D^2/4$, that is, the deviation from one is considerably less than for the simple estimators. One should note that it is possible for $r^*$ to be negative if $D$ is negative with $|D|/2 > \hat{r}$. We propose that Bernstein's method be modified in a somewhat different way, which enjoys the same asymptotic properties as the adjusted method (3.2). The 'modified' Bernstein's method is:

$$p_i' = \frac{\hat{p}_i}{1 - D/2}, \quad i = 1, 2, \ldots, m-1, \qquad r' = \frac{\hat{r} + D/2}{1 - D/2}. \tag{3.3}$$
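The three sets of estimators (3.1), (3.2) and (3.3) are easily computed together. The sketch below is a minimal illustration, not part of the original paper; the function name and the returned dictionary are our own choices. Applied to the totals of Table 2 ($G = 1688, 397, 245$; $n_0 = 20$; $N = 1786$) it agrees with the first three columns of estimates reported there to four decimals.

```python
import math


def bernstein_estimates(G, n0, N):
    """Simple (3.1), adjusted (3.2) and modified (3.3) Bernstein estimates.

    G  : list of totals G_i for the codominant alleles
    n0 : count of the recessive phenotype O
    N  : sample size
    """
    p_hat = [1.0 - math.sqrt(1.0 - g / N) for g in G]   # (3.1)
    r_hat = math.sqrt(n0 / N)                            # (3.1)
    D = 1.0 - sum(p_hat) - r_hat                         # discrepancy statistic
    adjusted = ([p * (1.0 + D / 2.0) for p in p_hat],
                (r_hat + D / 2.0) * (1.0 + D / 2.0))     # (3.2)
    modified = ([p / (1.0 - D / 2.0) for p in p_hat],
                (r_hat + D / 2.0) / (1.0 - D / 2.0))     # (3.3)
    return {"simple": (p_hat, r_hat), "D": D,
            "adjusted": adjusted, "modified": modified}


est = bernstein_estimates([1688, 397, 245], 20, 1786)
print(round(est["D"], 4))                         # -0.0608
print([round(p, 4) for p in est["modified"][0]])  # [0.7432, 0.1146, 0.069]
```

The modified estimates sum to one by construction, whereas the adjusted ones sum to $1 - D^2/4$.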

An obviously convenient property of the 'modified' method is that

$$\sum_{i=1}^{m-1} p_i' + r' = 1$$

all the time, although as before it is still possible for $r'$ to be negative. We shall have occasion to compare some of the above estimators with the maximum likelihood estimators, which shall be denoted by $\tilde{p}_i$, $i = 1, 2, \ldots, m-1$, and $\tilde{r}$.

4. THE VARIANCE AND COVARIANCE OF THE SIMPLE BERNSTEIN'S ESTIMATORS

The asymptotic variances and covariances of the $\hat{p}_i$'s are simple functions of the means, variances, and covariances of the $G_i/N$. From the properties of the multinomial distribution these are found to be

$$E(G_i/N) = p_i(2-p_i), \qquad V(G_i/N) = p_i(2-p_i)(1-p_i)^2/N, \quad i = 1, 2, \ldots, m-1,$$

and

$$C(G_i/N, G_j/N) = -p_i p_j\{2 - 2(p_i+p_j) + p_i p_j\}/N, \quad i \ne j = 1, 2, \ldots, m-1.$$

Let $\hat{p}_i = \phi(G_i/N)$ where $\phi(u) = 1 - \sqrt{1-u}$. Then by the usual technique, the asymptotic variance is

$$V(\hat{p}_i) = [\phi'\{E(G_i/N)\}]^2\, V(G_i/N).$$

This yields

$$V(\hat{p}_i) = p_i(2-p_i)/4N, \quad i = 1, 2, \ldots, m-1. \tag{4.1}$$
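The step from the general delta-method expression to (4.1) is short. The following worked derivation fills it in, using only the moments of $G_i/N$ and the function $\phi(u) = 1 - \sqrt{1-u}$ defined above.

```latex
\begin{align*}
\phi'(u) &= \frac{1}{2\sqrt{1-u}}, \\
1 - E(G_i/N) &= 1 - p_i(2-p_i) = (1-p_i)^2, \\
\phi'\{E(G_i/N)\} &= \frac{1}{2(1-p_i)}, \\
V(\hat{p}_i) &\approx \frac{1}{4(1-p_i)^2}\cdot\frac{p_i(2-p_i)(1-p_i)^2}{N}
              = \frac{p_i(2-p_i)}{4N}.
\end{align*}
```

The factor $(1-p_i)^2$ cancels, which is why the variance of the simple estimator of $p_i$ does not depend on the other allele frequencies.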

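In a similar manner, we find the asymptotic covariance to be

$$C(\hat{p}_i, \hat{p}_j) = -\frac{p_i p_j\{2 - 2(p_i+p_j) + p_i p_j\}}{4N(1-p_i)(1-p_j)}, \quad i \ne j = 1, 2, \ldots, m-1. \tag{4.2}$$

The variance and covariance of $\hat{r}$ depend on the means, variances, and covariances of $n_0/N$. We find

$$E(n_0/N) = r^2, \qquad V(n_0/N) = r^2(1-r^2)/N,$$

and

$$C(n_0/N, G_i/N) = -r^2 p_i(2-p_i)/N.$$

From these we find the asymptotic variances and covariances to be

$$V(\hat{r}) = (1-r^2)/4N, \qquad C(\hat{p}_i, \hat{r}) = -\frac{r\,p_i(2-p_i)}{4N(1-p_i)}. \tag{4.3}$$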

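If, in order to ensure that $\sum_{i=1}^{m-1} \hat{p}_i + \hat{r}_0 = 1$, the estimator $\hat{r}_0 = 1 - \sum_{i=1}^{m-1} \hat{p}_i$ is used for $r$, then (4.1) and (4.2) may be used to yield

$$V(\hat{r}_0) = \sum_{i=1}^{m-1} V(\hat{p}_i) + \sum_{i \ne j} C(\hat{p}_i, \hat{p}_j). \tag{4.4}$$

It is apparent from (4.3) and (4.4) that $V(\hat{r}_0) < V(\hat{r})$. It is difficult to calculate $V(\hat{r}_0)$ from (4.4); a somewhat easier computing formula for large $m$ follows:

$$V(\hat{r}_0) = \frac{1}{4N}\left[-(1-r)^2 + 2r\sum_{k=1}^{m-1}\frac{p_k}{1-p_k} + \left(\sum_{k=1}^{m-1}\frac{p_k}{1-p_k}\right)^2 - \sum_{k=1}^{m-1}\left(\frac{p_k}{1-p_k}\right)^2\right]. \tag{4.5}$$

From (4.1) and (4.2) we also find

[equation (4.6) not recoverable from the source]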

For the ABO system, $m = 3$, (4.4) and (4.6) can be written

[equations (4.7) and (4.8) not recoverable from the source]

Even after allowance for the different notation, (4.7) and (4.8) differ somewhat from Smith (1967, p. 103).

5. D AND ITS ASYMPTOTIC VARIANCE

The statistic $D$ may be used in a test of the adequacy of the fit to the Hardy-Weinberg law. The asymptotic variance of $D$ is found by routine but rather elaborate algebraic manipulations from (4.1), (4.2), and (4.3) to be

$$V(D) = \frac{1}{4N}\left\{\left(\sum_{k=1}^{m-1}\frac{p_k}{1-p_k}\right)^2 - \sum_{k=1}^{m-1}\left(\frac{p_k}{1-p_k}\right)^2\right\}. \tag{5.1}$$

Since under the Hardy-Weinberg law $E(D) = 0$ asymptotically, the approximate $\chi^2$ statistic with one degree of freedom is

$$\chi_D^2 = D^2/\hat{V}(D), \tag{5.2}$$

where $\hat{V}(D)$ is (5.1) with $p_k = p_k'$, $k = 1, \ldots, m-1$. When $m = 3$, the ABO system, this reduces to

$$\chi_D^2 = 2N\,(1 + r'/p_1' p_2')\,D^2,$$

which agrees exactly with Stevens (1950, equation 2.24).

6. THE VARIANCES AND COVARIANCES OF THE 'MODIFIED' BERNSTEIN'S ESTIMATORS

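The asymptotic variances and covariances of the adjusted Bernstein's estimators are simple functions of $C(\hat{p}_i, D)$ and $C(\hat{r}, D)$ as well as of $V(\hat{p}_i)$, $V(\hat{r})$ and $V(D)$. Using (4.1), (4.2), and (4.3), we find

$$C(\hat{p}_i, D) = -\frac{p_i^2}{4N(1-p_i)}\sum_{k \ne i}\frac{p_k}{1-p_k}, \tag{6.1}$$

and

$$C(\hat{r}, D) = \frac{1}{4N}\left\{r\sum_{k=1}^{m-1}\frac{p_k}{1-p_k} - (1-r)\right\}. \tag{6.2}$$

Applying the usual asymptotic technique to (3.3) we have, for $i = 1, 2, \ldots, m-1$,

$$V(p_i') = \left(\frac{\partial p_i'}{\partial \hat{p}_i}\right)^2 V(\hat{p}_i) + \left(\frac{\partial p_i'}{\partial D}\right)^2 V(D) + 2\,\frac{\partial p_i'}{\partial \hat{p}_i}\,\frac{\partial p_i'}{\partial D}\,C(\hat{p}_i, D),$$

where all the derivatives are evaluated at $\hat{p}_i = p_i$ and $D = 0$. Clearly the expression on the right will be exactly the same for $V(p_i^*)$. This yields the result

$$V(p_i') = V(\hat{p}_i) + \frac{p_i^2}{4}\,V(D) + p_i\,C(\hat{p}_i, D),$$

or explicitly,

$$V(p_i') = \frac{p_i}{4N}\left[2 - p_i + \frac{p_i}{4}\left\{\left(\sum_{k=1}^{m-1}\frac{p_k}{1-p_k}\right)^2 - \sum_{k=1}^{m-1}\left(\frac{p_k}{1-p_k}\right)^2\right\} - \frac{p_i^2}{1-p_i}\sum_{k \ne i}\frac{p_k}{1-p_k}\right], \quad i = 1, 2, \ldots, m-1. \tag{6.3}$$

Similarly we find that

$$V(r') = V(\hat{r}) + \frac{(1+r)^2}{4}\,V(D) + (1+r)\,C(\hat{r}, D).$$

Clearly this expression will be the same for $r^*$. Explicitly this is

$$V(r') = \frac{1+r}{4N}\left[r\sum_{k=1}^{m-1}\frac{p_k}{1-p_k} + \frac{1+r}{4}\left\{\left(\sum_{k=1}^{m-1}\frac{p_k}{1-p_k}\right)^2 - \sum_{k=1}^{m-1}\left(\frac{p_k}{1-p_k}\right)^2\right\}\right]. \tag{6.4}$$

Covariances are similarly found to be (using (4.2), (4.3), (5.1), (6.1) and (6.2)),

[equations (6.5) and (6.6) not recoverable from the source]

where $i \ne j = 1, 2, \ldots, m-1$. As before the covariance terms will be the same for $p_i^*$ and $r^*$.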

7. THE RELATIVE EFFICIENCY OF BERNSTEIN'S MODIFIED METHOD FOR m = 4

Stevens (1938) showed the adjusted (and thus the asymptotically equivalent modified) Bernstein's estimators to be fully efficient for $m = 3$, the ABO system. This can be easily confirmed by letting $m = 3$ in (6.3), (6.4), (6.5) and (6.6) and finding that these results are identical with the corresponding elements of the inverse of the information matrix (see, e.g., DeGroot (1956)). Does full efficiency hold for the general case? Since Stevens' form of proof is clearly limited to the case $m = 3$, we examine the case $m = 4$ via the route of inverting the information matrix. The information matrix and its explicit inverse for $m = 4$ are given in Appendix I. For $m = 4$, (6.3) becomes

[equation (7.1) not recoverable from the source]

where $u \ne v \ne i = 1, 2, 3$. The relative efficiency of $p_i'$ as an estimator of $p_i$ can be written as

$$R(p_i') = \frac{V(\tilde{p}_i)}{V(p_i')} = \frac{V(\tilde{p}_i)}{V(\tilde{p}_i) + \Delta_i}, \tag{7.2}$$

where $\Delta_i = V(p_i') - V(\tilde{p}_i)$ and $V(\tilde{p}_i)$ is given by (AI.1). By straightforward algebra we have that

[equation (7.3) not recoverable from the source]

where again $i \ne u \ne v = 1, 2, 3$. This shows the modified Bernstein's method to be inefficient, unless some $p_i = 0$, wherein the $m = 4$ case effectively reduces to the $m = 3$ case. In some instances for $m > 3$ the modified Bernstein's estimator is not only inefficient, but has a larger variance than the simple Bernstein's estimator. Consider the case $p_1 = 0.1$, $p_2 = 0.4$, $p_3 = 0.3$ and $r = 0.2$. Using (7.2), we find $R(p_1') = 0.987$ or 98.7%, while, using (4.1) and (AI.1), we find

$$R(\hat{p}_1) = V(\tilde{p}_1)/V(\hat{p}_1) = 0.992 \text{ or } 99.2\%.$$

However, the modified method is better for each of $p_2$, $p_3$, and $r$.
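The comparison in this example can be checked numerically without the closed-form expressions. The sketch below applies the delta method directly to the multinomial phenotype proportions: each statistic is linearized by its gradient with respect to the class proportions, and its asymptotic variance is the multinomial variance of that linear form. This is our own numerical device, not the algebraic route taken in the text; all function names are hypothetical, and the closed form for $V(D)$ used in the final check (one quarter of the difference between the squared sum and the sum of squares of the $p_k/(1-p_k)$, divided by $N$) is assumed here as a form consistent with Stevens' $m = 3$ result.

```python
from itertools import combinations


def phenotype_classes(p, r):
    """List (carried codominant alleles, probability) for every phenotype."""
    k = len(p)
    classes = [((i,), p[i] ** 2 + 2 * p[i] * r) for i in range(k)]    # A_i
    classes += [((i, j), 2 * p[i] * p[j])
                for i, j in combinations(range(k), 2)]                # A_iA_j
    classes.append(((), r * r))                                       # O
    return classes


def gradients(p, r):
    """Gradients of p_hat_i, r_hat and D with respect to class proportions."""
    classes = phenotype_classes(p, r)
    probs = [q for _, q in classes]
    slope = [1.0 / (2.0 * (1.0 - pi)) for pi in p]   # phi'(E G_i/N)
    g_p = [[slope[i] if i in carried else 0.0 for carried, _ in classes]
           for i in range(len(p))]
    g_r = [1.0 / (2.0 * r) if not carried else 0.0 for carried, _ in classes]
    g_D = [-(sum(g[c] for g in g_p) + g_r[c]) for c in range(len(classes))]
    return probs, g_p, g_r, g_D


def delta_variance(probs, grad, N=1.0):
    """Asymptotic variance of a statistic with the given gradient."""
    mean = sum(q * g for q, g in zip(probs, grad))
    return (sum(q * g * g for q, g in zip(probs, grad)) - mean ** 2) / N


p, r = [0.1, 0.4, 0.3], 0.2
probs, g_p, g_r, g_D = gradients(p, r)

var_simple = delta_variance(probs, g_p[0])              # N * V(p_hat_1)
# At D = 0 the modified estimator has gradient g_p + (p_i / 2) * g_D.
g_mod = [a + 0.5 * p[0] * d for a, d in zip(g_p[0], g_D)]
var_modified = delta_variance(probs, g_mod)             # N * V(p_1')
var_D = delta_variance(probs, g_D)                      # N * V(D)

x = [pk / (1.0 - pk) for pk in p]
var_D_closed = (sum(x) ** 2 - sum(v * v for v in x)) / 4.0

print(var_simple, var_modified, var_modified > var_simple)
```

For this parameter set the sketch gives $N\,V(\hat{p}_1) = 0.0475$, in agreement with (4.1), and a slightly larger value for $N\,V(p_1')$, which is the ordering reported in the text.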

Table 1. The total relative efficiencies (7.4) of Bernstein's methods for m = 4

        Relative efficiencies (%)
Row     Simple, R_t(p^)     Modified, R_t(p')
 1      99.98               99.99
 2      99.8                99.9
 3      99.6                99.9
 4      97.8                99.5
 5      95.3                99.3
 6      79.2                95.8
 7      73.6                85.1
 8      69.5                88.7
 9      66.1                90.8
10      53.8                84.6
11      50.6                83.1
12      45.0                77.0

[The parameter-value columns ($p_1$, $p_2$, $p_3$, $r$) are scrambled in the source and cannot be aligned with the rows. Values in extraction order: 0.05 0.05 0.05 0.10 0.10 0.10 0.20 0.10 0.05 0.85 0.70 0.65 0.40 0.10 0.05 0.60 0.80 0.30 0.05 0.05 0.45 0.30 0.05 0.05 0.10 0.30 0.30 0.10 0.40 0.40 0.10 0.10 0.85 0.90 0.85 0.90 0.05 0.05 0.05 0.05 0.03 0.03 0.04 0.02 0.02 0.01 0.10 0.05.]

In the multiparameter situation it is better to measure the relative efficiency by the ratio of the generalized variances, i.e. the determinants of the asymptotic variance-covariance matrices. These total relative efficiencies are denoted by

$$R_t(p') = |\Sigma_{\tilde{p}}|/|\Sigma_{p'}|, \qquad R_t(\hat{p}) = |\Sigma_{\tilde{p}}|/|\Sigma_{\hat{p}}|, \tag{7.4}$$

where $\Sigma_{\tilde{p}}$, $\Sigma_{p'}$, and $\Sigma_{\hat{p}}$ are the asymptotic variance-covariance matrices for the maximum likelihood, modified Bernstein's, and simple Bernstein's estimators. For the example given above we find $R_t(p') = 94.6\%$ and $R_t(\hat{p}) = 84.1\%$. More examples of $R_t(\hat{p})$ and $R_t(p')$ for $m = 4$ are given in Table 1. Note that $R_t(p') > R_t(\hat{p})$ for all these examples. Also it is apparent that either of Bernstein's methods is almost efficient for large $r$ ($r > 0.30$). For small $r$ ($r < 0.10$) both methods are more inefficient, although in all these cases the modified method is a great improvement over the simple method.

8. IMPROVEMENT OF THE MODIFIED BERNSTEIN ESTIMATOR BY THE GENE COUNTING METHOD

The gene counting method was fully developed by Smith (Ceppellini, Siniscalco & Smith, 1955; Smith, 1957, 1967). Its extension to the general ABO-like phenogram $m$-$[(m^2-m+2)/2]$-1 was pointed out by Yasuda & Kimura (1968). This iterative method is equivalent to the maximum likelihood method in that it converges to a fully efficient set of estimators. It is somewhat simpler to apply in that it does not require matrix inversion. We are concerned with the question: if a single gene counting iteration is applied using the 'modified' Bernstein's estimator as an initial estimator, how much is the modified Bernstein's estimator improved in efficiency? Obviously for $m = 2$, no improvement is possible as the initial estimator is already fully efficient. We shall consider the general case. The first iteration of the counting method yields the estimators $p_i''$, where

$$p_i'' = \frac{G_i}{2N} + \frac{p_i'}{p_i' + 2r'}\cdot\frac{n_i}{2N}, \quad i = 1, 2, \ldots, m-1, \tag{8.1}$$

and the estimator $r''$, where

$$r'' = 1 - \sum_{i=1}^{m-1} p_i''. \tag{8.2}$$

As before it is possible for $r''$ to be negative. Recalling (3.3), we see that (8.1) may also be written:

$$p_i'' = \frac{G_i}{2N} + \frac{\hat{p}_i}{\hat{p}_i + 2\hat{r} + D}\cdot\frac{n_i}{2N}, \quad i = 1, 2, \ldots, m-1. \tag{8.3}$$
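A single counting step is a one-line update once the modified estimates are available. The sketch below is an illustration under our own naming (none of these functions come from the paper): it performs one step of (8.1) and (8.2) from the modified Bernstein starting values, and also iterates the same step to convergence as a stand-in for the maximum likelihood solution. The data are the Table 2 counts as we read them from the printed table.

```python
import math


def modified_bernstein(G, n0, N):
    """Modified Bernstein estimates (3.3), used here as starting values."""
    p_hat = [1.0 - math.sqrt(1.0 - g / N) for g in G]
    r_hat = math.sqrt(n0 / N)
    D = 1.0 - sum(p_hat) - r_hat
    return ([x / (1.0 - D / 2.0) for x in p_hat],
            (r_hat + D / 2.0) / (1.0 - D / 2.0))


def gene_counting_step(p, r, n_single, G, N):
    """One gene-counting iteration: (8.1) for the p_i, (8.2) for r."""
    p_new = [g / (2.0 * N) + (pi / (pi + 2.0 * r)) * ni / (2.0 * N)
             for pi, ni, g in zip(p, n_single, G)]
    return p_new, 1.0 - sum(p_new)


def iterate_to_convergence(p, r, n_single, G, N, tol=1e-10, max_iter=1000):
    """Repeat the counting step until the p_i change by less than tol."""
    for _ in range(max_iter):
        p_new, r_new = gene_counting_step(p, r, n_single, G, N)
        if max(abs(a - b) for a, b in zip(p_new, p)) < tol:
            return p_new, r_new
        p, r = p_new, r_new
    return p, r


# Table 2 data as read from the printed table.
n_single, G, n0, N = [1149, 36, 17], [1688, 397, 245], 20, 1786

p_mod, r_mod = modified_bernstein(G, n0, N)
p_one, r_one = gene_counting_step(p_mod, r_mod, n_single, G, N)
p_ml, r_ml = iterate_to_convergence(p_mod, r_mod, n_single, G, N)
print([round(v, 4) for v in p_one], round(r_one, 4))
```

On these data the one-step values agree with the 'Modified and 1 counting' column of Table 2, and further iterations move the estimates only in the fourth decimal place.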

Thus we can go directly from $\hat{p}_i$ to $p_i''$ without an intervening computation of the $p_i'$ or $r'$. The calculation of the $V(p_i'')$ requires the results found above in (4.1), (4.3), (5.1), (6.1), and (6.2) as well as the following results:

$$V(G_i/N) = p_i(2-p_i)(1-p_i)^2/N,$$

$$V(n_i/N) = p_i(p_i+2r)(1 - p_i^2 - 2p_i r)/N,$$

$$C(n_i/N, G_i/N) = p_i(p_i+2r)(1-p_i)^2/N,$$

$$C(G_i/N, \hat{p}_i) = p_i(1-p_i)(2-p_i)/2N,$$

$$C(G_i/N, \hat{r}) = -r\,p_i(2-p_i)/2N,$$

$$C(n_i/N, \hat{p}_i) = p_i(p_i+2r)(1-p_i)/2N,$$

$$C(n_i/N, \hat{r}) = -r\,p_i(p_i+2r)/2N,$$

and

[remaining covariances not recoverable from the source]

These fifteen variances and covariances are used to produce the asymptotic variance for $p_i''$:

[equation (8.4) not recoverable from the source]

Appendix II shows that

$$V(p_i'') < V(p_i'),$$

and

$$V(p_i'') < V(\hat{p}_i),$$

for all $i = 1, 2, \ldots, m-1$, whenever $m \ge 4$, except for degenerate cases of one or more $p_i = 0$ so that the model effectively reduces to $m \le 3$, for which $V(p_i'') = V(p_i')$. By similar calculations we find for $i \ne j = 1, 2, \ldots, m-1$

[equation (8.5) not recoverable from the source]

Table 2. Esterase variants of Drosophila virilis (phenogram 4-7-1 (m = 4)) (Data of Ohba, source: Yasuda & Kimura (1968, p. 415))

Observed phenotype counts (diagonal entries are the n_i, off-diagonal entries the n_ij)

            A_1(S)    A_2(W)    A_3(F)    G_i
A_1(S)      1149      336       203       1688
A_2(W)                36        25        397
A_3(F)                          17        245
O           20

N = 1786.

$\chi_D^2 = D^2/\hat{V}(D) = (-0.0608)^2/(0.0001676) = 22.06$, $P \approx 3 \times 10^{-6}$.

The estimates

Parameter     (3.1)        (3.2)        (3.3)        (8.1) or (8.3)    ML
              Simple       Adjusted     Modified     Modified and
              Bernstein    Bernstein    Bernstein    1 counting
p_1           0.7658       0.7425       0.7432       0.7413            0.7414
p_2           0.1181       0.1145       0.1146       0.1156            0.1156
p_3           0.0711       0.0690       0.0690       0.0701            0.0701
r             0.1058       0.0731       0.0732       0.0730            0.0729
Sum p_i + r   1.0608       0.9991       1.0000       1.0000            1.0000

Relative efficiency (%)

Parameter     Simple       Adjusted or           Modified and     ML
              Bernstein    modified Bernstein    1 counting
p_1           77.2         93.1                  97.2             100.0
p_2           98.1         98.6                  99.9             100.0
p_3           99.1         98.9                  [illegible]      100.0
r             54.4         92.9                  94.4             100.0

Using the fact that $r'' = 1 - \sum_{i=1}^{m-1} p_i''$ together with (8.4) and (8.5), we find

[equation (8.6) not recoverable from the source]

Expressions (8.4) and especially (8.6) are so complicated that it is probably easier to estimate these variances by using the approximate methods suggested by Smith (1967).

9. SOME NUMERICAL EXAMPLES

We consider some numerical examples for $m \ge 4$.

Example 1. $m = 4$, phenogram 4-7-1. Table 2 gives the data of Ohba (cited by Yasuda & Kimura (1968, p. 415)) for esterase variants of Drosophila virilis. It is seen that $D = -0.0608$, and the Hardy-Weinberg law is rejected by the $\chi_D^2$ test ($\chi_D^2 = 22.06$, $P \approx 3 \times 10^{-6}$). The chi-square goodness of fit test (given by Yasuda & Kimura) yields $\chi^2 = 26.98$ (d.f. = 3, P = …). Thus the resulting estimates of the $p$'s will be misleading whatever method is used. The various estimates are reported in Table 2 as a relatively simple numerical application of the various methods discussed here. Still other methods were applied to these data by Yasuda & Kimura (p. 415).

On the other hand the bottom panel of Table 2 gives a valid picture of the relative efficiencies of the methods for these $p$'s when the Hardy-Weinberg law does fit. As $p_1$ is large (0.7414) the relative efficiency of the simple Bernstein's estimator is low (77.2%). A single iteration of the counting method is a moderate improvement over the adjusted or modified method (97.2% against 93.1%). On the other hand, $p_2$ and $p_3$ are small and the simple method is almost fully efficient in these cases. In fact $p_3^*$ or $p_3'$ is less efficient than $\hat{p}_3$ in this case (98.9% against 99.1%). The estimation of $r$ essentially reflects all the errors in the estimation of the $p$'s. When the efficiency of the estimation of one of the $p$'s is low, the efficiency of estimation of $r$ is even worse; when all the $p$'s are estimated well, so is $r$.

Table 3. HL-A antigens of 200 normal controls (LA series) (phenogram 9-37-1) (Source: Rogentine, Yankee, Gart, Nam & Trapani (1972))

[The matrix of observed phenotype counts for $A_1, \ldots, A_8$ and $O$ is not recoverable from the source.]

N = 200.

$\chi_D^2 = D^2/\hat{V}(D) = (-0.0126)^2/(0.000518) = 0.31$, $P = 0.58$.

The estimates

Parameter     (3.1)        (3.2)        (3.3)        (8.1) or (8.3)    ML
              Simple       Adjusted     Modified     Modified and
              Bernstein    Bernstein    Bernstein    1 counting
p_1           0.1938       0.1926       0.1926       0.1906            0.1906
p_2           0.2351       0.2336       0.2336       0.2349            0.2351
p_3           0.1140       0.1133       0.1133       0.1137            0.1137
p_4           0.1427       0.1418       0.1418       0.1421            0.1422
p_5           0.0487       0.0484       0.0484       0.0489            0.0489
p_6           0.0434       0.0431       0.0431       0.0425            0.0425
p_7           0.0540       0.0537       0.0537       0.0552            0.0552
p_8           0.0228       0.0227       0.0227       0.0229            0.0229
r             0.1581       0.1508       0.1508       0.1492            0.1489
Sum p_i + r   1.0126       1.0000       1.0000       1.0000            1.0000

Example 2. $m = 9$, phenogram 9-37-1. Table 3 gives the data of Rogentine et al. (1972) on the LA series of HL-A antigens in 200 human 'controls'. It is seen that the Hardy-Weinberg law fits very well by the $\chi_D^2$ test. Once again the modified Bernstein's estimator plus one counting iteration gives results virtually identical to the ML estimates. The relative efficiencies of the various methods for estimating $p_2$ are 95.90% for $\hat{p}_2$, 96.04% for $p_2^*$ or $p_2'$, and 99.73% for $p_2''$. Since $p_2 = 0.2351$ is the largest $p$, these efficiencies would be even larger for the other $p$'s.
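The test statistic quoted in Example 1 can be reproduced from the summary totals alone. The sketch below is illustrative and rests on two assumptions of ours: that the variance of $D$ takes the form $\{(\sum_k x_k)^2 - \sum_k x_k^2\}/4N$ with $x_k = p_k/(1-p_k)$, and that it is evaluated at the modified estimates $p_k'$. With these choices the printed value $\hat{V}(D) = 0.0001676$ for the Table 2 data is recovered. The one-degree-of-freedom P value uses the identity $P(\chi_1^2 > x) = \mathrm{erfc}(\sqrt{x/2})$; the function names are hypothetical.

```python
import math


def d_statistic_test(G, n0, N):
    """Return D, the estimated variance of D, the chi-square and its P value."""
    p_hat = [1.0 - math.sqrt(1.0 - g / N) for g in G]
    r_hat = math.sqrt(n0 / N)
    D = 1.0 - sum(p_hat) - r_hat
    p_mod = [x / (1.0 - D / 2.0) for x in p_hat]       # modified estimates (3.3)
    x = [pk / (1.0 - pk) for pk in p_mod]
    # Assumed form of V(D), evaluated at the modified estimates.
    var_D = (sum(x) ** 2 - sum(v * v for v in x)) / (4.0 * N)
    chi2 = D * D / var_D
    return D, var_D, chi2, chi2_1df_pvalue(chi2)


def chi2_1df_pvalue(x):
    """Upper tail probability of chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


D, var_D, chi2, p_value = d_statistic_test([1688, 397, 245], 20, 1786)
print(round(D, 4), round(var_D, 7), round(chi2, 2))  # -0.0608 0.0001676 22.06
```

The same tail function applied to the value 0.31 of Example 2 gives a P value of about 0.58, as reported in Table 3.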

10. ESTIMATION EFFICIENCY AND DESIGN EFFICIENCY FOR VARIOUS NUMBERS OF ALLELES

The HL-A study alluded to in Table 3 was concerned with whether any gene frequency was higher in a group of leukaemics compared with normal controls. This study (Rogentine et al. 1972) found HL-A2 to be significantly more frequent in leukaemics than in the 200 normal controls. Many studies of the LA series do not report on all these eight antigens. For instance, Cavalli-Sforza & Bodmer (1971, p. 251) report on only four antigens, HL-A1, HL-A2, HL-A3, and HL-A9, that is, $m = 5$. The questions arise: (1) If we are primarily interested in estimating a particular antigen frequency, say HL-A2, how much is gained by typing various numbers of other antigens? (2) Which are the best other antigens to type? (3) How does the method of estimation affect the answers to the first two questions?

Consider the simplest case of typing for only one antigen, say HL-A2 in Example 2. It is easy to show that the ML estimator is $\tilde{p}_2 = 1 - \sqrt{1 - G_2/N}$, which is identical to the Bernstein's estimators. The variance of the efficient estimator for this case ($m = 2$) is

$$V_2(\tilde{p}_2) = p_2(2-p_2)/4N.$$

If HL-A2 and exactly one additional antigen, the $i$th, are typed, then the variance of the efficient estimator (either $\tilde{p}_2$, $p_2^*$, or $p_2'$) is found to be (using (6.3) with $m = 3$),

$$V_3(\tilde{p}_2) = \frac{p_2(2-p_2)}{4N} - \frac{p_2^3\,p_i}{8N(1-p_2)(1-p_i)}.$$

We immediately see that the maximal decrease in variance is found by choosing $i$ so that $p_i$ is the largest other gene frequency. We have not proven this result in general, but it seems apparent from intuition and other arithmetic calculations that this must be the most efficient design for any $m$: always type the other antigens in descending order of their frequency.

We applied this rule to the data of Example 2 and Table 3 to arrive at the results given in Table 4. The top panel is the relative efficiency of the estimators of $p_2$ for various values of $m$. The simple Bernstein's estimator is fully efficient only for $m = 2$ and, as $m$ increases, it decreases in efficiency relative to the ML method. The adjusted or modified Bernstein's methods are fully efficient for $m = 2$ and 3, but decrease in efficiency to almost the same relative efficiency as the simple method for $m = 9$. The modified plus one counting is also fully efficient for $m = 2$ and 3 and loses very little in efficiency as $m$ increases. The ML estimator is, of course, fully efficient for all $m$.

The 'design efficiency' is computed in the bottom panel of Table 4. The design efficiency is the ratio of the variance of the particular estimator in question for $m = 9$ to the variance of that estimator for various values of $m < 9$. For instance, for the ML method the design efficiency, in per cent, is $100\,V_9(\tilde{p}_2)/V_m(\tilde{p}_2)$, $m = 2, 3, \ldots, 9$. Note that this measure compares the estimators to themselves for various values of $m$, so comparisons among columns are not appropriate in the bottom panel of Table 4. The simple Bernstein's method neglects all the information on the other antigens in estimating $p_2$; thus its variance is constant over $m$ (see (4.1)). This is reflected in the fact that all values of $m < 9$ are as efficient as $m = 9$.

On the other hand the ML method yields the most logical pattern of design efficiency. As $m$ increases the efficiency increases to a maximum for $m = 9$. Each additional antigen typed yields a bit more information on $p_2$. As it is almost fully efficient, the modified plus one counting iteration apes the ML method in design efficiency. The adjusted or modified methods present a mixed picture in design efficiency. The design efficiency increases from its minimum at $m = 2$ to a maximum at $m = 4$, after which it decreases monotonically to $m = 9$. Thus, for this example, typing only three antigens (HL-A2, HL-A1, and HL-A4, $m = 4$) yields the minimum variance for $p_2^*$ or $p_2'$. We raise this point mainly to point out the inconsistencies that can arise when using inefficient estimation techniques in conjunction with 'efficient' designs. We note that the aforementioned Cavalli-Sforza & Bodmer example typed the four most frequent antigens, so that the design efficiency for $m = 5$ relative to $m = 9$ is 97.66% for the ML estimation and 97.89% for the modified plus one counting iteration estimation of $p_2$.

Table 4. Relative efficiency* (to ML, in %) of various methods of estimation for $p_2 = 0.235$ and varying m

m     Simple       Adjusted or           Modified and     ML
      Bernstein    modified Bernstein    1 counting
2     100.00       100.00                100.00           100.00
3     99.52        100.00                100.00           100.00
4     98.94        99.50                 99.99            100.00
5     98.20        98.67                 99.96            100.00
6     97.68        98.07                 99.93            100.00
7     97.07        97.36                 99.88            100.00
8     96.38        96.57                 99.80            100.00
9     95.90        96.04                 99.73            100.00

Design efficiency* (%) of numbers of antigens tested relative to m = 9 using that particular method of estimation

m     Simple       Adjusted or           Modified and     ML
      Bernstein    modified Bernstein    1 counting
2     100.00       99.86                 96.16            95.90
3     100.00       100.34                96.63            96.39
4     100.00       100.42                97.19            96.93
5     100.00       100.34                97.89            97.66
6     100.00       100.25                98.38            98.18
7     100.00       100.16                98.94            98.80
8     100.00       100.06                99.57            99.51
9     100.00       100.00                100.00           100.00

* As m increases the next most frequent allele is added in order, i.e. $A_2$, $A_1$, $A_4$, $A_3$, $A_7$, etc., using Example 2, Table 3.

We are grateful to Dr G. N. Rogentine, Jr. for introducing us to the HL-A antigen system.

SUMMARY

Although the simple and adjusted Bernstein's methods are fully efficient for $m = 2$ and $m = 3$ (ABO system) respectively, their efficiency declines for larger values of $m$. For $m \ge 4$, the adjusted or modified Bernstein's method with a single counting iteration leads to a nearly efficient estimator. A single degree of freedom chi-square test of the Hardy-Weinberg law for all $m$ is derived. Some findings on the statistical efficiency of typing various numbers of antigens are given. All the results are illustrated in numerical examples.

REFERENCES

BODMER, W. F. (1972). Evolutionary significance of the HL-A system. Nature 237, 139-146.
CAVALLI-SFORZA, L. L. & BODMER, W. F. (1971). The Genetics of Human Populations. San Francisco: Freeman.
CEPPELLINI, R., SINISCALCO, M. & SMITH, C. A. B. (1955). The estimation of gene frequencies in a random-mating population. Ann. Hum. Genet. 20, 97-115.
CHAKRABORTY, R. (1970). Gene frequency estimates in the ABO system and their efficiencies. Sankhya B 32, 21-26.
COTTERMAN, C. W. (1953). Regular two-allele and three-allele phenotype systems. Part I. Am. J. Hum. Genet. 5, 193-235.
DEGROOT, M. H. (1956). Efficiency of gene frequency estimates for the ABO system. Am. J. Hum. Genet. 8, 39-43.
ELANDT-JOHNSON, R. C. (1971). Probability Models and Statistical Methods in Genetics. New York: Wiley.
ROGENTINE, G. N. Jr., YANKEE, R. A., GART, J. J., NAM, J. & TRAPANI, R. J. (1972). HL-A antigens and disease. Acute lymphocytic leukemia. J. Clin. Inv. 51, 2420-2428.
SMITH, C. A. B. (1957). Counting methods in genetic statistics. Ann. Hum. Genet. 21, 254-276.
SMITH, C. A. B. (1967). Notes on gene frequency, with multiple alleles. Ann. Hum. Genet. 31, 94-107.
STEVENS, W. L. (1938). Estimation of blood-group gene frequencies. Ann. Eugen. Lond. 8, 362-375.
STEVENS, W. L. (1950). Statistical analysis of the A-B-O blood groups. Human Biology 22, 191-217.
YASUDA, N. & KIMURA, M. (1968). A gene-counting method of maximum likelihood for estimating gene frequencies in ABO and ABO-like systems. Ann. Hum. Genet. 31, 409-420.

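APPENDIX I

The inverse of the information matrix for m = 4

The information matrix is expressed by

$$I = [I_{ij}] = N[g_{ij} + \delta_{ij} f_{ii}],$$

where

$$g_{ij} = 2 + 4\sum_{k=1}^{m-1} h_k - 2(h_i + h_j), \qquad f_{ii} = (2/p_i) + h_i - 1, \qquad \text{with } h_i = p_i/(p_i + 2r),$$

for $i, j = 1, 2, \ldots, m-1$.

We want to find an explicit form of the inverse matrix for $m = 4$, i.e. $I^{-1} = |I|^{-1}\,\mathrm{adj}\,I = [I^{ij}]$. From the relations

$$g_{ii}g_{jj} - g_{ij}^2 = -4(h_i - h_j)^2, \quad i \ne j = 1, 2, 3,$$

and

$$g_{ij}g_{jk} - g_{ik}g_{jj} = 4(h_j - h_i)(h_j - h_k), \quad i \ne j \ne k = 1, 2, 3,$$

we have

$$g_{11}(g_{22}g_{33} - g_{23}^2) + g_{12}(g_{13}g_{23} - g_{12}g_{33}) + g_{13}(g_{12}g_{23} - g_{13}g_{22}) = 0.$$

The determinant of $I$ can be written

[expansion not recoverable from the source]

Further calculation shows

$$|I| = 16N^3\,\frac{4\left(r + \sum_{i<j} p_i p_j\right) r + 3(1-r)\,p_1 p_2 p_3}{p_1 p_2 p_3 \prod_{k=1}^{3}(p_k + 2r)} > 0,$$

and the inverse exists. The cofactors of $I_{ii}$ and $I_{ij}$ in the determinant $|I|$ are

[cofactor expressions not recoverable from the source]

After intensive algebraic manipulations we find the diagonal elements of the inverse to be

[equation (AI.1) not recoverable from the source]

where $u \ne v \ne i = 1, 2, 3$. The off-diagonal terms are

[equation (AI.2) not recoverable from the source]

for $i \ne j = 1, 2, 3$.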

APPENDIX I1

Expression showing f3: is an improvement over both f3; and f3$ i n estimation of p i for m 2 4

From ( 6 . 3 )and (8.4), the variance of 8; can be written by

for all i = 1 , 2 , ...,m - 1 .

where

After rather involved algebraic manipulation we find yi > 0. Therefore V(f3;)< V(f3;)and the second term on the right of (AII. 1) is the amount of improvement of the adjusted or modified Bernstein’s by the single interaction of the counting method. Since V(&)is not necessarily less than the V(&),we also need t o compare V(f3;)with V(fii). Using (4.1)and (8.4) we have Pi Si, V(f3;)= V(Pi) --P: (AIL 2) 4Npi+2r

where

for all i = 1,2, ...,m - 1. Putting each sum over a single common denominator, we can write the numerator of Si as m- 1

m-1

4dl-PL-7.)

n

k+i

q = 1 - k*i IT

(l-pk)-

m- 1

and

c = cs
