Am. J. Hum. Genet. 51:1156-1160, 1992

A Cooperative Binomial Ascertainment Model

E. Kh. Ginsburg and T. I. Axenovich

Institute of Cytology and Genetics, Siberian Division, Academy of Sciences, Novosibirsk, Russia

Summary

It has been shown that the classical binomial form of ascertainment, assuming a constant probability π that any affected individual may become a proband for his or her pedigree, cannot describe a rather wide range of ascertainment procedures that might arise in practice. Some more general heuristic ascertainment formulas might then be preferred, and in this paper we consider the probabilistic basis for these formulas. We retain the binomial assumption of the classical scheme but allow the ascertainment probability to depend on the number of potential probands per pedigree. This probability can be expressed by an increasing or a decreasing function of that number. Various illustrations are given, and situations where the "cooperative" binomial scheme should be valuable are discussed.

Received June 18, 1991; final revision received May 20, 1992. Address for correspondence and reprints: Dr. E. Kh. Ginsburg, Institute of Cytology and Genetics, Lavrentiev Avenue 10, Novosibirsk 630090, Russia. © 1992 by The American Society of Human Genetics. All rights reserved. 0002-9297/92/5105-0025$02.00

Introduction

Many studies of the genetic basis of a disease are carried out by segregation and linkage analyses based on data from pedigrees sampled via a proband. Only a certain part of a pedigree can contain a proband. This part is determined by the aim of the investigation and by the investigator's resources and is defined by a set of characteristics of its members, such as age, sex, geographical location, etc. Elston and Sobel (1979) proposed the expression "proband sampling frame" (PSF) to describe this part of the pedigree (also see Dawson and Elston 1984). A member of the PSF affected by the disease in question is called a "potential proband"; he or she has a certain ascertainment probability π of attracting the investigator's attention and becoming a cause of the pedigree being sampled. We define the "potential proband frame" (PPF) as a subset of potential probands possible with the given PSF. We suppose that all PPFs are listed in some agreed order and denote by j a typical member of this list. Since several different pedigrees can have an identical PPF, we denote the general pedigree by (i, j), where the symbol i corresponds to a listing of those components in the pedigree that are not relevant to the PPF. The probability that a pedigree of type (i, j) is ascertained depends only on j, and, using the symbol "A" to denote the event of ascertainment, we write this probability as P(A|j). We also denote the number of potential probands in a pedigree of type (i, j) by r_j and denote the population frequency of pedigrees of type (i, j) by P_ij.

It follows that the probability that a pedigree in the ascertained sample is of type (i, j) is

$$P\{(i,j) \mid A\} = \frac{P_{ij}\,P(A \mid j)}{\sum_k \sum_m P_{km}\,P(A \mid m)}, \tag{1}$$

where k and m range over all values corresponding to ascertainable pedigrees. This formula enters into all likelihood-based statistical inferences, both of parameter estimation and of testing hypotheses. The formula clearly depends on the expression used for the ascertainment probability P(A|j). In many cases it is very difficult, if not impossible, to know this probability exactly. Clearly, any incorrect assumption made for the form of P(A|j) will lead to biased estimation of parameters and possibly incorrect conclusions in tests of hypotheses. We now discuss various assumptions made concerning the form of P(A|j), in particular the "classical" binomial formula and the "ascertainment-assumption-free" (AAF) expression. We will then propose a "cooperative" binomial scheme that might combine the virtues of the classical and the AAF approaches.

Ascertainment Assumptions

The classical binomial ascertainment assumption (Haldane 1938; Bailey 1951; Morton 1959) is that P(A|j) is given by

$$P(A \mid j) = 1 - (1 - \pi)^{r_j}. \tag{2}$$

Here π is a fixed (but unknown) constant. Thus the classical scheme assumes, first, that all potential probands have the same probability, π, of becoming actual probands (irrespective of age, sex, birth order, etc.) and, second, that different potential probands act independently in becoming actual probands. Using equation (2) in equation (1), we get

$$P\{(i,j) \mid A\} = \frac{P_{ij}\,\{1 - (1 - \pi)^{r_j}\}}{\sum_k \sum_m P_{km}\,\{1 - (1 - \pi)^{r_m}\}}. \tag{3}$$
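As a minimal numerical sketch of equations (2) and (3), the Python fragment below computes the distribution of pedigree types in an ascertained sample under the classical binomial assumption. The function names, the dictionary layout, and the toy population frequencies are illustrative assumptions and do not come from the original paper.

```python
def classical_ascertainment_prob(r, pi):
    """Equation (2): P(A|j) = 1 - (1 - pi)**r for a pedigree with r potential probands."""
    return 1.0 - (1.0 - pi) ** r


def ascertained_distribution(pop_freq, r_of_j, pi):
    """Equation (3): distribution of pedigree types (i, j) in the ascertained sample.

    pop_freq maps (i, j) -> population frequency P_ij;
    r_of_j maps j -> number of potential probands r_j.
    """
    weights = {
        (i, j): p * classical_ascertainment_prob(r_of_j[j], pi)
        for (i, j), p in pop_freq.items()
    }
    total = sum(weights.values())  # denominator of equation (3)
    return {ij: w / total for ij, w in weights.items()}


# Hypothetical toy population of ascertainable pedigree types
# (frequencies invented for illustration only).
pop_freq = {("u", 1): 0.6, ("v", 2): 0.3, ("u", 3): 0.1}
r_of_j = {1: 1, 2: 2, 3: 3}

sample = ascertained_distribution(pop_freq, r_of_j, pi=0.3)
for ij, prob in sorted(sample.items()):
    print(ij, round(prob, 4))
```

In this toy setting, pedigree types with more potential probands receive a larger share of the sample than of the population, which is the distortion that equation (3) corrects for.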

Since the parameter π is a probability, we can describe the limiting cases (π → 0, π = 1) of the classical scheme. These are called the "single" and "complete" ascertainment cases, respectively, and we have

$$\text{Single ascertainment:}\quad P(A \mid j) = \text{const} \cdot r_j \tag{4a}$$

$$\text{Complete ascertainment:}\quad P(A \mid j) = \text{const}. \tag{4b}$$

These values may also be inserted, when appropriate, into equation (1). We have noted the strong assumptions attending the classical scheme. From a historical point of view, these assumptions were partially justified by the simplicity of the pedigrees examined several decades ago (e.g., nuclear families with all parents of the same genotype). However, as has been pointed out by Stene (1977), Elston and Bonney (1984), and Ewens and Shute (1986a), the assumptions in the classical scheme are possibly too rigid for the more complex forms of pedigrees used in present-day analyses. To weaken the rigidity of the classical formalization, the following heuristic form of the ascertainment probability was proposed (Stene 1978; Ewens and Shute 1986b):

$$P(A \mid j) = b\,r_j^{c}, \tag{5}$$

where b and c are constants, with 0 < b < r^(-c), where r is the maximum r_j value. The case c = 1 corresponds to single ascertainment, and c = 0 corresponds to complete ascertainment. However, one can imagine cases where the probability of ascertainment is given as in equation (5), with c outside the limits 0 and 1. For example, c = 2, the "quadratic" ascertainment case, has been considered by Haldane (1938), Elston and Bonney (1984), and Ewens and Shute (1986a). We can also envision cases where c is negative, corresponding to cases where P(A|j) decreases as r_j increases: here a family with a large number of affected children might try not to attract attention from outside, thus lowering its ascertainment probability. If we regard the ascertainment probability (5) as potentially extending the range of cases covered by the classical scheme, then, using equations (1) and (5), we would have

$$P\{(i,j) \mid A\} = \frac{P_{ij}\,r_j^{c}}{\sum_k \sum_m P_{km}\,r_m^{c}}. \tag{6}$$
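Equation (6) can be illustrated with a short Python sketch. It shows that the constant b of equation (5) never enters the sample distribution and that c = 0 and c = 1 give the complete-ascertainment and single-ascertainment weightings, respectively. The function name and the toy frequencies are hypothetical, chosen only for illustration.

```python
def power_form_distribution(pop_freq, r_of_j, c):
    """Equation (6): sample probabilities proportional to P_ij * r_j**c.

    The constant b of equation (5) cancels between numerator and
    denominator, so it is not needed as an argument.
    """
    weights = {
        (i, j): p * float(r_of_j[j]) ** c
        for (i, j), p in pop_freq.items()
    }
    total = sum(weights.values())
    return {ij: w / total for ij, w in weights.items()}


# Hypothetical toy population of ascertainable pedigree types.
pop_freq = {("u", 1): 0.6, ("v", 2): 0.3, ("u", 3): 0.1}
r_of_j = {1: 1, 2: 2, 3: 3}

for c in (2, 1, 0, -1):
    dist = power_form_distribution(pop_freq, r_of_j, c)
    print(c, {ij: round(p, 3) for ij, p in sorted(dist.items())})
```

Negative values of c shift the sample toward pedigrees with few potential probands, matching the verbal description of families that avoid attention.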

An attempt to avoid the potential defects in the classical binomial ascertainment scheme is made in the elegant AAF approach of Ewens and Shute (1986b; also see Ewens and Green 1988; Shute 1988; Shute and Ewens 1988a, 1988b). Here no specific mathematical form is assumed for the ascertainment probabilities P(A|j). Instead, these probabilities are treated as independent unknown parameters that must be estimated, through maximum likelihood, along with the genetic parameters of interest. The practical implementation of this procedure simplifies considerably. If we consider first the maximization, with respect to the ascertainment parameters P(A|j), of the log-likelihood function

$$\sum_i \sum_j n_{ij} \log\left[\frac{P_{ij}\,P(A \mid j)}{\sum_k \sum_m P_{km}\,P(A \mid m)}\right], \tag{7}$$

where n_ij is the number of pedigrees of type (i, j) in the sample, the resulting maximum-likelihood equations are

$$n\,P(A \mid j)\,P_j = n_j \sum_k P(A \mid k)\,P_k,$$

where

$$n_j = \sum_i n_{ij} \quad\text{and}\quad P_j = \sum_i P_{ij}.$$
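A small numerical check of these maximum-likelihood equations is sketched below in Python. It assumes that n denotes the total number of sampled pedigrees (our reading of the symbol, which the text does not define explicitly) and verifies that ascertainment probabilities proportional to n_j / P_j satisfy the equations for any positive scale factor. The helper names, counts, and frequencies are hypothetical.

```python
def marginal_totals(table):
    """Sum a dict keyed by (i, j) over i, giving one total per PPF type j."""
    totals = {}
    for (_i, j), value in table.items():
        totals[j] = totals.get(j, 0.0) + value
    return totals


def aaf_ascertainment_estimates(counts, pop_freq, scale):
    """Candidate solution P(A|j) = scale * n_j / P_j (scale is arbitrary)."""
    n_j = marginal_totals(counts)
    p_j = marginal_totals(pop_freq)
    return {j: scale * n_j[j] / p_j[j] for j in p_j}


def ml_equation_residuals(p_a, counts, pop_freq):
    """Residuals n*P(A|j)*P_j - n_j*sum_k P(A|k)*P_k, one per j."""
    n_j = marginal_totals(counts)
    p_j = marginal_totals(pop_freq)
    n = sum(n_j.values())  # assumed meaning of n: total sample size
    denom = sum(p_a[k] * p_j[k] for k in p_j)
    return {j: n * p_a[j] * p_j[j] - n_j[j] * denom for j in p_j}


# Hypothetical population frequencies P_ij and sample counts n_ij.
pop_freq = {("u", 1): 0.4, ("v", 1): 0.2, ("u", 2): 0.3, ("v", 2): 0.1}
counts = {("u", 1): 30, ("v", 1): 10, ("u", 2): 45, ("v", 2): 15}

for scale in (0.001, 0.002):
    p_a = aaf_ascertainment_estimates(counts, pop_freq, scale)
    print(scale, p_a, ml_equation_residuals(p_a, counts, pop_freq))
```

Because the residuals vanish for every scale factor, only the ratios of the P(A|j) are determined by the sample, which is consistent with P(A|j) entering equation (1) only through a ratio.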


Substitution of these values into equation (1) shows that, in practice, estimation of the genetic parameters separates out from estimation of ascertainment parameters and that the genetic parameters may be estimated directly by using the log-likelihood function

$$\sum_i \sum_j n_{ij} \log\left[P_{ij}/P_j\right]. \tag{8}$$

While in some cases this approach does indeed remove all problems potentially arising through an incorrect assumption concerning the ascertainment process, it does so at a cost: some of the sample's information concerning the genetic parameters is lost, and the larger the pedigree proportion that is in the PSF, the larger is this loss. We thus now turn to an approach that possibly combines the merits of the classical and the AAF approaches.

The Cooperative Binomial Scheme

In the classical ascertainment scheme, it is assumed (i) that there is a constant probability π that any potential proband in a pedigree becomes an actual proband and (ii) that individuals act independently so far as proband status is concerned. In the cooperative binomial scheme we retain the second assumption but relax the first and assume instead that, if there are r_j potential probands in a pedigree, then the probability that any specific potential proband becomes an actual proband is not a fixed constant but, rather, is given by the expression

$$\pi_j = \pi(r_j) = 1 - \left(1 - \pi_1 r_j^{c}\right)^{1/r_j}, \tag{9}$$

where c is a fixed constant and π_1 is the probability in question when r_j = 1. There are natural limitations on c and π_1 so as to ensure that π(r_j) be positive. We call this ascertainment model the "cooperative" binomial scheme, since, while the binomial assumption of the classical scheme is retained, the probability (9) is no longer a constant but depends on the number of potential probands in the pedigree and can be an increasing or a decreasing function of r_j, as appropriate. According to the classical binomial assumption, the probability of ascertainment of a pedigree with r_j potential probands is then

$$P(A \mid r_j) = 1 - (1 - \pi_j)^{r_j} = \pi_1 r_j^{c}. \tag{10}$$

Insertion of this value into the general likelihood formula (1) leads to the expression (6) for the likelihood contribution, an expression that was originally derived by using an ascertainment assumption different from that under consideration. Note that the value of π_1 cancels out: thus π_1 need not be estimated in this scheme.

Several properties of the cooperative ascertainment scheme follow from this observation. The cases c = 1 and c = 0 correspond to single and complete ascertainment, respectively. Note, however, that, in the cooperative scheme, c = 1 (single ascertainment) does not imply π_j → 0, nor does c = 0 imply π_j = 1. Note also that, while complete and single ascertainment are special cases of the cooperative ascertainment scheme, as they are also of the classical scheme, we cannot regard the latter as a particular case of the former, since it is impossible to choose c and π_1 in probability (9) so as to give π_j = const, the classical ascertainment-scheme value.

Tables 1 and 2 illustrate properties of the cooperative scheme. The maximum possible values of π_j are given, in table 1, for c = 2, c = 1, c = -1, and c = -2. For c = 2 and c = 1 it is necessary to introduce also the maximum number, r_max, of potential probands. If c = 0, then the maximum value of π_j is unity.

Table 1

Maximal Values of π_j (see eq. [9]) for Various Ascertainment Schemes

| c | r_max | r_j = 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.0 | | | | | | |
| 1 | 2 | .500 | 1.0 | | | | | |
| 1 | 3 | .333 | .423 | 1.0 | | | | |
| 1 | 4 | .250 | .293 | .370 | 1.0 | | | |
| 1 | 5 | .200 | .225 | .263 | .311 | 1.0 | | |
| 1 | 6 | .167 | .184 | .206 | .240 | .301 | 1.0 | |
| 1 | 7 | .143 | .155 | .170 | .191 | .222 | .277 | 1.0 |
| 2 | 1 | 1.0 | | | | | | |
| 2 | 2 | .250 | 1.0 | | | | | |
| 2 | 3 | .111 | .255 | 1.0 | | | | |
| 2 | 4 | .063 | .134 | .241 | 1.0 | | | |
| 2 | 5 | .040 | .084 | .138 | .255 | 1.0 | | |
| 2 | 6 | .028 | .057 | .091 | .137 | .211 | 1.0 | |
| 2 | 7 | .020 | .042 | .065 | .094 | .133 | .198 | 1.0 |
| -1 | ... | 1.0 | .293 | .126 | .069 | .044 | .030 | .022 |
| -2 | ... | 1.0 | .134 | .038 | .016 | .008 | .005 | .003 |

Table 2 gives the maximum value of the probability of ascertainment of any pedigree, given by

$$P_{\max} = \pi_{1\max} \sum_j r_j^{c} P_j, \tag{11}$$

where π_1max = r_max^(-c) for c > 0 and π_1max = 1 for c ≤ 0. Here r_max is the maximum possible number of individuals in the PSF. Families with more than five offspring are not considered. The upper half of table 2 corresponds to ascertainment where the PSF comprises all members in the family, the lower half to the case where the PSF comprises only offspring. d = 0 and d = 1 correspond to recessive and dominant diseases, respectively. The frequency of the disease allele is assumed to be .1.

Table 2

Maximum Probability (see eq. [11]) That a Pedigree Is Ascertained for Different Ascertainment Schemes

| PSF, d, and c | r_max = 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| All family members, d = 1: | | | | | | | |
| c = 2 | ... | ... | .120 | .120 | .120 | .119 | .118 |
| c = 1 | ... | ... | .190 | .190 | .190 | .190 | .190 |
| c = 0 | ... | ... | .344 | .344 | .344 | .344 | .344 |
| c = -1 | ... | ... | .241 | .186 | .149 | .124 | .105 |
| c = -2 | ... | ... | .191 | .122 | .081 | .056 | .040 |
| All family members, d = 0: | | | | | | | |
| c = 2 | ... | ... | .004 | .003 | .003 | .003 | .003 |
| c = 1 | ... | ... | .010 | .010 | .010 | .010 | .010 |
| c = 0 | ... | ... | .028 | .034 | .039 | .042 | .045 |
| c = -1 | ... | ... | .027 | .031 | .034 | .035 | .036 |
| c = -2 | ... | ... | .027 | .030 | .032 | .032 | .032 |
| Offspring only, d = 1: | | | | | | | |
| c = 2 | .190 | .151 | .137 | .131 | .127 | ... | ... |
| c = 1 | .190 | .190 | .190 | .190 | .190 | ... | ... |
| c = 0 | .190 | .269 | .307 | .326 | .335 | ... | ... |
| c = -1 | .190 | .213 | .199 | .173 | .148 | ... | ... |
| c = -2 | .190 | .186 | .152 | .116 | .086 | ... | ... |
| Offspring only, d = 0: | | | | | | | |
| c = 2 | .010 | .007 | .005 | .005 | .004 | ... | ... |
| c = 1 | .010 | .010 | .010 | .010 | .010 | ... | ... |
| c = 0 | .010 | .017 | .022 | .026 | .028 | ... | ... |
| c = -1 | .010 | .015 | .018 | .020 | .020 | ... | ... |
| c = -2 | .010 | .015 | .017 | .017 | .017 | ... | ... |

For single ascertainment (c = 1), two facts are worth noting. First, as noted above, this scheme can arise with nonzero values of π_j. Second, the values of P_max are independent of the number of offspring in the family. This fact can be explained as follows: Suppose first that parents are in the PSF. Let N denote a normal individual and let A denote an affected individual. Let Q_m (m = 0, 1, 2) represent the frequencies of parental pairs of types NN, AN, and AA, respectively, and let θ_m (m = 0, 1, 2) be the probability that a given offspring from parental pair type m is A. Now, in this case, r_max = r_o + 2, where r_o is the number of offspring in the family, so, using equation (11) with c = 1, we get

$$P_{\max} = (r_o + 2)^{-1} \sum_{m=0}^{2} Q_m \sum_{r=0}^{r_o} (r + m) \binom{r_o}{r} \theta_m^{r} (1 - \theta_m)^{r_o - r} = (r_o + 2)^{-1}\{r_o P_1 + 2P_2\},$$

where P_1 (P_2) is the probability that an individual in the offspring (parental) generation is A. At equilibrium these two probabilities are equal (to, say, P), in which case P_max reduces to P, which is independent of r_o. We have assumed that the frequency of the disease allele is .1, so that P_max = P = .01 in the recessive case and .01 + 2(.1)(.9) = .19 in the dominant case, as is shown in table 2. An essentially identical calculation leads to the same conclusion if only the offspring are in the PSF.

There appears to be no necessity to carry out a detailed comparison of the cooperative binomial ascertainment scheme described above with the classical binomial scheme and the ascertainment-assumption-free procedure. If the real ascertainment procedure is described sufficiently accurately by probability (9), then the cooperative binomial scheme should be preferred, since it uses all the genetic information and leads to asymptotically unbiased estimation of genetic parameters. If probability (9) is not appropriate, then the cooperative binomial assumption will lead to biased parameter estimation, in which case, as pointed out by Ewens and Shute (1986b, p. 409), an investigator "can buy an assumption-free analysis by paying in decreased precision as measured by the standard error of each estimate."

Discussion

Any ascertainment procedure will imply that the sample has a distribution of pedigree types that is different from that in the population from which the sample was taken. This difference may be known to the investigator or, if the real ascertainment scheme is unknown to him or her, may be unknown. We can distinguish two different stages in the formation of the sample. The first stage indicates what pedigrees cannot be in the sample: for example, a pedigree with no affected individuals usually cannot be in the sample, and in some sampling procedures a pedigree with only one affected individual cannot be sampled. The second stage leads to a redistribution of pedigrees of various types: for example, single and quadratic ascertainment lead to an increase in sample probabilities for pedigrees with a large number of potential probands. These two stages might be independent.

There are three types of ascertainment procedures satisfying these requirements. The first type requires a formalized probabilistic model of the ascertainment process, implying a set of more or less strict assumptions relying on a small number of ascertainment parameters. The classical binomial scheme is of this type. The second type leaves the ascertainment method undetermined, making the ascertainment probabilities P(A|j) independent parameters. The cost of such a procedure is a loss of information concerning genetic parameters. The cooperative binomial scheme illustrates the third type of ascertainment assumption. Here P(A|j) is assumed to be a certain approximating function of the PPF characteristics. A reasonable choice of function is needed, one that, on the one hand, covers the potentially wide range of ascertainment processes occurring in reality but, on the other hand, has only a small number of parameters and does not lead to significant loss of information about genetic parameters. The particular mathematical form (10) is chosen because, first, complete and single ascertainment are particular cases of this model and, second, P(A|j) is determined by only one unknown parameter, c (the parameter π_1 cancels in the likelihood calculations and does not have to be estimated). Clearly, mathematical forms other than equation (10) are also possible. While the third type of scheme may appear similar to the classical scheme, it significantly widens the range of ascertainment procedures that can be considered. For example, data gathered in a two-tier ascertainment process, with single ascertainment operating in each tier, would lead to a cooperative binomial ascertainment scheme described by equation (10), with c = 2.
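The relation between the per-proband probability (9) and the pedigree-level form (10) can be checked numerically. The Python sketch below is an illustration under our own naming (the function names and the tolerance guard are assumptions): it evaluates π(r_j) from equation (9), applies the binomial formula to recover π_1 r_j^c, and prints maximal values of π(r_j) for a few choices of c and r_max.

```python
def cooperative_proband_prob(r, pi1, c):
    """Equation (9): probability that one specific potential proband
    becomes an actual proband in a pedigree with r potential probands."""
    x = pi1 * float(r) ** c  # pedigree-level probability pi_1 * r**c
    if x < 0.0 or x > 1.0 + 1e-12:
        raise ValueError("pi1 * r**c must lie in [0, 1]")
    x = min(x, 1.0)  # guard against rounding slightly above 1
    return 1.0 - (1.0 - x) ** (1.0 / r)


def pedigree_ascertainment_prob(r, pi1, c):
    """Binomial formula with the r-dependent probability of equation (9);
    by equation (10) this should equal pi1 * r**c."""
    p = cooperative_proband_prob(r, pi1, c)
    return 1.0 - (1.0 - p) ** r


def max_pi1(r_max, c):
    """Largest admissible pi_1: r_max**(-c) for c > 0, and 1 for c <= 0."""
    return float(r_max) ** (-c) if c > 0 else 1.0


# Consistency of equations (9) and (10) for an arbitrary admissible choice.
print(pedigree_ascertainment_prob(3, 0.1, 2), 0.1 * 3 ** 2)

# Maximal values of pi(r_j) for a few (c, r_max) pairs.
for c, r_max in [(1, 3), (1, 4), (-1, 4)]:
    pi1 = max_pi1(r_max, c)
    row = [round(cooperative_proband_prob(r, pi1, c), 3)
           for r in range(1, r_max + 1)]
    print(c, r_max, row)
```

With π_1 set to its maximal value, the printed rows should agree with the corresponding rows of table 1 to three decimals.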
Thus, the cooperative binomial scheme makes various strict assumptions similar to those of the classical binomial scheme, namely, that (i) the probability of ascertainment of a pedigree depends only on the number of potential probands, (ii) for any given pedigree, each member of the PPF has the same probability of being a proband as any other, (iii) each potential proband is reported to the investigator independently of any other potential proband, and (iv) the age, sex, etc. of the potential proband do not affect the probability of being a proband. Nevertheless, the generalization of the classical scheme by equation (10) seems a reasonable step in extending the range of ascertainment schemes that can be assumed to apply. Quite naturally, assumptions ii-iv are to be put forward and considered only if the ascertainment process needs to be interpreted. Otherwise, only assumption i applies, no probabilistic model for this process is proposed, and likelihood (6) is used as a suitable approximation.
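To make the role of the single ascertainment parameter concrete, the following Python sketch evaluates the log-likelihood built from expression (6) as a function of c and picks the best value on a grid. It is only an illustration: the population frequencies P_ij are treated as known (in practice they depend on genetic parameters estimated jointly), and the function names, grid, counts, and frequencies are all hypothetical.

```python
import math


def log_likelihood(c, counts, pop_freq, r_of_j):
    """Log-likelihood of c based on expression (6).

    pi_1 has cancelled and does not appear; pop_freq (the P_ij) is
    assumed known here purely for illustration.
    """
    norm = sum(p * float(r_of_j[j]) ** c for (_i, j), p in pop_freq.items())
    return sum(
        n * math.log(pop_freq[(i, j)] * float(r_of_j[j]) ** c / norm)
        for (i, j), n in counts.items()
    )


def estimate_c(counts, pop_freq, r_of_j, grid):
    """Grid-search estimate of c (a simple stand-in for a proper optimizer)."""
    return max(grid, key=lambda c: log_likelihood(c, counts, pop_freq, r_of_j))


# Hypothetical population frequencies and numbers of potential probands.
pop_freq = {("u", 1): 0.6, ("v", 2): 0.3, ("u", 3): 0.1}
r_of_j = {1: 1, 2: 2, 3: 3}

# Hypothetical sample counts, constructed to be exactly proportional
# to P_ij * r_j, i.e. to match single ascertainment (c = 1).
counts = {("u", 1): 40, ("v", 2): 40, ("u", 3): 20}

grid = [k / 10 for k in range(-20, 31)]  # c from -2.0 to 3.0 in steps of 0.1
c_hat = estimate_c(counts, pop_freq, r_of_j, grid)
print("estimated c:", c_hat)
```

Because the counts were built to match c = 1 exactly, the grid search should return that value; with real data the estimate of c would carry sampling error and would be obtained together with the genetic parameters.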

Acknowledgment

We would like to express our gratitude to the reviewers, whose useful suggestions helped to clarify the text.

References

Bailey NTJ (1951) The estimation of the frequencies of recessives with incomplete multiple selection. Ann Eugenics 16:215-222

Dawson DV, Elston RC (1984) A bivariate problem in human genetics: ascertainment of families through a correlated trait. Am J Med Genet 18:435-448

Elston RC, Bonney GE (1984) Sampling consideration in the design and analysis of family studies. In: Rao DC, Elston RC, Kuller LH, Feinlieb M, Carter C, Havlik R (eds) Genetic epidemiology of coronary heart disease: past, present and future. Alan R Liss, New York, pp 349-371

Elston RC, Sobel E (1979) Sample consideration in the gathering and analysis of pedigree data. Am J Hum Genet 31:62-69

Ewens WJ, Green RM (1988) A resolution of the ascertainment sampling problem. IV. Continuous phenotypes. Genet Epidemiol 5:433-444

Ewens WJ, Shute NCE (1986a) The limits of ascertainment. Ann Hum Genet 50:399-402

Ewens WJ, Shute NCE (1986b) A resolution of the ascertainment sampling problem. I. Theory. Theor Popul Biol 30:388-412

Haldane JBS (1938) The estimation of the frequencies of recessive conditions in man. Ann Eugenics 8:255-262

Morton NE (1959) Genetic tests under incomplete ascertainment. Am J Hum Genet 11:1-16

Shute NCE (1988) The ascertainment sampling problem and estimation of genetic parameters when parental haplotypes are known. Am J Med Genet 31:281-290

Shute NCE, Ewens WJ (1988a) A resolution of the ascertainment sampling problem. II. Generalization and numerical results. Am J Hum Genet 43:374-386

Shute NCE, Ewens WJ (1988b) A resolution of the ascertainment sampling problem. III. Pedigrees. Am J Hum Genet 43:387-395

Stene J (1977) Assumption for different ascertainment models in human genetics. Biometrics 33:523-527

Stene J (1978) Choice of ascertainment model. I. Discrimination between single-proband models by means of birth order data. Ann Hum Genet 42:219-229
