
Ann. Hum. Genet. (1992), 56, 331-338 Printed in Great Britain

The power of the N-test of haplotype concordance

J. R. GREEN AND S. SHAH

Statistics and Computational Mathematics Department, Liverpool University, P.O. Box 147, Liverpool L69 3BX

SUMMARY

The N-test of haplotype concordance among siblings affected by some disease under investigation is used to decide whether there is a disease susceptibility gene linked to a marker locus or chromosomal region. The use of this test and appropriate modifications of it is briefly reviewed. The power of the ordinary N-test is then derived as a function of several parameters. The sample size needed to attain a given power is then derived. Some of the parameters are specified and the required sample sizes are given in tables for different values of the main unknown parameters.

INTRODUCTION

The N-test is used for testing the inheritance of a disease susceptibility gene by investigating the sharing of haplotypes among siblings affected by the disease. Significantly high sharing, as compared with pure Mendelian inheritance, indicates that a disease susceptibility gene is linked to the shared haplotypes. No assumption is made about the mode of inheritance in applying the test. The criterion N is the frequency, among the affected siblings, of the more commonly inherited haplotype from one parent plus the corresponding frequency for the other parent. The asymptotic normality of N is assumed, with mean and variance under the null hypothesis as given by Green & Low (1984). The N criterion has also been used to investigate the mode of inheritance of IDDM and to estimate relevant parameters (Green et al. 1983).

When there are only two affected siblings, the measure identity by descent (e.g. Suarez, 1978) has been used for some time. However, this is not appropriate when there are more than two affected siblings in a sibship. Thomson (1980) suggested randomly selecting pairs from such sibships, but this loses information. Weitkamp et al. (1981) suggested including all possible pairs of affected siblings in each family, but these are not all independent, so the associated distribution theory becomes invalid. Other test criteria, R (Green & Woodrow, 1977) and F (de Vries et al., 1976), have been used, but N is preferable, as explained in Green et al. (1983).

Modifications of the N-test have been devised for situations where the ordinary N cannot be used:

(a) When there is only one affected sibling in a sibship and there is at least one unaffected sibling, a measure T (based on N) of discordance between the affected and unaffected siblings (Green & Montasser, 1988) has been devised to deal with this situation.

(b) Some families provide incomplete information about the four relevant parental haplotypes. Also, some of the loci have low polymorphism, so that there is an appreciable probability of there being fewer than four distinct haplotypes among the parents. This can result in some excess sharing among the affected siblings apart from that due to sharing of haplotypes linked to a disease gene. This situation has been catered for by amending the N distribution, using estimated frequencies of the alternative haplotypes involved (Green & Grennan, 1991; Sidebottom et al. 1991).
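As an illustration of the definition of N, the following minimal Python sketch counts, for each parent, how often each of that parent's haplotypes was passed to the affected siblings and adds the two largest counts. The function name `n_criterion` and the haplotype labels a, b, c, d are hypothetical and chosen only for this example; complete information on the four parental haplotypes is assumed.

```python
from collections import Counter


def n_criterion(sibs):
    """Return N for one sibship.

    sibs: list of (paternal_haplotype, maternal_haplotype) labels,
    one pair per affected sibling (labels are illustrative).
    """
    paternal = Counter(pat for pat, _ in sibs)
    maternal = Counter(mat for _, mat in sibs)
    # Highest frequency from one parent plus highest frequency from the other.
    return max(paternal.values()) + max(maternal.values())


# Father carries haplotypes a/b, mother carries c/d (hypothetical labels).
three_sibs = [("a", "c"), ("a", "d"), ("a", "c")]
print(n_criterion(three_sibs))  # 3 (paternal a) + 2 (maternal c) = 5
```

With two affected siblings N takes the values 2 (no sharing), 3 or 4 (both haplotypes shared).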

THE POWER OF THE N-TEST

As in the previous work, we assume approximate normality for the N criterion under either of the hypotheses:

$H_0$: the null hypothesis: the inheritance of the haplotypes from the parents is purely random; each of the affected siblings may inherit either haplotype from one parent and either haplotype from the other with equal probability. The resulting distribution of N has been given by Green & Low (1984).

$H_1$: the alternative hypothesis: the probability of inheritance of different combinations of haplotypes by the affected siblings depends upon the number of disease genes each parent has.

Suppose the mean and variance of N for a single sibship are $\mu_0$ and $\sigma_0^2$ under $H_0$ and $\mu_1$ and $\sigma_1^2$ under $H_1$. The power of the test for one sibship is the probability that N exceeds its critical value at, say, the $\alpha$ level of significance. This critical value is $c_\alpha$, such that

$$P(N > c_\alpha \mid H_0) = 1 - \Phi\left(\frac{c_\alpha - \mu_0}{\sigma_0}\right) = \alpha,$$

where the normal integral to $e_\alpha$ is $\Phi(e_\alpha) = 1 - \alpha$, so that $c_\alpha = e_\alpha \sigma_0 + \mu_0$. Hence the power is

$$\Pi_1 = P(N > c_\alpha \mid H_1) = 1 - \Phi\left(\frac{e_\alpha \sigma_0 + \mu_0 - \mu_1}{\sigma_1}\right) = \Phi\left(\frac{(\mu_1 - \mu_0) - e_\alpha \sigma_0}{\sigma_1}\right).$$

Now we suppose that there are n such sibships and that the means and variances under $H_0$ and $H_1$ are the same for each sibship (this would not normally be true, particularly under $H_1$, but, for convenience of discussion, we assume this for now). In this case the means and the variances are increased by a factor n under both hypotheses (with the usual assumption of independence of the sibships), and the power of the N-test becomes

$$\Pi_n = \Phi\left(\frac{\sqrt{n}\,(\mu_1 - \mu_0) - e_\alpha \sigma_0}{\sigma_1}\right).$$
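Under the normal approximation assumed here, with the total of N over n independent sibships having mean $n\mu$ and variance $n\sigma^2$ under each hypothesis, the power can be evaluated numerically as $\Phi(\{\sqrt{n}(\mu_1 - \mu_0) - e_\alpha \sigma_0\}/\sigma_1)$. The Python sketch below does this with the standard library. The function name `n_test_power` is hypothetical, and the alternative-hypothesis values in the example ($\mu_1$ = 3.5, $\sigma_1$ = 0.5) are illustrative assumptions rather than values taken from the tables of this paper.

```python
from math import sqrt
from statistics import NormalDist


def n_test_power(n, mu0, sigma0, mu1, sigma1, alpha=0.05):
    """Approximate power of a one-sided N-test based on n sibships.

    mu0, sigma0: per-sibship mean and standard deviation of N under H0.
    mu1, sigma1: per-sibship mean and standard deviation of N under H1.
    """
    std_normal = NormalDist()
    e_alpha = std_normal.inv_cdf(1.0 - alpha)  # Phi(e_alpha) = 1 - alpha
    z = (sqrt(n) * (mu1 - mu0) - e_alpha * sigma0) / sigma1
    return std_normal.cdf(z)


# Illustrative inputs only: s = 2 null values with an assumed alternative.
for n in (5, 10, 15, 20):
    print(n, round(n_test_power(n, 3.0, sqrt(0.5), 3.5, 0.5), 3))
```

When $\mu_1 = \mu_0$ and $\sigma_1 = \sigma_0$ the function returns $\alpha$, as it should for a test of the stated size.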

The values of $\mu_0$ and $\sigma_0^2$ are as given by Green & Low (1984), and these are functions of s, the number of affected siblings in a sibship. The derivation of $\mu_1$ and $\sigma_1^2$ is much more complicated; they are functions of m, the sibship size, s, and the probabilities $f_2$, $f_1$, $f_0$ and p, where

$f_i$ = the probability that a sibling has the disease having inherited i disease genes (i = 0, 1, 2);

p = the probability that a member of the general population inherits a disease gene (that is, the frequency of the disease gene).

This means that the power, $\Pi_n$, is a function of (besides n) the six parameters m, s, $f_2$, $f_1$, $f_0$, p. The involved derivation of the distribution of N under $H_1$ is outlined in the Appendix for s = 2-5. The resulting values of $\mu_1$ and $\sigma_1^2$ for s = 2 = m are

SAMPLE SIZE NEEDED TO ATTAIN A CERTAIN POWER

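If we test for haplotype concordance at a significance level $\alpha$, the power of the N-test for a sample of size n will be

$$\Pi_n = \Phi\left(\frac{\sqrt{n}\,(\mu_1 - \mu_0) - e_\alpha \sigma_0}{\sigma_1}\right).$$

If this is required to be $1 - \beta$, where $\Phi(e_\beta) = 1 - \beta$, then

$$-e_\beta = \frac{e_\alpha \sigma_0}{\sigma_1} - \sqrt{n}\,\frac{(\mu_1 - \mu_0)}{\sigma_1},$$

so that

$$n = \left\{\left(\frac{e_\alpha \sigma_0}{\sigma_1} + e_\beta\right) \frac{\sigma_1}{(\mu_1 - \mu_0)}\right\}^2 = \left\{\frac{e_\alpha \sigma_0 + e_\beta \sigma_1}{\mu_1 - \mu_0}\right\}^2.$$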

As we have pointed out, $\mu_0$ and $\sigma_0$ are functions of s, while $\mu_1$ and $\sigma_1$ are functions of m, s, $f_2$, $f_1$, $f_0$, p, and we do not ordinarily know $f_2$, $f_1$, $f_0$ or p. Also, m and s ordinarily vary from one sibship to another. The values of $\alpha$ and $\beta$ (or equivalently $e_\alpha$ and $e_\beta$) are chosen by the person wishing to decide which sample size to use. It is appropriate for us to specify some of the other parameters in order to proceed. Many workers take $f_0$ = 0.0, as we shall do (that is, the probability of getting the disease in the absence of disease genes is zero). We shall suppose m = s = 2 for each family. We take $\beta$ = 0.05, and $\alpha$ = 0.05 and 0.01, so that we seek the value of n to give a 95% power for an N-test at significance levels 5 and 1 percent. Table 1 shows the values of n, rounded up to the next higher integer, to give the required powers.
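The sample-size calculation can be sketched in Python as follows, using $n = \{(e_\alpha \sigma_0 + e_\beta \sigma_1)/(\mu_1 - \mu_0)\}^2$ rounded up to the next integer. The function name `sibships_needed` is hypothetical. The null values $\mu_0$ = 3 and $\sigma_0^2$ = 0.5 for s = 2 follow from $E\{H(\tfrac12)\}$ = 1.5 and $E\{H^2(\tfrac12)\}$ = 2.5 for each of two independent parents; the alternative values $\mu_1$ = 3.5 and $\sigma_1$ = 0.5 are assumptions made for illustration only, so the result is not an entry of Table 1.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sibships_needed(mu0, sigma0, mu1, sigma1, alpha=0.05, beta=0.05):
    """Smallest integer n giving power of at least 1 - beta at level alpha."""
    std_normal = NormalDist()
    e_alpha = std_normal.inv_cdf(1.0 - alpha)
    e_beta = std_normal.inv_cdf(1.0 - beta)
    n_exact = ((e_alpha * sigma0 + e_beta * sigma1) / (mu1 - mu0)) ** 2
    return ceil(n_exact)


mu0, sigma0 = 3.0, sqrt(0.5)   # s = 2 under H0
mu1, sigma1 = 3.5, 0.5         # assumed values under H1 (illustrative)

print(sibships_needed(mu0, sigma0, mu1, sigma1, alpha=0.05))  # 16
print(sibships_needed(mu0, sigma0, mu1, sigma1, alpha=0.01))  # 25
```

The formula requires $\mu_1 > \mu_0$; the sketch makes no attempt to handle the degenerate case $\mu_1 = \mu_0$.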

Table 1. The number of sibships required to get a 95% power of the N-test for $f_0$ = 0.0

[Six panels: $\alpha$ = 0.05 and $\alpha$ = 0.01, each for the prevalence of disease p = 0.01, 0.02 and 0.03. Rows: $f_1$ = 0.9, 0.8, ..., 0.0; columns: $f_2$ = 0.9, 0.8, ..., 0.1.]

CONCLUSION

We have derived a complicated expression for the power of the N-test of haplotype concordance, assuming approximate normality. From this we have derived an expression for the number of sibships needed to attain a certain power using the test at a given significance level. This value of n, the required number of sibships, is a complicated function of the power, significance level and parameters s, m, $f_2$, $f_1$, $f_0$ and p. The first two of these variables we may choose ourselves; s and m we would know, but they may vary appreciably, while the other parameters may be unknown. Accordingly we have given tables of n (rounded up to the next higher integer), for $\beta$ = 0.05, $\alpha$ = 0.05 and 0.01, for s = 2 = m and for a range of values of $f_2$, $f_1$ and p, taking $f_0$ = 0.

Examining the tables we see that, for a given value of $\alpha$ and p, when $f_2 = f_1$ the n values are constant; also when $f_1$ = 0 the n values are constant. The formula for n makes it clear how this comes about. However, we also see that in each table, when $f_2 \neq f_1$ and $f_1 \neq 0$, the n values are approximately constant. This is very convenient. We see that to get 95% power for $f_0$ = 0.0, we need

                                  $\alpha$ = 0.05    $\alpha$ = 0.01
for p = 0.01,  for $f_2 = f_1$:   n = 18;            n = 25,
               for $f_1$ = 0:     n = 3;             n = 3.
for p = 0.02,  for $f_2 = f_1$:   n = 20;            n = 27,
               for $f_1$ = 0:     n = 3;             n = 3.
for p = 0.03,  for $f_2 = f_1$:   n = 22;            n = 30,
               for $f_1$ = 0:     n = 3;             n = 4.

For simplicity, we can attain at least the required power for $\alpha$ = 0.01, whatever the value of $f_2$ and $f_1$, if we use n = 25 for p = 0.01, n = 28 for p = 0.02, and n = 31 for p = 0.03. The same is true for $\alpha$ = 0.05 if we use n = 18 for p = 0.01, n = 20 for p = 0.02 and n = 22 for p = 0.03. Similar values of n were derived for some values of p and $\alpha$ not shown in our tables here. It was observed that the required n value increases as p increases or as $\alpha$ decreases.

The values of n when $f_1$ = 0 seem to be surprisingly small, but in these cases we have $f_1 = f_0 = 0$, and the two affected siblings must have come from parent type 1 or 2. Also for small p (which we have assumed), mostly parent type 2 will be involved. In this case the siblings must share the same haplotype from the Dd parent, while the sharing from the other parent corresponds to random inheritance. Hence the expected value of N per sibship is approximately 3½ with a variance of little over 0.25 under $H_0$ or $H_1$, so that the standardized test criterion is approximately $\sqrt{n}$. Thus n does not have to be large to achieve the required power.

S. Shah was supported by a grant from The Association of Commonwealth Universities in the United Kingdom.

REFERENCES

DE VRIES, R. R. P., LAI A FAT, R. F. M., NIJENHUIS, L. E. & VAN ROOD, J. J. (1976). HLA-linked genetic control of host response to Mycobacterium leprae. Lancet ii, 1328-1330.

GREEN, J. R. & GRENNAN, D. M. (1991). Testing for haplotype sharing by siblings with incomplete information of parental haplotypes. Annals of Human Genetics 55, 243-249.

GREEN, J. R. & LOW, H. C. (1984). The distribution of N and F measures of HLA haplotype concordance. Biometrics 40, 341-348.

GREEN, J. R., LOW, H. C. & WOODROW, J. C. (1983). Inference on inheritance of disease using repetitions of HLA haplotypes in affected siblings. Annals of Human Genetics 47, 73-82.

GREEN, J. R. & MONTASSER, M. (1988). HLA haplotype discordance. Biometrics 44, 941-950.

GREEN, J. R. & WOODROW, J. C. (1977). Sibling method for detecting HLA-linked genes in disease. Tissue Antigens 9, 31-35.

LOW, HENG CHIN (1983). The inheritance of disease. Ph.D. Thesis, Liverpool University.

SIDEBOTTOM, D., GRENNAN, D. M., GREEN, J. R., SANDERS, P., OLLIVER, W. & DE LANGE, G. (1991). IgCH and D14S1 variants in rheumatoid arthritis linkage and association studies. Brit. J. Rheumatology 30, 167-172.

SUAREZ, B. K. (1978). The affected sib-pair IBD for HLA-linked disease susceptibility genes. Tissue Antigens 12, 87-93.

THOMSON, G. (1980). A two-locus model for juvenile diabetes. Annals of Human Genetics 43, 383-398.

WEITKAMP, L. R., STANCER, H. C., PERSAD, E., FLOOD, C. & GUTTORMSEN, S. (1981). Depressive disorders and HLA: a gene on chromosome 6 that can affect behaviour. New England Journal of Medicine 305, 1301-1306.


APPENDIX

The distribution of N under $H_1$

Table 2. The six parent types with their prior probability of being affected and probability of having one affected child

i    Parent types: $B_i$    Prior probab.: $t_i$    P(sib affected | $B_i$): $l_i$
1    DD DD                  $p^4$                   $f_2$
2    DD Dd                  $4p^3(1-p)$             $\tfrac{1}{2}(f_2 + f_1)$
3    DD dd                  $2p^2(1-p)^2$           $f_1$
4    Dd Dd                  $4p^2(1-p)^2$           $\tfrac{1}{4}(f_2 + 2f_1 + f_0)$
5    Dd dd                  $4p(1-p)^3$             $\tfrac{1}{2}(f_1 + f_0)$
6    dd dd                  $(1-p)^4$               $f_0$

Here p is the frequency of the disease susceptibility gene, D represents the presence of the disease gene and d the absence of the gene. The posterior probability of each $B_i$, for each i, is given by

Therefore,

$$w_i = \frac{\{1 - (1 - l_i)^m - m l_i (1 - l_i)^{m-1}\}\, t_i}{\sum_i \{1 - (1 - l_i)^m - m l_i (1 - l_i)^{m-1}\}\, t_i}.$$
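The weights $w_i$ can be computed directly from the Table 2 quantities. The Python sketch below is a minimal illustration: the function name `parent_type_weights` is hypothetical, the priors $t_i$ and risks $l_i$ are entered as read from Table 2, and the factor $1 - (1-l_i)^m - m l_i (1-l_i)^{m-1}$ is taken to be the binomial probability of at least two affected siblings among m.

```python
def parent_type_weights(p, f2, f1, f0, m):
    """Posterior weights w_1..w_6 of the six parent types (list index 0..5)."""
    q = 1.0 - p
    # Prior probabilities t_i of the parent types, as listed in Table 2.
    priors = [p**4, 4 * p**3 * q, 2 * p**2 * q**2,
              4 * p**2 * q**2, 4 * p * q**3, q**4]
    # Probability l_i that a sibling is affected, given parent type B_i.
    risks = [f2, (f2 + f1) / 2, f1, (f2 + 2 * f1 + f0) / 4, (f1 + f0) / 2, f0]

    def at_least_two_affected(l):
        # 1 - P(no affected sibling) - P(exactly one affected sibling)
        return 1.0 - (1.0 - l) ** m - m * l * (1.0 - l) ** (m - 1)

    joint = [at_least_two_affected(l) * t for l, t in zip(risks, priors)]
    total = sum(joint)  # zero only if no parent type can give two affected
    return [j / total for j in joint]


# Illustrative values: p = 0.01, f2 = f1 = 0.5, f0 = 0, sibships of size 2.
weights = parent_type_weights(0.01, 0.5, 0.5, 0.0, 2)
for i, w in enumerate(weights, start=1):
    print(f"w_{i} = {w:.4f}")
```

For a rare gene with $f_2 = f_1$ and $f_0 = 0$, almost all the weight falls on parent type 5 (Dd dd), and type 6 (dd dd) receives weight zero.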

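In the same way,

However, we shall normally be concerned with $P(B_i \mid s \geq 2)$. We need to obtain the distribution of N for each $B_i$, and so get $E(N \mid B_i)$ and $E(N^2 \mid B_i)$. Then we derive the weighted average of these two expected values over all the parent types and from them get the overall mean and variance of N.

For each parent type, N is the sum of the contributions from both parents, and these are independent in every case except $B_4$. The contribution of each parent is an $H_s(c)$ random variable (we will often drop the suffix s). Here $H(c)$ is the larger of K and s - K, and $K \sim B(c, s)$ (Green & Low, 1984). For $B_1$, $B_3$, $B_6$, N is the sum of two independent $H(\tfrac{1}{2})$ random variables, so that, under $H_1$ for these three parent types,

$$E(N) = 2E\{H(\tfrac{1}{2})\}, \qquad E(N^2) = 2E\{H^2(\tfrac{1}{2})\} + 2[E\{H(\tfrac{1}{2})\}]^2.$$

For $B_2$,

$$E(N) = E\{H(\tfrac{1}{2})\} + E\left\{H\left(\frac{f_2}{f_1 + f_2}\right)\right\},$$

$$E(N^2) = E\{H^2(\tfrac{1}{2})\} + E\left\{H^2\left(\frac{f_2}{f_1 + f_2}\right)\right\} + 2E\{H(\tfrac{1}{2})\}\, E\left\{H\left(\frac{f_2}{f_1 + f_2}\right)\right\}.$$

For $B_5$,

$$E(N) = E\{H(\tfrac{1}{2})\} + E\left\{H\left(\frac{f_1}{f_0 + f_1}\right)\right\},$$

$$E(N^2) = E\{H^2(\tfrac{1}{2})\} + E\left\{H^2\left(\frac{f_1}{f_0 + f_1}\right)\right\} + 2E\{H(\tfrac{1}{2})\}\, E\left\{H\left(\frac{f_1}{f_0 + f_1}\right)\right\}.$$

For $B_4$, the situation is different, since the two contributions are no longer independent, given that the sibling is affected. We have

$$E(N) = 2E\left\{H\left(\frac{f_1 + f_2}{f_0 + 2f_1 + f_2}\right)\right\}.$$

But the derivation of $E(N^2)$ needs special attention, which we consider later.

The distribution of $H(c)$, for 0 < c < 1, has been investigated by Heng Chin Low (1983). Putting $\bar{c} = 1 - c$, obviously

$$P(H(c) = x) = \begin{cases} \binom{s}{x}\left(c^x \bar{c}^{\,s-x} + c^{s-x} \bar{c}^{\,x}\right), & \text{for } x = \tfrac{s}{2}+1, \tfrac{s}{2}+2, \ldots, s;\ s \text{ even}; \text{ and for } x = \tfrac{s+1}{2}, \tfrac{s+3}{2}, \ldots, s;\ s \text{ odd}, \\[4pt] \binom{s}{s/2}\, c^{s/2}\, \bar{c}^{\,s/2}, & \text{for } x = \tfrac{s}{2};\ s \text{ even}. \end{cases}$$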

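Considering particular s values and using direct derivation, we find

For s = 2

$$E\{H(c)\} = 2(c^2 + \bar{c}^2) + 1 \cdot 2c\bar{c} = 2(1 - c\bar{c})$$
$$E\{H^2(c)\} = 4(c^2 + \bar{c}^2) + 1 \cdot 2c\bar{c} = 4 - 6c\bar{c}$$

For s = 3

$$E\{H(c)\} = 3(c^3 + \bar{c}^3) + 2(3c^2\bar{c} + 3c\bar{c}^2) = 3(1 - c\bar{c})$$
$$E\{H^2(c)\} = 9(c^3 + \bar{c}^3) + 4(3c^2\bar{c} + 3c\bar{c}^2) = 3(3 - 5c\bar{c})$$

For s = 4

$$E\{H(c)\} = 4(c^4 + \bar{c}^4) + 3(4c^3\bar{c} + 4c\bar{c}^3) + 2(6c^2\bar{c}^2) = 4(1 - c\bar{c} - c^2\bar{c}^2)$$
$$E\{H^2(c)\} = 16(c^4 + \bar{c}^4) + 9(4c^3\bar{c} + 4c\bar{c}^3) + 4(6c^2\bar{c}^2) = 4(4 - 7c\bar{c} - 4c^2\bar{c}^2)$$

For s = 5

$$E\{H(c)\} = 5(c^5 + \bar{c}^5) + 4(5c^4\bar{c} + 5c\bar{c}^4) + 3(10c^3\bar{c}^2 + 10c^2\bar{c}^3) = 5(1 - c\bar{c} - c^2\bar{c}^2)$$
$$E\{H^2(c)\} = 25(c^5 + \bar{c}^5) + 16(5c^4\bar{c} + 5c\bar{c}^4) + 9(10c^3\bar{c}^2 + 10c^2\bar{c}^3) = 5(5 - 9c\bar{c} - 5c^2\bar{c}^2)$$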
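The moments of $H(c)$ for s = 2 to 5 can be checked numerically by summing max(K, s - K) over the binomial distribution of K. The Python sketch below is one way to do this; the function names `h_moments` and `closed_form` are hypothetical, and K is taken to be binomial with s trials and success probability c.

```python
from math import comb


def h_moments(s, c):
    """E{H(c)} and E{H^2(c)} by direct summation, with H = max(K, s - K)."""
    c_bar = 1.0 - c
    mean = 0.0
    second = 0.0
    for k in range(s + 1):
        prob = comb(s, k) * c**k * c_bar ** (s - k)
        h = max(k, s - k)
        mean += h * prob
        second += h * h * prob
    return mean, second


def closed_form(s, c):
    """Closed-form moments for s = 2, 3, 4, 5, written in terms of u = c(1 - c)."""
    u = c * (1.0 - c)
    table = {
        2: (2 * (1 - u), 4 - 6 * u),
        3: (3 * (1 - u), 3 * (3 - 5 * u)),
        4: (4 * (1 - u - u * u), 4 * (4 - 7 * u - 4 * u * u)),
        5: (5 * (1 - u - u * u), 5 * (5 - 9 * u - 5 * u * u)),
    }
    return table[s]


for s in (2, 3, 4, 5):
    print(s, h_moments(s, 0.3), closed_form(s, 0.3))
```

Setting c = 1/2 and s = 2 gives $E\{H\}$ = 1.5 and $E\{H^2\}$ = 2.5 for each parent, so that under $H_0$ the sum over two independent parents has mean 3 and variance 0.5.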

The case of parent-type 4

For s = 2, we find by enumeration of cases, putting $e = f_2 + 2f_1 + f_0$, that

$$e^2 P(N = 4) = f_2^2 + 2f_1^2 + f_0^2,$$

$$e^2 P(N = 3) = 4f_1(f_0 + f_2),$$

$$e^2 P(N = 2) = 2f_2 f_0 + 2f_1^2,$$

$$\text{Total} = (f_0 + 2f_1 + f_2)^2.$$
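The enumeration for parent type 4 with s = 2 can be reproduced by brute force. In the Python sketch below (function name `type4_n_distribution` hypothetical), each child of Dd × Dd parents receives D or d from each parent with equal probability, a child with genotype g is affected with probability $f_g$, and for two affected siblings N equals 2 plus the number of parents from whom both inherited the same haplotype. Exact fractions are used so that the comparison with the formulae is exact.

```python
from fractions import Fraction
from itertools import product


def type4_n_distribution(f2, f1, f0):
    """Distribution of N for two affected siblings of Dd x Dd parents."""
    # Risk of disease for each (paternal allele, maternal allele) combination;
    # the four combinations are equally likely a priori.
    risk = {("D", "D"): f2, ("D", "d"): f1, ("d", "D"): f1, ("d", "d"): f0}
    weights = {2: 0, 3: 0, 4: 0}
    for sib_a, sib_b in product(risk, repeat=2):
        shared = (sib_a[0] == sib_b[0]) + (sib_a[1] == sib_b[1])
        weights[2 + shared] += risk[sib_a] * risk[sib_b]
    total = sum(weights.values())  # equals (f2 + 2*f1 + f0)**2
    return {n: w / total for n, w in weights.items()}


# Illustrative penetrances, not values taken from the paper's tables.
dist = type4_n_distribution(Fraction(9, 10), Fraction(1, 2), Fraction(0))
print(dist)
```

The three probabilities agree with $e^2 P(N = 4)$, $e^2 P(N = 3)$ and $e^2 P(N = 2)$ after division by $e^2$.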
