STATISTICS IN MEDICINE. VOL. 11, 1607-1614 (1992)

CORRELATED BINOMIAL VARIATES: PROPERTIES OF ESTIMATOR OF INTRACLASS CORRELATION AND ITS EFFECT ON SAMPLE SIZE CALCULATION ZIDING FENG A N D JAMES E. GRIZZLE Cancer Preuention Research Program, Fred Hutchinson Cancer Research Center. I124 Columbia Streer, MP 702, Seattle, WA 98104, W.S.A.

SUMMARY In group randomized studies, the sample size calculations are complicated by within group (worksite, community, etc.) correlation. We compare by simulation the moment method and the more standard ANOVA method of estimating the intraclass correlation. We find the former is less biased for a small to moderate number of clusters but the difference disappears when the appropriate degree of freedom is used for the ANOVA estimator. We propose a simulation approach for sample size determination and illustrate it with an example.

1. INTRODUCTION In a group intervention study, the group (worksite, community, school, county, country, etc.) is the unit of randomization. This introduces the complication that individuals within the group may behave more alike than individuals who belong to different groups. This decreases the amount of information about the intervention effect provided by the response of each individual. In behaviour change oriented intervention programmes, the responses are often binary (smoking quit rate, smoking prevalence, etc.). Some problems in group randomized trials have been discussed by Cornfield’ and subsequently investigated further by Donner and The key difference from the usual situation in which one randomizes individuals is that we must adjust the variance of the estimator of the mean for the intraclass correlation p . Donner and Donald4 used an ANOVA estimator of p adapted from Fleiss.’ Moore and Tsiatis6 proposed a moment estimator based on the Pearson statistic. It is of interest to compare the two competing estimators. Jacobs et al.’ pointed out that the number of groups has more effect on reducing the variance of the estimator than the sample size within a group and they advocated that future publications include information about the intraclass correlation. Sample size determination depends on the intraclass correlation which is usually unknown. Estimates of p found in the literature may be subject to large sampling error. The common approach is to do a sensitivity analysis, but this does not attach a confidence statement to the result. Since the cost of conducting a group randomized trial is mainly driven by the number of groups required to achieve the desired power,’ a better understanding of the effect of the sampling distribution of on the sample size requirement is in order. Section 2 briefly describes some properties of the moment estimator of p. Section 3 compares the two estimators. A simulation approach to determine the sample size appears in Section 4 and Section 5 provides an example. 0277-6715/92/12 160748$09.00 0 1992 by John Wiley & Sons, Ltd.

Received January 1992 Revised March 1992

1608

Z. FENG AND J. E. GRIZZLE

2. AN ESTIMATOR OF p AND ITS PROPERTIES Suppose we observed in the ith group ( i = 1,. . . , k ) x i positive responses and n - xi negative responses. A simple extension of the quasi-likelihood approach to estimate overdispersion based on the generalized Pearson statistic, suggested by Cox and Snell,* leads to a direct estimator of the intraclass correlation coefficient:

where 6 = C x i / k n and P i = .xi/n. The estimation of p extends to cases with unequal group sizes and to the inclusion of covariates, but we can only obtain i, numerically in those cases. For example, in the case of unequal group sizes, we obtain i, by solving the equation:

Notice that ( 1 ) is the special case of the moment estimator suggested by Moore and Tsiatk6 when there are no covariates or link function involved, and the group sizes are equal. The estimator of p from (2) is not exactly the same as the estimator proposed by Moore and Tsiatis in that they estimate fi and i, simultaneously while we use fi = x i / c ni. Moore and Tsiatis's method might be more efficient but a limited simulation indicates that the performances of the two methods are similar in the k sample case and the addition of one more estimating equation in small samples can produce unreliable estimates with small probability. For more general cases, one can use the method of Moore and Tsiatis6 to estimate the regression parameters and intraclass correlation simultaneously. i, obtained from ( 1 ) has the following properties (with the proof in the Appendix):

1

1. E ( 3 ) = p + O(l/k). 2. The asymptotic variance of

(3) is:

for large k. 3. The asymptotic (1 - c ( ) 100 per cent confidence interval of p is:

for large k.

3. COMPARISON OF MOMENT ESTIMATOR AND ANOVA ESTIMATOR The ANOVA estimator has had wide use.4- This is produced by applying directly the ANOVA table of the mixed model for a continuous random variable to a corresponding binary random variable to estimate the components of variance. McCulloch9 discusses the weakness of this approach. Basically, the unequal variance of the binary response and the range restriction of proportions violate the assumptions of the linear model. Although in one-way or two-way ANOVA, the predicted values of a proportion are likely within (0, l), there is no warranty that the range of predicted values of proportions is within (0, 1) when one adds quantitative covariates.

1609

CORRELATED BINOMIAL VARIATES

It is informative to compare these two competing estimators by simulation in the simplest case: sampling from k groups, equal group size n, population proportion p , and true intraclass correlation p. The ANOVA estimator is BMS - WMS

pa = BMS - ( n - 1)WMS ' where k

k

1 xi(n - x i )

1 (xi - nB)' BMS = i = l

WMS

nk

=

i=

n(n - l)k

'

The divisor of BMS has k instead of k - 1. It is used by Fleiss5 and subsequently used by Donner and D ~ n a l dA. ~referee points out that it would be more fair and interesting to use k - 1 as in Donald and Donner" for comparison. Therefore, we use both in simulation. We carried out simulation on an IBM PS/2 model 70 using GAUSS, a mathematical programming language. For a given design configuration (n k p p), we generated x i from the beta-binomial distribution 1 B(a x i , p n - X i ) x i = o , 1,2,. . . , n , P(XJ = (7) n + 1 B ( X i + 1 , n - xi + l)B(a, p)'

+

+

with

W can easily obtain the simulated value of x i by comparing a uniform (0, 1) random variable generated from GAUSS to the cumulative distribution of the beta-binomial distribution. The justification for choosing the beta-binomial distribution is that it is the most widely used parametric model for correlated binary response' '-I4 and neither estimation method depends on any particular underlying distribution for its justification. We use the following parameter configurations in the simulations: k = 10, 30, 90; n = 30, 90, 270; p = 005, 0.20, 0.35; and p = 0.00,0.03, 0.06. These parameters cover the common range of group randomized trials. We simulated 500 replications for each configuration. Table I summarizes the mean and standard deviation of p from (1) and pa from (6). We averaged the means and standard deviations over three different values of p for p = 0.00and 003 since the results are similar. We can draw several conclusions from Table I:

1. Both the moment estimator and the two ANOVA estimators are consistent for k sufficiently large ( > 30), but the bias is much smaller for the moment method than for the ANOVA estimator from Fleiss' when k is small. For example, for a typical size of k = 10 and n = 30 in a community smoking intervention programme with a smoking prevalence of 20 per cent, fi has a mean bias of 3 per cent but fiahas a mean bias of 20 per cent from the true value of p = 0.06. The moment estimator and the ANOVA estimator by Donald and Donner" are similar. Therefore, the appropriate degree of freedom should be used even for k as large as 30. 2. The variances of the two estimators are similar. 3. For fixed n, k, bias increases when p increases or when p nears zero or 1. These limited simulations indicate that the moment estimator and the ANOVA estimator are similar for many circumstances. The moment estimator has more flexibility when one includes

1610

Z . FENG AND J. E. GRIZZLE

Table I. Comparison of mean (mean of SD) of 6 from moment and ANOVA methods for different configurations: n = 30.90.270; k = LO, 30. 90: p = 0, 0.03. 0-06. and p = 0-05. 020, 0.35 respectively; and 500 replications for each configuration. The entries are means over different p values except lor p = 006 k = 10

n=30

n=90

k n=270

n=30

p = 0.00

Moment

-

3E-7

(0.015)

ANOVA (Fleiss5)

-00035 (0.014)

0~000I 2 (0~0051) -00010 (OOM6)

4.4E-5 (0.0017) -

3.3E-4 (0.0015 )

0.00052 (0.0088) - 0o0065 (0.0085)

=

30

n=90

k=90 n=270

7.5E-5 - 2.5E-5 (0,0030) (OW095) 3,0E-4 - 1.5E-4 (0~00092) (0.0029)

n=30

OW015 (0.0052) - 000023

n=90

8.5E-5 (0.0016) - 4.OE-5

n=270

3.1E-5 (0.00056)

- l,lE-5

(OXQ48)

(0.0016)

(04XlO56)

(WW

p = 0.03

Moment ANOVA (Fleiss')

0.030 (0,031) 0.023 (0.028)

0.030 (0.020) 0.026 (0.018)

0.029 (0.016) 0.026 (0.015 )

0,055 (0.042) 0060 (0034) 0.06 1 (0.031)

0,054 (0.034) 0.06I (0.030) 0.059 10029)

0.029 (0.018)

0.027 (0.0 I 7)

0.030

0.030 (0,012) 0.029 (0.01 I )

0.030 (0.0093) 0.029 (0.0089)

0.030 (0.010) 0.030 (0.010)

0.030 (0~0065) 0.029 (O~OOSS)

0.030 (0.0053)

0.057 (0.025) 0.06I (0.019) OG50 (0.017)

Q059 (0424)

0.060 (0.020) (0,014) oc61 (0.013)

0.059 (0.014) 0.059 (0.011) 0.059 (0.010)

(0.014) 0.060 (0010) 0060 (0,0083)

0056 (0.023)

0.059 (0.020) 0.059 (0.014) 0.060 (0.012)

0.059 (0.014) 0.059 (0.011) 0.059 (0,094)

0060 (0.020) 0.060 (0.014)

0.059 (0,014) 0.059 (0.010) 0,059 (0.0 10)

p = 0.06

Moment p = 0.05

0.048

(0.046) 0.20 0.35

0.058 (0.041) 0,060 (0.041)

0.058

(0.032) 0060 (0,024) 0.062 (0.022)

0.060 (0-017)

0060 (0.015 )

0460

0a60

A N O V A (Fleiss5) p = 0.05

0.20 0.35

0.042

0.048

0.048

0.055

0.055

(0,060)

(0.038) 0.053 (0.03I ) 0.054 (0.0281

(0.03I ) 0.055 (0.027) 0.053 (0.026)

(0.031) 0057 (0023) 0059 10.022)

(0.024) 0.058 (0.019)

0.054 (0.034) 0.061 (0.030) 0.059 (0.029)

0058 (0.034)

0.057 (0.025) 006 1

0.048 (0.037) 0.05 I (0.037)

ANOVA (Donald and Donner") 0.047 0.054

p = 0.05

0.20 0.35

(0.045)

(0.041)

0.057 (0040) 0.059

0.059 (0,034) 0.06 I (0.03I )

(0.040)

0.060 (0,024) 0.062 (0022)

0.058 (0.0 16)

0.058

(0.016) 0.0 5 8 (0.014) 0.058 (0.024)

0,060

0.060 (0.0I 7) 0.060

(0.017)

(0.015)

(0.019)

0061

(0.013)

0.059

(0.014) 0.059 (0010) 0.060 (00082)

0.060 (0.013) 0.060 (0.010) 0.060

(OC084)

covariates and a link function in the model. Another advantage is that in the ANOVA method the approximate variance of is available only for p = 0 by (13.46) on p. 228 of Reference 5: var (fia)=2/kn(n - 1) under p = 0. From (4), var(fi)=2[1 (n - l)pI2/(k - l)(n - l ) 2 . When p = 0, the two formulae are similar but the latter applies to any value of p . Var(i3) agrees with values in Table I extremely well for the configurations used in the simulations. Var(i)) and confidence interval of p by ( 5 ) require equal ni. For nearly balanced designs, they should be adequate. For unbalanced designs such as the example in Section 5, a limited simulation shows that substitution of the average value of ni results in an underestimate of the variance of b. In these cases we recommend use of the simulation approach for variance and confidence interval for p as explained in Section 4.

+

4. SAMPLE SIZE ESTIMATION BY SIMULATION

To concentrate on the main points, we use the simplest design: 2k groups are randomized, k groups receive intervention and k groups are controls, and a random sample of size n from each

CORRELATED BINOMIAL VARIATES

1611

group is selected for evaluation. We assume that homogeneity of the proportions p , and p 2 is the hypothesis of interest. It is well known that the intraclass correlation generally increases the total sample size required to achieve a specified power. The variance of the mean of a sample of size nk is a; = [l + ( n - l ) p ] a 2 / n k .Some call the inflation factor 1 + ( n - 1)p the design effect. Following the usual sample size formula with appropriate substitution for 0 ; yields

for fixed k ,

(9)

where Z, and Z , are the critical values for the Gaussian distribution with tail probabilities 4 2 and jl, respectively, p is the intraclass correlation, IP1 - P 2 I

Ilogit(p1) - log(p2)l 1 arcsin(p:”) - arcsin(p;’’) 1

1ij

- Pj)

uj =

l/pj(l - pj)

for untransformed proportions for logit transformation for arcsin transformation,

for p for logit model for arcsin transformation,

wherej = 1,2 represent the control and treatment respectively. In some cases, the denominator of (9) can be negative. This occurs when the fixed k is too small and the study cannot achieve the desired power for any n. As noted above, group randomization introduces an additional parameter p into sample size calculation. Estimates of p , especially for binary data, are generally unavailable. Even when available, they-are usually poor since k is often small. One can investigate the effect of variation in i, on sample size by reporting sample size calculation for selected values of p. Such calculations, however, would not show how the underlying distribution ofi, for fixed N and K (where N and K are the sample sizes of the experiment that resulted in the p used in the current sample size calculation) affect the n and k for the experiment designed. We suggest an analogue of the bootstrap procedure. That is, simulate the entire process by assuming i, based on fixed N, K. One uses these values of i,, N and K as parameters to generate B beta-binomial experiments (or other n,*, distribution function as seems appropriate). For the bth simulation experiment, calculate and k f . The simulated distribution of n,* and k,* can provide guidance in designing the new experiment. For example, we might choose a 90 percentile of the simulated distribution of n,* and k t as the n and k required for the new experiment. Notice that this procedure has a close connection to the bootstrap method. The distribution of a,* based on B experiments is the simulated parametric bootstrap distribution of p if j is available from a previous experiment of size N, K and is estimated by the same formula. If i, is estimated by a different procedure, we can still apply the simulation method by estimating i, * from the corresponding procedure. The normal approximation to the observed proportions makes the size and the power of the test still approximate even if the distribution of i, is fully determined. The procedure is, therefore, not an exact bootstrap procedure unless we also estimate the distribution of the j.

a,*,

1612

Z. FENG AND J. E. GRIZZLE

The crucial parameter, p , is the ratio of the variances. Therefore, its variance is likely to dominate sample size calculations when the number of groups is small. Equations (4) and (5) are for large k and their small sample properties are unknown. The simulations in Section 3 indicate that the two common estimation methods have compatible variances. Therefore, use of either method does not eliminate the need for simulation. The distribution used to generate the samples can be any deemed reasonable to the researchers. 5. AN EXAMPLE

The data in this example were collected on 2458 GTE workers in 24 worksites in the Florida Tampa area as part of a survey. The number of respondents to the survey for each worksite ranged from 20 to 468 with mean 102. The response is smoking prevalence, which is 25 per cent averaged over the 24 worksites. In this example the intraclass correlation is only 0-0052 by (2), possibly due to the fact that those 24 worksites all belong to GTE and share some common characteristics. If this value is used for a new study with 12 control and 12 intervention worksites (the randomization unit is worksite since the intervention is at worksite level) and an equal number of 102 workers per worksite, with the desired difference of 5.8 per cent in smoking prevalence, the new study would have power 0 8 and size 0.05. In sensitivity analysis, one could substitute a sequence of values of p in (8) and obtain the required number of worksites for each treatment, k, to be 13.4, 15.8, 17.4 and 19.0 corresponding to p of 0.007, 0.010, 0.012 and 0.014 respectively. In practice, we should round up these numbers, but we keep the extra decimal places for comparison purposes. The central question is choice of the value of k to ensure the power of 80 per cent with strong confidence. We simulated the distribution of @ and the number of worksites required by (8) 500 times by using a beta-binomial distribution with p = 0.0052, p = 0.25, the actual number of workers N iin the ith worksite, and K = 24 worksites. The mean of simulated 8; is 0.0060 with SD 0.0059. The quantiles of :k worksites per treatment required to achieve 80 per cent power are 11.9, 13.0, 14.2, 15.1, 16.0, 17.0, 19.0,21.9 and 27.2 for 50,60, 70, 75, 80, 85,90, 95 and 99 per cent quantiles respectively. The interpretation is that if k = 12, approximately half of the time the power is lower than 0.80 owing to use of estimated p. Even if we double the value of p to 0.10, which amounts to k = 15.8 in sensitivity analysis, there is still about a 20-25 per cent chance that we should use a more conservative p to achieve the power of 80 per cent. Figure 1 is the histogram of the distribution of k:. For a fixed size of k, the variation of n: is much larger since the variance of 6 is mainly determined by k. It is worth noting that another simulation assuming equal number of workers per worksite ( N = 102) leads to a mean of simulated p of 0.0053 with SD of 0.0044. These are very close to @ defined by (2) and the SD by (4) which are 0-0052 and 0.0045 respectively. This supports the need for the simulation approach when K is small and N i are unbalanced. This example is more convincing in the sense that these 24 worksites are a quite homogeneous group. When the intraclass correlation becomes larger in sampling from a more heterogeneous population of groups, the effect demonstrated in this section should be more pronounced.

6. DISCUSSION The lack of independence of individual observations in a group randomized study introduces several complications in study design. Generally, the intraclass correlation is not well estimated owing to small K . Even with moderately large K , the inferential procedure for p is not reliable when p is near zero or one. A researcher should be aware of the effect of small K on the estimation of p. The choice of k, the number of groups to be randomized in each arm, needs careful planning.

1613

CORRELATED BINOMIAL VARIATES

C

u 0

5

10

15

I,20

25

30

35

I( Figure 1. Histogram of simulated distribution of k b from 500 simulations for a = 0.05 and power 1 - j = 0.8. K N ranged from 20 to 468 with mean 102, p, = 0.2525, p 2 = 01945, n = 102

= 24,

Although k is quite sensitive to p from our simulation, from (9) we readily see that n is so sensitive to p that the estimation of n is of little use when we substitute an inaccurate estimate for p. We believe that the usual practice of substituting estimates for the parameter values should entail extra caution when the sample size is sensitive to this parameter, and that instead of using a single value of 3, one should simulate the results of studies of the size that yielded the estimates. One can substitute the values calculated from each simulation into the formulae to generate a distribution of sample size. One can choose a point on the distribution of sample sizes that reflects the degree of conservativeness the analyst desires. This process is instructive about the uncertainty of sample size calculation in many circumstances and thus quite general. We recommend the simulation approach when K is less than 30 and group sizes are unbalanced. With an adequate PC and a well-chosen package such as GAUSS, the simulations require less than one hour. Interested readers can obtain the program written in GAUSS from the authors. APPENDIX: PROOF OF (3H5) Assume that the conditional proportion of group i, pi, follows a distribution G with mean p . Taking the expectation of the Taylor expansion of (1) leads to

E ( 8 )= p

+k-1

+ max(.(;),o(~)).

pz

+

p’(1 - p ) ’

1614

Z . FENG AND J. E. GRIZZLE

In the beta-binomial model, G is beta (a, b ) with a,p defined as in (7). Then E(p’)

= (r

a ( r + l ) ( r + 2) + b ) ( a + + l ) ( a + B + 2)’

Equations (4) and (5) follow directly from the asymptotic chi-squared distribution of the generalized Pearson goodness-of-fit statistic. ACKNOWLEDGEMENTS

The work was performed with the support of funds from the Statistical Methods for Disease Prevention Trials CA 53996-01 provided by NCI. The authors are indebted t o Dr. Mark Thornquist and Dr. Garnet Anderson for their thoughtful comments and Dr. Jill Varnes, University of Florida, College of Health and Human Performance, for access for the data used in Section 5.

REFERENCES 1. Cornfield, J. ‘Randomization by group: a formal analysis’, American Journal of Epidemiology, 108, 100-102 (1978). 2. Donner, A., Birkett, N. and Buck, C. ‘Randomization by cluster, sample size requirements and analysis’, American Journal of Epidemiology, 114(6), 906-914 (1981). 3. Donner, A. ‘An empirical study of cluster randomization’, International Journal of Epidemiology, 11(3), 283-286 (1982). 4. Donner, A. and Donald, A. ‘Analysis of data arising from a stratified design with the cluster as unit of randomization’, Statistics in Medicine, 6, 43-52 (1987). 5. Fleiss, J. L. Statistical Methodsfor Rates and Proportions, 2nd edn., Wiley, New York, 1981, pp. 226-228. 6. Moore, D. F. and Tsiatis, A. ‘Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation’, Biornetrics, 47, 383-401 (1991). 7. Jacobs, D. R., Jeffrey, R. W. and Hannah, P. J. ‘Methodological issues in worksite health intervention research: 11. Comparison of variance in worksite data: unit of analysis’, 1988 Methodological Issues In Worksite Research Proceedings, 1988, pp. 77-88. 8. Cox, D. R. and Snell, E. J. The Analysis of Binary Data, Chapman and Hall, London, 1989. 9. McCulloch, C. E. ‘Random effects models for binary data applied to environmental/ecological studies’, technical report, Biometrics Unit, Cornell University, 1991. 10. Donald, A. and Donner, A. ‘Adjustments to the MantelLHaenszel chi-square statistic and odds-ratio variance estimator when the data are clustered’, Statistics in Medicine, 6, 491-499 (1987). 1 1 . Griffiths, D. A. ‘Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease’, Biornetrics, 29,637-648 (1973). 12. Williams, D. A. ‘The analysis of binary response from toxicological experiments involving reproduction and teratogenicity’, Biometrics, 31, 949-952 (1975). 13. Crowder, M. J. ‘Beta-binomial ANOVA for proportions’, Applied Statistics, 27, 34-37 (1978). 14. Prentice, R. L. ‘Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors’, Journal of the American Statistical Association, 81, 321-327 (1986).

Correlated binomial variates: properties of estimator of intraclass correlation and its effect on sample size calculation.

In group randomized studies, the sample size calculations are complicated by within group (worksite, community, etc.) correlation. We compare by simul...
488KB Sizes 0 Downloads 0 Views