MAIN PAPER (wileyonlinelibrary.com) DOI: 10.1002/pst.1654

Published online 22 October 2014 in Wiley Online Library

Pharmaceut. Statist. 2015, 14 26–33

Sample size calculation for the one-sample log-rank test

Jianrong Wu*

In this paper, an exact variance of the one-sample log-rank test statistic is derived under the alternative hypothesis, and a sample size formula is proposed based on the derived exact variance. Simulation results showed that the proposed sample size formula provides adequate power to design a study to compare the survival of a single sample with that of a standard population. Copyright © 2014 John Wiley & Sons, Ltd.

Keywords: counting process; one-sample log-rank test; time to event; sample size formula; standard population

1. INTRODUCTION

The two-sample log-rank test is a popular test statistic for comparing the survival distributions of two treatment groups in a randomized trial. In some cases, it is also of interest to compare the survival of a single sample with that of a standard population. For example, in an epidemiological study in which survival data for patients with a life-threatening disease have been prospectively collected, one may want to know whether the survival of the study sample is as good as that of the demographically matched standard population. Methods developed for two-sample comparisons are not appropriate for this analysis because the variance would be calculated incorrectly; thus, the p-value from a two-sample log-rank test would be invalid [1]. Instead, an analogous statistic, the one-sample log-rank test, can be used for such a comparison. The one-sample log-rank test was first introduced by Breslow [2]. Its asymptotic properties have been studied by Hyde [3], Andersen et al. [4], and Gail and Ware [5]. Applications can be found in the works of Finkelstein et al. [1], Berry [6], and Woolson [7]. Study designs using the one-sample log-rank test were proposed by Finkelstein et al. [1], and Kwak and Jung [8], Jung [9], and Sun et al. [10] applied the one-sample log-rank test to phase II clinical trial designs.

If a study is undertaken to determine whether survival in a sample from the current study is improved relative to a standard population, then the study must be carefully designed to ensure sufficient power to detect a specified difference in the survival distributions. For this purpose, Finkelstein et al. [1] gave a sample size formula, and Kwak and Jung [8] recently developed an alternative sample size formula for the one-sample log-rank test to design single-arm phase II clinical trials. However, our simulation results (Section 3) showed that the formulas of both Finkelstein et al. and Kwak and Jung can underestimate the required sample size. Study designs based on their formulas therefore tend to be underpowered, even when the sample size is relatively large. To provide a study design with adequate power for comparing the survival of a sample with that of a standard population, we derive a new sample size formula based on the exact variance of the one-sample log-rank test statistic. The derived formula gives power close to the nominal level; thus, it can be used to design studies comparing the survival outcome of a sample with that of a standard population.

The rest of the paper is organized as follows. In Section 2, a new sample size formula for the one-sample log-rank test is derived based on the exact variance of the test statistic. In Section 3, simulations are conducted to compare the empirical power of three sample size formulas. An example is given in Section 4, and concluding remarks are given in Section 5.

2. SAMPLE SIZE FORMULA

If a new study is designed to compare its survival outcome with that of a standard population, then the one-sample log-rank test can be used both to design the study and to make statistical inference. Study design based on the one-sample log-rank test has been discussed by Finkelstein et al. [1], Kwak and Jung [8], and Jung [9]. To introduce the one-sample log-rank test, let $\Lambda_0(x)$ and $S_0(x)$ be the known cumulative hazard function and survival function, respectively, of the standard population, and let $\Lambda(x)$ and $S(x)$ be the unknown cumulative hazard and survival functions for the new study. The study may then consider the hypotheses

$$H_0: S(x) \le S_0(x) \quad \text{vs} \quad H_1: S(x) > S_0(x),$$

or, equivalently, in terms of the cumulative hazard function,

$$H_0: \Lambda(x) \ge \Lambda_0(x) \quad \text{vs} \quad H_1: \Lambda(x) < \Lambda_0(x).$$

An alternative $\Lambda(x) = \Lambda_1(x)\ (< \Lambda_0(x))$ must be specified for the sample size or power calculation.

Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
*Correspondence to: Department of Biostatistics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA. E-mail: [email protected]

Suppose that $n$ subjects are enrolled during the accrual phase of the study. Let $T_i$ and $C_i$ denote the failure time and censoring time, respectively, of the $i$th subject. We assume that the failure time $T_i$ and censoring time $C_i$ are independent and that $\{(T_i, C_i), i = 1, \ldots, n\}$ are independent and identically distributed. The observed time and failure indicator for the $i$th subject are then $X_i = T_i \wedge C_i$ and $\Delta_i = I(T_i \le C_i)$, respectively. On the basis of the observed data $\{(X_i, \Delta_i), i = 1, \ldots, n\}$, we define $O = \sum_{i=1}^n \Delta_i$ as the observed number of events and $E = \sum_{i=1}^n \Lambda_0(X_i)$ as the (asymptotically) expected number of events; the one-sample log-rank test is then defined by

$$L = \frac{O - E}{\sqrt{E}}.$$
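As a concrete illustration of the definition, the following minimal Python sketch (standard library only) computes $O$, $E$, and $L$ from observed times and failure indicators, and applies the one-sided rejection rule stated later in this section (reject $H_0$ when $L < -z_{1-\alpha}$). The function names, the toy data, and the exponential standard population $\Lambda_0(x) = 0.5x$ are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist


def one_sample_logrank(times, events, cum_hazard0):
    """Return (O, E, L) with L = (O - E) / sqrt(E).

    times       -- observed times X_i = min(T_i, C_i)
    events      -- failure indicators Delta_i (1 = failure, 0 = censored)
    cum_hazard0 -- known cumulative hazard Lambda_0 of the standard population
    """
    observed = sum(events)                         # O = sum of Delta_i
    expected = sum(cum_hazard0(x) for x in times)  # E = sum of Lambda_0(X_i)
    return observed, expected, (observed - expected) / math.sqrt(expected)


def rejects_h0(stat, alpha=0.05):
    """One-sided test of H0: Lambda >= Lambda_0; reject when L < -z_{1-alpha}."""
    return stat < -NormalDist().inv_cdf(1 - alpha)


# Hypothetical data with an exponential standard population, Lambda_0(x) = 0.5 x.
O, E, L = one_sample_logrank([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], lambda x: 0.5 * x)
print(O, E, round(L, 3), rejects_h0(L))  # 2 5.0 -1.342 False
```

Here fewer events than expected pushes $L$ below zero, but with only four subjects $L = -3/\sqrt{5} \approx -1.342$ does not cross $-z_{0.95} \approx -1.645$.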

To study the asymptotic distribution of the one-sample log-rank test statistic, we formulate it using counting process notation [11]. Specifically, let $N_i(x) = \Delta_i I\{X_i \le x\}$ and $Y_i(x) = I\{X_i \ge x\}$ be the failure and at-risk processes, respectively; then

$$O = \sum_{i=1}^n \int_0^\infty dN_i(x), \qquad E = \sum_{i=1}^n \int_0^\infty Y_i(x)\,d\Lambda_0(x).$$

Thus, the counting process formulation of the one-sample log-rank test is $L = W/\hat\sigma$, where

$$W = n^{-1/2} \sum_{i=1}^n \int_0^\infty \{dN_i(x) - Y_i(x)\,d\Lambda_0(x)\} \quad \text{and} \quad \hat\sigma^2 = n^{-1} \sum_{i=1}^n \int_0^\infty Y_i(x)\,d\Lambda_0(x).$$

Under the null hypothesis $H_0$, $n^{-1}\sum_{i=1}^n Y_i(x) \to G(x)S_0(x)$, where $G(x)$ is the survival distribution of the censoring time $C$. Thus, $\hat\sigma^2$ converges to $\int_0^\infty G(x)S_0(x)\,d\Lambda_0(x)$, which is the exact variance of $W$ under the null (Appendix). Therefore, by the central limit theorem, $L$ is asymptotically standard normal under the null hypothesis. Hence, we reject $H_0$ at one-sided type I error rate $\alpha$ if $L = W/\hat\sigma < -z_{1-\alpha}$, where $z_{1-\alpha}$ is the $100(1-\alpha)$ percentile of the standard normal distribution. To design the study, the sample size must be calculated to detect a specified relative risk $R$, given the type I error rate $\alpha$ and power $1-\beta$.

Finkelstein et al. [1] provided a sample size formula for the one-sample log-rank test. Unfortunately, the formula was recorded in error, and they recently derived a revised version [12], which is given as follows:

$$n = \frac{\left(\sqrt{R}\,z_{1-\alpha} + R\,z_{1-\beta}\right)^2}{(R-1)^2\,p_1}, \tag{1}$$

where $R$ is the relative risk and $p_1$ is the probability of failure of a subject during the study (Appendix).

Kwak and Jung [8] and Jung [9] recently proposed an optimal two-stage design with the one-sample log-rank test, in which they derived the following sample size formula. Under $H_1$, $n^{-1}\sum_{i=1}^n Y_i(x)$ converges uniformly to $G(x)S_1(x)$, so that

$$\hat\sigma^2 \to \sigma_0^2 = \int_0^\infty G(x)S_1(x)\,d\Lambda_0(x),$$

where $S_1(x) = \exp\{-\Lambda_1(x)\}$ is the survival distribution under the alternative. On the other hand, under the alternative, $W$ is approximately normal with mean $\sqrt{n}\,\omega = \sqrt{n}(\sigma_1^2 - \sigma_0^2)$, where $\sigma_1^2 = \int_0^\infty G(x)S_1(x)\,d\Lambda_1(x)$. Assuming that $\Lambda_1(x)$ and $\Lambda_0(x)$ are close, Kwak and Jung approximate the variance of $W$ under the alternative by

$$\bar\sigma^2 = \int_0^\infty G(x)\bar S(x)\,d\bar\Lambda(x),$$

with $\bar\Lambda(x) = \{\Lambda_0(x) + \Lambda_1(x)\}/2$ and $\bar S(x) = \exp\{-\bar\Lambda(x)\}$. Hence, the power of the study should satisfy

$$
\begin{aligned}
1-\beta = P(L < -z_{1-\alpha} \mid H_1)
&\simeq P\left(\frac{W - \sqrt{n}\,\omega}{\sigma_0} < -z_{1-\alpha} - \frac{\sqrt{n}\,\omega}{\sigma_0} \,\middle|\, H_1\right) \\
&= P\left(\frac{W - \sqrt{n}\,\omega}{\bar\sigma} < -\frac{\sigma_0}{\bar\sigma}\,z_{1-\alpha} - \frac{\sqrt{n}\,\omega}{\bar\sigma} \,\middle|\, H_1\right) \\
&\simeq \Phi\left(-\frac{\sigma_0}{\bar\sigma}\,z_{1-\alpha} - \frac{\sqrt{n}\,\omega}{\bar\sigma}\right),
\end{aligned}
$$

where $\Phi(x)$ is the standard normal distribution function. Thus, the sample size required for the one-sample log-rank test is given by

$$n = \frac{(\sigma_0 z_{1-\alpha} + \bar\sigma z_{1-\beta})^2}{\omega^2}. \tag{2}$$

However, our simulation results showed that the sample sizes calculated from formulas (1) and (2) are underestimated. Thus, study designs based on formulas (1) and (2) could be underpowered. To correct the sample size calculation, we derived the exact variance of $W$ in this paper (Appendix). Let the exact mean and variance of $W$ under the alternative hypothesis be $E_{H_1}(W) = \sqrt{n}\,\omega$ and $\mathrm{Var}_{H_1}(W) = \sigma^2$, respectively; then the power of the one-sample log-rank test $L = W/\hat\sigma$ should satisfy

$$
\begin{aligned}
1-\beta = P(L < -z_{1-\alpha} \mid H_1)
&\simeq P\left(\frac{W - \sqrt{n}\,\omega}{\sigma} < -\frac{\sigma_0}{\sigma}\,z_{1-\alpha} - \frac{\sqrt{n}\,\omega}{\sigma} \,\middle|\, H_1\right) \\
&\simeq \Phi\left(-\frac{\sigma_0}{\sigma}\,z_{1-\alpha} - \frac{\sqrt{n}\,\omega}{\sigma}\right).
\end{aligned}
$$

Therefore, the required sample size is given by

$$n = \frac{(\sigma_0 z_{1-\alpha} + \sigma z_{1-\beta})^2}{\omega^2}, \tag{3}$$

where $\omega = \sigma_1^2 - \sigma_0^2$ and $\sigma^2 = p_1 - p_1^2 + 2p_{00} - p_0^2 - 2p_{01} + 2p_0p_1$, with $p_0$, $p_1$, $p_{00}$, and $p_{01}$ given as follows (see Appendix for the derivation):
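The three formulas can be compared numerically. The Python sketch below is a minimal illustration, not the author's R code: it assumes the simulation setting of Section 3 (Weibull standard population with median $m_0 = 1$, $\Lambda_1 = hr \cdot \Lambda_0$ where `hr` $= 1/\delta$ is the relative risk $R$, uniform censoring on $[t_f, t_a + t_f]$ with $t_a = 3$, $t_f = 1$). It evaluates $p_0$, $p_1$, $p_{00}$, $p_{01}$, and Kwak and Jung's $\bar\sigma^2$ with composite Simpson's rule after the substitution $u = \Lambda_0(x)$. Both the quadrature rule and the function names are our own choices, since the paper only says "numerical integration".

```python
import math
from statistics import NormalDist


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def design_quantities(kappa, hr, m0=1.0, ta=3.0, tf=1.0):
    """p0, p1, p00, p01 and sigma_bar^2 for Lambda_0(x) = log(2) (x/m0)^kappa,
    Lambda_1 = hr * Lambda_0, and censoring uniform on [tf, ta + tf]."""
    tau, ln2 = ta + tf, math.log(2)

    def G(x):  # censoring survival function (no loss to follow-up)
        return 1.0 if x <= tf else max(0.0, (tau - x) / ta)

    def integral(weight, rate):
        # int G(x) exp(-rate * u) weight(u) du with u = Lambda_0(x); split at
        # u = Lambda_0(tf), where G has a kink.
        def f(u):
            return G(m0 * (u / ln2) ** (1 / kappa)) * math.exp(-rate * u) * weight(u)
        u_tf, u_tau = ln2 * (tf / m0) ** kappa, ln2 * (tau / m0) ** kappa
        return simpson(f, 0.0, u_tf) + simpson(f, u_tf, u_tau)

    p0 = integral(lambda u: 1.0, hr)    # int G S1 dLambda_0
    p00 = integral(lambda u: u, hr)     # int G S1 Lambda_0 dLambda_0
    p1, p01 = hr * p0, hr * p00         # dLambda_1 = hr * dLambda_0
    c = (1 + hr) / 2                    # Lambda_bar = c * Lambda_0
    sigma_bar_sq = c * integral(lambda u: 1.0, c)
    return p0, p1, p00, p01, sigma_bar_sq


def sample_sizes(kappa, hr, alpha=0.05, power=0.90, **design):
    """Sample sizes from formulas (1), (2) and (3), each rounded up."""
    za, zb = NormalDist().inv_cdf(1 - alpha), NormalDist().inv_cdf(power)
    p0, p1, p00, p01, sigma_bar_sq = design_quantities(kappa, hr, **design)
    R = hr                                   # relative risk of Finkelstein et al.
    n1 = (math.sqrt(R) * za + R * zb) ** 2 / ((R - 1) ** 2 * p1)
    omega, sigma0 = p1 - p0, math.sqrt(p0)
    sigma_sq = p1 - p1**2 + 2 * p00 - p0**2 - 2 * p01 + 2 * p0 * p1
    n2 = (sigma0 * za + math.sqrt(sigma_bar_sq) * zb) ** 2 / omega**2
    n3 = (sigma0 * za + math.sqrt(sigma_sq) * zb) ** 2 / omega**2
    return tuple(math.ceil(v) for v in (n1, n2, n3))


print(sample_sizes(kappa=1.0, hr=1 / 1.5))  # → (66, 69, 80)
```

For $\kappa = 1$ and $\delta = 1.5$ this sketch reproduces the formula (1), (2), and (3) sample sizes reported in Table I (66, 69, and 80). Integrating in $u = \Lambda_0(x)$ rather than in $x$ sidesteps the infinite hazard at $x = 0$ when $\kappa < 1$.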

Table I. Sample size and simulated empirical type I error (α) and power (1 − β) based on 100,000 simulation runs from the Weibull distribution with nominal type I error 0.05 and power of 90% (one-sided test).

                 δ = 1.2                δ = 1.3                δ = 1.4
κ      Formula   n     α      1 − β     n     α      1 − β     n     α      1 − β
0.1    (3)       534   0.048  0.903     269   0.046  0.906     169   0.044  0.907
       (2)       525   0.047  0.901     263   0.046  0.902     166   0.046  0.901
       (1)       510   0.048  0.893     252   0.046  0.888     157   0.045  0.885
0.25   (3)       492   0.047  0.904     247   0.045  0.906     156   0.046  0.907
       (2)       479   0.047  0.896     239   0.046  0.895     150   0.045  0.897
       (1)       466   0.047  0.890     230   0.047  0.886     143   0.044  0.884
0.5    (3)       432   0.047  0.905     217   0.046  0.907     137   0.046  0.909
       (2)       415   0.046  0.895     206   0.044  0.894     129   0.043  0.894
       (1)       405   0.046  0.888     199   0.046  0.885     123   0.045  0.881

                 δ = 1.5                δ = 1.6                δ = 1.7
κ      Formula   n     α      1 − β     n     α      1 − β     n     α      1 − β
0.1    (3)       121   0.045  0.908     93    0.044  0.909     75    0.043  0.911
       (2)       118   0.044  0.900     91    0.044  0.904     73    0.042  0.902
       (1)       110   0.043  0.893     84    0.044  0.888     67    0.043  0.885
0.25   (3)       111   0.044  0.912     85    0.045  0.908     69    0.042  0.911
       (2)       107   0.044  0.899     82    0.044  0.900     66    0.043  0.898
       (1)       100   0.044  0.880     76    0.044  0.879     61    0.042  0.877
0.5    (3)       97    0.044  0.912     75    0.042  0.913     60    0.043  0.910
       (2)       91    0.043  0.893     70    0.042  0.895     56    0.041  0.893
       (1)       86    0.044  0.876     65    0.043  0.874     52    0.042  0.871

                 δ = 1.8                δ = 1.9                δ = 2.0
κ      Formula   n     α      1 − β     n     α      1 − β     n     α      1 − β
0.1    (3)       63    0.041  0.911     54    0.042  0.911     47    0.041  0.909
       (2)       62    0.044  0.908     53    0.042  0.905     47    0.042  0.909
       (1)       56    0.042  0.878     48    0.041  0.878     42    0.041  0.877
0.25   (3)       58    0.042  0.913     50    0.042  0.915     44    0.041  0.915
       (2)       56    0.041  0.903     48    0.042  0.902     42    0.042  0.904
       (1)       51    0.041  0.878     43    0.041  0.874     32    0.040  0.875
0.5    (3)       50    0.041  0.912     43    0.041  0.913     38    0.041  0.915
       (2)       47    0.041  0.894     40    0.041  0.895     35    0.040  0.894
       (1)       43    0.042  0.871     37    0.041  0.871     32    0.040  0.868

                 δ = 1.2                δ = 1.3                δ = 1.4
κ      Formula   n     α      1 − β     n     α      1 − β     n     α      1 − β
1.0    (3)       356   0.047  0.907     178   0.045  0.909     112   0.044  0.912
       (2)       330   0.047  0.889     161   0.045  0.883     99    0.045  0.881
       (1)       325   0.047  0.885     157   0.045  0.877     96    0.045  0.872
2.0    (3)       306   0.046  0.910     153   0.043  0.915     97    0.042  0.922
       (2)       269   0.045  0.878     128   0.043  0.869     77    0.042  0.861
       (1)       267   0.045  0.874     127   0.045  0.867     76    0.040  0.857
5.0    (3)       288   0.046  0.912     144   0.044  0.917     91    0.042  0.925
       (2)       247   0.046  0.873     116   0.043  0.860     69    0.042  0.850
       (1)       247   0.046  0.873     116   0.043  0.860     69    0.042  0.850

Table I. Continued.

                 δ = 1.5                δ = 1.6                δ = 1.7
κ      Formula   n     α      1 − β     n     α      1 − β     n     α      1 − β
1.0    (3)       80    0.043  0.916     61    0.042  0.916     49    0.041  0.919
       (2)       69    0.043  0.878     52    0.042  0.874     42    0.041  0.880
       (1)       66    0.041  0.863     50    0.042  0.863     39    0.039  0.854
2.0    (3)       69    0.042  0.927     53    0.040  0.929     43    0.040  0.934
       (2)       52    0.040  0.850     39    0.038  0.849     30    0.038  0.838
       (1)       51    0.040  0.840     38    0.038  0.839     29    0.036  0.827
5.0    (3)       65    0.040  0.930     50    0.039  0.935     40    0.040  0.937
       (2)       46    0.039  0.836     34    0.038  0.832     26    0.037  0.819
       (1)       46    0.039  0.836     34    0.038  0.832     26    0.037  0.819

                 δ = 1.8                δ = 1.9                δ = 2.0
κ      Formula   n     α      1 − β     n     α      1 − β     n     α      1 − β
1.0    (3)       41    0.040  0.921     35    0.040  0.921     31    0.040  0.925
       (2)       35    0.039  0.878     29    0.037  0.870     26    0.038  0.878
       (1)       32    0.038  0.850     27    0.038  0.847     24    0.039  0.853
2.0    (3)       36    0.038  0.938     31    0.038  0.940     27    0.038  0.942
       (2)       24    0.037  0.829     20    0.036  0.819     17    0.035  0.813
       (1)       24    0.036  0.827     20    0.036  0.820     17    0.034  0.814
5.0    (3)       34    0.040  0.943     29    0.038  0.945     25    0.036  0.943
       (2)       21    0.034  0.814     17    0.034  0.796     14    0.032  0.782
       (1)       21    0.034  0.814     17    0.034  0.796     14    0.032  0.782

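The simulation design behind Table I (described in Section 3) can be sketched in a few lines of Python. This is a minimal, standard-library-only sketch under stated assumptions: failure times are Weibull with cumulative hazard $hr \cdot \Lambda_0(x)$, $\Lambda_0(x) = \log(2)(x/m_0)^\kappa$ (so `hr = 1` gives the empirical type I error and `hr = 1/δ` the empirical power), censoring is uniform on $[t_f, t_a + t_f]$, and the test rejects when $L < -z_{1-\alpha}$. The function name and the default of 2,000 replicates (the paper used 100,000) are our own choices, so estimates will fluctuate more than the Table I entries.

```python
import math
import random
from statistics import NormalDist


def simulate_rejection_rate(n, kappa, hr, m0=1.0, ta=3.0, tf=1.0,
                            alpha=0.05, reps=2000, seed=1):
    """Monte Carlo rejection rate of the one-sample log-rank test.

    Standard population: Lambda_0(x) = log(2) * (x / m0) ** kappa.
    Trial data: Lambda_1 = hr * Lambda_0 (hr = 1 simulates H0).
    Uniform entry over ta, analysis at ta + tf, no loss to follow-up,
    so censoring times are uniform on [tf, ta + tf].
    """
    rng = random.Random(seed)
    ln2 = math.log(2)
    crit = -NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        observed, expected = 0, 0.0
        for _ in range(n):
            # hr * Lambda_0(T) is standard exponential; invert it to get T.
            t = m0 * (rng.expovariate(1.0) / (hr * ln2)) ** (1 / kappa)
            c = rng.uniform(tf, ta + tf)
            observed += t <= c                              # Delta_i
            expected += ln2 * (min(t, c) / m0) ** kappa     # Lambda_0(X_i)
        if (observed - expected) / math.sqrt(expected) < crit:
            rejections += 1
    return rejections / reps


# kappa = 1.0, delta = 1.5 (hr = 1 / 1.5) with n = 80 from formula (3).
print(simulate_rejection_rate(80, 1.0, 1.0), simulate_rejection_rate(80, 1.0, 1 / 1.5))
```

With $n = 80$, Table I reports an empirical type I error of 0.043 and power of 0.916 for this configuration; a run of this sketch should land near those values, within Monte Carlo error.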

Table II. Comparison between various variance estimates and the simulated variance of the test statistic W based on 100,000 simulation runs from the Weibull distribution with sample size n = 100.

       κ = 0.1                        κ = 0.25
δ      σ²     σ₀²    σ̄²     v²        σ²     σ₀²    σ̄²     v²
1.2    0.521  0.467  0.499  0.525     0.577  0.510  0.544  0.580
1.3    0.513  0.440  0.487  0.513     0.573  0.482  0.531  0.573
1.4    0.503  0.416  0.476  0.504     0.566  0.458  0.520  0.566
1.5    0.493  0.395  0.467  0.491     0.558  0.435  0.510  0.558
1.6    0.482  0.376  0.458  0.481     0.549  0.415  0.501  0.554
1.7    0.470  0.358  0.451  0.472     0.539  0.396  0.493  0.539
1.8    0.459  0.342  0.444  0.458     0.529  0.379  0.486  0.529
1.9    0.448  0.327  0.438  0.450     0.518  0.363  0.480  0.519
2.0    0.437  0.314  0.432  0.441     0.508  0.349  0.474  0.509

       κ = 0.5                        κ = 1
δ      σ²     σ₀²    σ̄²     v²        σ²     σ₀²    σ̄²     v²
1.2    0.683  0.587  0.621  0.681     0.910  0.733  0.763  0.911
1.3    0.689  0.558  0.609  0.689     0.956  0.707  0.752  0.959
1.4    0.690  0.532  0.597  0.692     0.996  0.683  0.742  0.992
1.5    0.689  0.508  0.587  0.689     1.028  0.659  0.733  1.033
1.6    0.685  0.486  0.578  0.681     1.055  0.637  0.725  1.056
1.7    0.680  0.466  0.570  0.679     1.077  0.616  0.718  1.082
1.8    0.673  0.447  0.562  0.678     1.095  0.597  0.711  1.094
1.9    0.666  0.430  0.556  0.663     1.109  0.578  0.705  1.112
2.0    0.657  0.414  0.549  0.654     1.120  0.560  0.699  1.117

       κ = 2                          κ = 5
δ      σ²     σ₀²    σ̄²     v²        σ²     σ₀²    σ̄²     v²
1.2    1.215  0.890  0.904  1.225     1.368  0.964  0.968  1.369
1.3    1.370  0.878  0.899  1.370     1.590  0.960  0.967  1.581
1.4    1.527  0.866  0.894  1.535     1.827  0.957  0.965  1.824
1.5    1.686  0.854  0.890  1.675     2.079  0.953  0.964  2.080
1.6    1.846  0.842  0.886  1.861     2.347  0.950  0.963  2.330
1.7    2.006  0.831  0.883  2.010     2.630  0.947  0.962  2.629
1.8    2.166  0.819  0.880  2.174     2.928  0.944  0.961  2.940
1.9    2.326  0.808  0.877  2.322     3.241  0.940  0.960  3.246
2.0    2.484  0.797  0.874  2.504     3.570  0.937  0.959  3.593

Here $\sigma_0^2$, $\bar\sigma^2$, $\sigma^2$, and $v^2$ are the asymptotic variance, Kwak and Jung's variance estimate, the exact variance, and the simulated variance, respectively.

$$
p_0 = \int_0^\infty G(x)S_1(x)\,d\Lambda_0(x), \qquad
p_1 = \int_0^\infty G(x)S_1(x)\,d\Lambda_1(x),
$$
$$
p_{00} = \int_0^\infty G(x)S_1(x)\Lambda_0(x)\,d\Lambda_0(x), \qquad
p_{01} = \int_0^\infty G(x)S_1(x)\Lambda_0(x)\,d\Lambda_1(x).
$$

If the underlying model is a proportional hazards model with hazard ratio δ, then the survival distribution and cumulative hazard function under the alternative are $S_1(x) = [S_0(x)]^\delta$ and $\Lambda_1(x) = \delta\Lambda_0(x)$, respectively. Therefore, the quantities $p_0$, $p_1$, $p_{00}$, and $p_{01}$, and thus the sample size, depend only on the hazard ratio δ, the underlying survival distribution $S_0(x)$ of the standard population, and the censoring distribution $G(x)$.

3. SIMULATION STUDY

To study the performance of the three sample size formulas and of the one-sample log-rank test, we conducted simulation studies comparing power and type I error under different scenarios. In the simulations, the survival distribution of the standard population was taken to be the Weibull distribution $S_0(x) = e^{-\log(2)(x/m_0)^\kappa}$, or equivalently the cumulative hazard function $\Lambda_0(x) = \log(2)(x/m_0)^\kappa$, with known shape parameter κ and median survival time $m_0$. The cumulative hazard function under the alternative is assumed to be $\Lambda_1(x) = \log(2)(x/m_1)^\kappa$, where $m_1 > m_0$. Therefore, the underlying Weibull model is a proportional hazards model with hazard ratio $\delta = (m_1/m_0)^\kappa$, so that $\Lambda_1(x) = \Lambda_0(x)/\delta$. The parameter settings for the simulation studies were κ = 0.1, 0.25, 0.5, 1, 2, and 5, to reflect cases of decreasing (κ < 1), constant (κ = 1), and increasing (κ > 1) hazard functions. The hazard ratio δ under the alternative hypothesis was set to 1.2–2.0, with the other parameters fixed as follows: $m_0 = 1$, accrual period $t_a = 3$, and follow-up time $t_f = 1$. Note that the hazard ratio δ here is the reciprocal of the relative risk R defined by Finkelstein et al. [1, p. 1438]. We assumed that subjects were recruited with a uniform distribution over the accrual period $t_a$ and were followed for $t_f$. We further assumed that no subject was lost to follow-up or dropped out during the study. The censoring time is then uniformly distributed on the interval $[t_f, t_a + t_f]$. Thus, under the Weibull model, the quantities $p_0$, $p_1$, $p_{00}$, and $p_{01}$ can be calculated by numerical integration. Given the nominal significance level 0.05 and power of 90%, the required sample size for each design scenario was calculated with each of the sample size formulas (1)–(3) and recorded in Table I. For each calculated sample size n, 100,000 samples were generated from the Weibull distribution to estimate the empirical type I error and power of the one-sample log-rank test, which are also recorded in Table I.

The sample size calculations showed that formulas (1) and (2) gave smaller sample sizes than formula (3). The simulation results showed that the empirical powers based on formula (3) were close to the nominal level, and slightly larger than the nominal level when sample sizes were small. The empirical powers based on Kwak and Jung's formula (2) were also close to the nominal level when κ < 1, but they became much smaller than the nominal level when κ > 1. The empirical powers based on the Finkelstein et al. formula (1) were always less than the nominal level. The simulation results also showed that the one-sample log-rank test is conservative, in the sense that its empirical type I errors were always less than the nominal level.

To investigate the phenomenon observed in the empirical power simulations, we conducted another simulation comparing the various variance estimates of W with its simulated variance $v^2$ under the alternative. The variance estimates, namely the asymptotic variance $\sigma_0^2$, Kwak and Jung's estimate $\bar\sigma^2$, and the exact variance $\sigma^2$, were used to derive the sample size formulas (1), (2), and (3), respectively. The simulations were performed under the same parameter configurations as before, with the sample size fixed at n = 100 and 100,000 simulation runs. The results are recorded in Table II. They show that the asymptotic variance $\sigma_0^2$ was always smaller than the exact variance $\sigma^2$, while Kwak and Jung's estimate $\bar\sigma^2$ was also smaller than, but close to, $\sigma^2$ when κ < 1 and became much smaller than $\sigma^2$ when κ > 1. The simulated variance $v^2$ and the exact variance $\sigma^2$ were almost identical. This explains why formulas (1) and (2) underestimated the sample sizes: they were derived from underestimated variances. Formula (3), which is based on the exact variance, gave the correct power for the study design. The simulation results did show that the power based on formula (3) was slightly above the nominal level when sample sizes were small; this can be explained in two ways. First, the sample size calculated from the formula was rounded up to the smallest integer greater than the calculated value, which slightly increased the power. Second, the sample size formula was derived from a large-sample normal approximation; when the sample size is small, the approximation can be inaccurate and inflate the sample size or power.

4. AN EXAMPLE

In this section, we illustrate a single-arm phase II survival trial design using the following example. Between January 1974 and May 1984, the Mayo Clinic conducted a double-blind randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo [11, p. 2]. PBC is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50 cases per million population. The primary pathologic event appears to be the destruction of interlobular bile ducts, which may be mediated by immunologic mechanisms. A total of 65 of the 158 patients treated with DPCA died, and the median survival time was 9 years. Suppose a new treatment is now available and investigators want to design a single-arm phase II trial using the Mayo Clinic patients treated with DPCA as the historical control group for the comparison. The survival distribution of the DPCA data was estimated by the Kaplan–Meier method and by a Weibull model (Figure 1).

Figure 1. Survival probability versus years from on study for the DPCA data. The step function is the Kaplan–Meier survival curve, and dotted lines are the 95% confidence boundaries. The solid curve is the fitted Weibull survival distribution function.

The Weibull distribution fitted well, with shape parameter κ = 1.22 and median survival time $m_0 = 9$. Thus, we can assume that the known survival distribution of the historical control group is $S_0(x) = e^{-\log(2)(x/m_0)^\kappa}$, or equivalently that its cumulative hazard function is $\Lambda_0(x) = \log(2)(x/m_0)^\kappa$, with $m_0 = 9$ and κ = 1.22. Assume that the survival distribution $S_1(x)$ of the new trial satisfies the proportional hazards model $S_1(x) = [S_0(x)]^\delta$, where δ is the hazard ratio. Under this model, $S_1(x)$ is still a Weibull distribution, with median $m_1 = m_0\delta^{-1/\kappa}$ and shape κ = 1.22. The study aims to test the hypothesis $H_0: \delta = 1$ vs $H_1: \delta < 1$ with significance level α = 0.05 and power 1 − β = 80% to detect an alternative hazard ratio δ = 1/1.75. We assume that subjects are recruited with a uniform distribution over an accrual period of $t_a = 5$ years and followed for a period of $t_f = 3$ years and

no subjects are lost to follow-up. Then the censoring distribution $G(x)$ is uniform on $[t_f, t_a + t_f]$; that is, $G(x) = 1$ if $x \le t_f$, $G(x) = (\tau - x)/t_a$ if $t_f \le x \le \tau$, and $G(x) = 0$ otherwise, where $\tau = t_a + t_f$. Thus, the quantities $p_0$ and $p_{00}$ can be calculated by numerical integration as follows:

$$
p_0 = \int_0^{t_f} [S_0(x)]^\delta \lambda_0(x)\,dx + \frac{1}{t_a}\int_{t_f}^{\tau} (\tau - x)[S_0(x)]^\delta \lambda_0(x)\,dx,
$$
$$
p_{00} = \int_0^{t_f} [S_0(x)]^\delta \Lambda_0(x)\lambda_0(x)\,dx + \frac{1}{t_a}\int_{t_f}^{\tau} (\tau - x)[S_0(x)]^\delta \Lambda_0(x)\lambda_0(x)\,dx,
$$

where $\lambda_0(x) = \kappa\log(2)\,x^{\kappa-1}/m_0^\kappa$, $\Lambda_0(x) = \log(2)(x/m_0)^\kappa$, and $S_0(x) = e^{-\log(2)(x/m_0)^\kappa}$. The quantities $p_1$ and $p_{01}$ are given by $p_1 = \delta p_0$ and $p_{01} = \delta p_{00}$. Thus, with $z_{0.95} \approx 1.65$ and $z_{0.80} \approx 0.84$, the required sample size for the new trial is given by

$$
n = \frac{(1.65\,\sigma_0 + 0.84\,\sigma)^2}{\omega^2},
$$

where $\sigma_0^2 = p_0$, $\omega = \sigma_1^2 - \sigma_0^2 = p_1 - p_0$, and $\sigma^2 = p_1 - p_1^2 + 2p_{00} - p_0^2 - 2p_{01} + 2p_0p_1$. Hence, the required sample size is n = 88, and the corresponding empirical power and type I error are 81% and 0.043, respectively. The R code for the sample size calculation is available upon request.

5. CONCLUSION

The exact variance of the one-sample log-rank test statistic is derived, and a new sample size formula is proposed. Simulation results showed that the new sample size formula based on the exact variance gives adequate power for the study design, whereas the sample sizes calculated from the formulas of Finkelstein et al. and of Kwak and Jung can be underestimated and result in underpowered studies. Our simulation results also showed that the one-sample log-rank test is conservative, particularly when the sample size is small. On the other hand, to use the one-sample log-rank test for study design and inference, the underlying distribution or hazard function of the standard population must be correctly specified, because both the design and the inference depend on the validity of this assumption. Nevertheless, the new sample size formula provides a study design that ensures sufficient power to detect differences in survival between a sample and a standard population.

Acknowledgements

The author gratefully acknowledges two anonymous reviewers for their valuable comments that improved an earlier version of the paper. This work was supported in part by the National Cancer Institute support grant CA21765 and ALSAC.

APPENDIX

First, we calculate the mean and variance of W under the null hypothesis $H_0$ by noting that $E_{H_0}(O) = nE_{H_0}(\Delta)$ and $E_{H_0}(E) = nE_{H_0}(\Lambda_0(X))$. Let $f_0(x)$, $S_0(x)$, and $\Lambda_0(x)$ be the density, survival, and cumulative hazard functions of the failure time T under the null, and let $g(x)$ and $G(x)$ be the density and survival functions of the censoring time C, respectively. Then, by exchanging the order of integration, we have

$$
E_{H_0}(\Delta) = \int_0^\infty\!\!\int_0^y f_0(x)g(y)\,dx\,dy = \int_0^\infty f_0(x)\left\{\int_x^\infty g(y)\,dy\right\}dx = \int_0^\infty G(x)S_0(x)\,d\Lambda_0(x).
$$

Let $S_X(x)$ be the survival distribution of $X = T \wedge C$ under the null; then $S_X(x) = S_0(x)G(x)$, and integration by parts gives

$$
E_{H_0}(\Lambda_0(X)) = -\int_0^\infty \Lambda_0(x)\,dS_X(x) = \int_0^\infty S_X(x)\,d\Lambda_0(x) = \int_0^\infty G(x)S_0(x)\,d\Lambda_0(x).
$$

Therefore, the mean of W under the null is $E_{H_0}(W) = \sqrt{n}\{E_{H_0}(\Delta) - E_{H_0}(\Lambda_0(X))\} = 0$. By a similar calculation, we have

$$
E_{H_0}(\Delta\Lambda_0(X)) = \int_0^\infty\!\!\int_0^y f_0(x)g(y)\Lambda_0(x)\,dx\,dy = \int_0^\infty f_0(x)\Lambda_0(x)\left\{\int_x^\infty g(y)\,dy\right\}dx = \int_0^\infty G(x)S_0(x)\Lambda_0(x)\,d\Lambda_0(x)
$$

and

$$
E_{H_0}(\Lambda_0^2(X)) = -\int_0^\infty \Lambda_0^2(x)\,dS_X(x) = 2\int_0^\infty S_X(x)\Lambda_0(x)\,d\Lambda_0(x) = 2\int_0^\infty G(x)S_0(x)\Lambda_0(x)\,d\Lambda_0(x).
$$

So far, we have shown that $E_{H_0}(\Delta) = E_{H_0}(\Lambda_0(X))$ and $E_{H_0}(\Lambda_0^2(X)) = 2E_{H_0}(\Delta\Lambda_0(X))$. Therefore, since $\Delta^2 = \Delta$,

$$
\mathrm{Var}_{H_0}(W) = E_{H_0}\{\Delta - \Lambda_0(X)\}^2 = E_{H_0}(\Delta) - 2E_{H_0}(\Delta\Lambda_0(X)) + E_{H_0}(\Lambda_0^2(X)) = E_{H_0}(\Delta) = \int_0^\infty G(x)S_0(x)\,d\Lambda_0(x).
$$

Thus, $\hat\sigma^2 = n^{-1}\sum_{i=1}^n \int_0^\infty Y_i(x)\,d\Lambda_0(x)$ is a consistent estimate of $\mathrm{Var}_{H_0}(W)$ under the null, and $W/\hat\sigma \to N(0, 1)$.

Now we derive the exact variance of W under the alternative. Let $f_1(x)$, $S_1(x)$, and $\Lambda_1(x)$ be the density, survival, and cumulative hazard functions of the failure time T under the alternative. Then, by a similar calculation, we have

$$
E_{H_1}(\Delta) = \int_0^\infty\!\!\int_0^y f_1(x)g(y)\,dx\,dy = \int_0^\infty f_1(x)\left\{\int_x^\infty g(y)\,dy\right\}dx = \int_0^\infty G(x)S_1(x)\,d\Lambda_1(x) = p_1\ (= \sigma_1^2).
$$

Let $S_X(x)$ be the survival distribution of $X = T \wedge C$ under the alternative; then $S_X(x) = G(x)S_1(x)$, and integration by parts gives

$$
E_{H_1}(\Lambda_0(X)) = -\int_0^\infty \Lambda_0(x)\,dS_X(x) = \int_0^\infty S_X(x)\,d\Lambda_0(x) = \int_0^\infty G(x)S_1(x)\,d\Lambda_0(x) = p_0\ (= \sigma_0^2).
$$

Thus, $E_{H_1}(W) = \sqrt{n}\{E_{H_1}(\Delta) - E_{H_1}(\Lambda_0(X))\} = \sqrt{n}(\sigma_1^2 - \sigma_0^2)$. Similarly, we have

$$
E_{H_1}(\Delta\Lambda_0(X)) = \int_0^\infty\!\!\int_0^y f_1(x)g(y)\Lambda_0(x)\,dx\,dy = \int_0^\infty f_1(x)\Lambda_0(x)\left\{\int_x^\infty g(y)\,dy\right\}dx = \int_0^\infty G(x)S_1(x)\Lambda_0(x)\,d\Lambda_1(x) = p_{01}
$$

and

$$
E_{H_1}(\Lambda_0^2(X)) = -\int_0^\infty \Lambda_0^2(x)\,dS_X(x) = 2\int_0^\infty S_X(x)\Lambda_0(x)\,d\Lambda_0(x) = 2\int_0^\infty G(x)S_1(x)\Lambda_0(x)\,d\Lambda_0(x) = 2p_{00}.
$$

Therefore, the exact variance of W under the alternative is given by

$$
\begin{aligned}
\mathrm{Var}_{H_1}(W) &= \mathrm{Var}_{H_1}(\Delta) + \mathrm{Var}_{H_1}(\Lambda_0(X)) - 2\,\mathrm{Cov}_{H_1}(\Delta, \Lambda_0(X)) \\
&= E_{H_1}(\Delta) - E_{H_1}^2(\Delta) + E_{H_1}(\Lambda_0^2(X)) - E_{H_1}^2(\Lambda_0(X)) - 2E_{H_1}(\Delta\Lambda_0(X)) + 2E_{H_1}(\Delta)E_{H_1}(\Lambda_0(X)) \\
&= p_1 - p_1^2 + 2p_{00} - p_0^2 - 2p_{01} + 2p_0p_1 = \sigma^2.
\end{aligned}
$$

Because $\hat\sigma^2$ is a consistent estimate of $\sigma_0^2$ under the alternative $H_1$, we have $W - \sqrt{n}\,\omega \to N(0, \sigma^2)$ and $W/\hat\sigma - \sqrt{n}\,\omega/\sigma_0 \to N(0, \sigma^2/\sigma_0^2)$ as $n \to \infty$.

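The PBC design of Section 4 and the exact moments derived in this appendix can be checked together numerically. The Python sketch below is a minimal, standard-library-only illustration and not the author's R code. It assumes the Section 4 inputs (κ = 1.22, $m_0 = 9$, δ = 1/1.75 with $\Lambda_1 = \delta\Lambda_0$, $t_a = 5$, $t_f = 3$, uniform censoring on $[t_f, t_a + t_f]$), evaluates $p_0$ and $p_{00}$ with composite Simpson's rule in the variable $u = \Lambda_0(x)$ (our choice of quadrature), and uses exact normal quantiles rather than the rounded 1.65 and 0.84. It then simulates i.i.d. copies of $\Delta - \Lambda_0(X)$, whose mean and variance should match $\omega$ and $\sigma^2$ because $W$ is $n^{-1/2}$ times a sum of such terms. The paper reports n = 88 for this design; small differences can arise from the integration method and z-value rounding.

```python
import math
import random
from statistics import NormalDist

# Section 4 design inputs (historical DPCA control fitted by a Weibull model).
KAPPA, M0 = 1.22, 9.0    # Weibull shape and median survival (years)
HR = 1 / 1.75            # alternative hazard ratio delta, Lambda_1 = delta * Lambda_0
TA, TF = 5.0, 3.0        # accrual and follow-up periods (years)
TAU, LN2 = TA + TF, math.log(2)


def cum_hazard0(x):
    return LN2 * (x / M0) ** KAPPA


def censor_surv(x):
    """G(x) for censoring uniform on [TF, TAU] with no loss to follow-up."""
    return 1.0 if x <= TF else max(0.0, (TAU - x) / TA)


def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3


def p_integral(weight):
    """int G(x) [S0(x)]^HR weight(Lambda_0(x)) dLambda_0(x), computed in u = Lambda_0(x)."""
    def f(u):
        return censor_surv(M0 * (u / LN2) ** (1 / KAPPA)) * math.exp(-HR * u) * weight(u)
    u_tf, u_tau = cum_hazard0(TF), cum_hazard0(TAU)
    return simpson(f, 0.0, u_tf) + simpson(f, u_tf, u_tau)


p0, p00 = p_integral(lambda u: 1.0), p_integral(lambda u: u)
p1, p01 = HR * p0, HR * p00
omega = p1 - p0
sigma_sq = p1 - p1**2 + 2 * p00 - p0**2 - 2 * p01 + 2 * p0 * p1
za, zb = NormalDist().inv_cdf(0.95), NormalDist().inv_cdf(0.80)
n = math.ceil((za * math.sqrt(p0) + zb * math.sqrt(sigma_sq)) ** 2 / omega**2)

# Monte Carlo: HR * Lambda_0(T) is standard exponential under the alternative.
rng = random.Random(2015)
draws = []
for _ in range(100_000):
    t = M0 * (rng.expovariate(1.0) / (HR * LN2)) ** (1 / KAPPA)
    c = rng.uniform(TF, TAU)
    draws.append((t <= c) - cum_hazard0(min(t, c)))   # Delta - Lambda_0(X)
mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)

print(n, round(omega, 4), round(mc_mean, 4), round(sigma_sq, 4), round(mc_var, 4))
```

Agreement between `sigma_sq` and `mc_var` (and between `omega` and `mc_mean`) is a direct numerical check of the expression $\sigma^2 = p_1 - p_1^2 + 2p_{00} - p_0^2 - 2p_{01} + 2p_0p_1$ under these assumed inputs.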
REFERENCES

[1] Finkelstein DM, Muzikansky A, Schoenfeld DA. Comparing survival of a sample to that of a standard population. Journal of the National Cancer Institute 2003; 95:1434–1439.
[2] Breslow NE. Analysis of survival data under the proportional hazards model. International Statistical Review 1975; 43:44–58.
[3] Hyde J. Testing survival under right censoring and left truncation. Biometrika 1977; 64:225–230.
[4] Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer: New York, 1993.
[5] Gail MH, Ware JH. Comparing observed life table data with a known survival curve in the presence of random censorship. Biometrics 1979; 35:385–391.
[6] Berry G. The analysis of mortality by the subject-years method. Biometrics 1983; 39:173–184.
[7] Woolson RF. Rank tests and a one-sample log-rank test for comparing observed survival data to a standard population. Biometrics 1981; 37:687–696.
[8] Kwak M, Jung SH. Phase II clinical trials with time-to-event endpoints: optimal two-stage designs with one-sample log-rank test. Statistics in Medicine 2014; 33:2004–2016.
[9] Jung SH. Randomized Phase II Cancer Clinical Trials. CRC Press: Chapman and Hall, 2013.
[10] Sun X, Peng P, Tu D. Phase II cancer clinical trials with a one-sample log-rank test and its corrections based on the Edgeworth expansion. Contemporary Clinical Trials 2011; 32:108–113.
[11] Fleming TR, Harrington DP. Counting Processes and Survival Analysis. John Wiley and Sons: New York, 1991.
[12] Finkelstein DM, Muzikansky A, Schoenfeld DA. Correction of formula for sample size required for a study that will use one sample test. Available at: http://hedwig.mgh.harvard.edu/biostatistics/node/30 (Accessed 06.01.2014).
