This article was downloaded by: [New York University] On: 06 July 2015, At: 17:45 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: 5 Howick Place, London, SW1P 1WG

Journal of Biopharmaceutical Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20

Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data

Fanghong Zhang (a), Etsuo Miyaoka (b), Fuping Huang (c) & Yutaka Tanaka (d)

a Biomedical Data Sciences Department, GlaxoSmithKline K.K., Tokyo, Japan
b Department of Mathematics, Tokyo University of Science, Tokyo, Japan
c Coppell, Texas, USA
d Okayama University, Okayama, Japan

Accepted author version posted online: 11 Jun 2014. Published online: 11 Jun 2014.

To cite this article: Fanghong Zhang, Etsuo Miyaoka, Fuping Huang & Yutaka Tanaka (2014): Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data, Journal of Biopharmaceutical Statistics, DOI: 10.1080/10543406.2014.920865 To link to this article: http://dx.doi.org/10.1080/10543406.2014.920865


Journal of Biopharmaceutical Statistics, 00: 1–18, 2015 Copyright © Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2014.920865

TEST STATISTICS AND CONFIDENCE INTERVALS TO ESTABLISH NONINFERIORITY BETWEEN TREATMENTS WITH ORDINAL CATEGORICAL DATA

Fanghong Zhang1, Etsuo Miyaoka2, Fuping Huang3, and Yutaka Tanaka4

1 Biomedical Data Sciences Department, GlaxoSmithKline K.K., Tokyo, Japan
2 Department of Mathematics, Tokyo University of Science, Tokyo, Japan
3 Coppell, Texas, USA
4 Okayama University, Okayama, Japan

The problem of establishing noninferiority between a new treatment and a standard (control) treatment is discussed for ordinal categorical data. A measure of treatment effect is used and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed in which the variance is estimated under the shifted null hypothesis using U-statistics. Furthermore, a confidence interval and a sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from the nominal level and the power.

Key Words: Noninferiority margin; Shifted null hypothesis; U-statistics; Wilcoxon–Mann–Whitney test; Z-type test.

Received June 14, 2013; Accepted February 17, 2014. Address correspondence to Fanghong Zhang, Biomedical Data Sciences Department, GlaxoSmithKline K.K., Tokyo 151–8566, Japan; E-mail: [email protected]

1. INTRODUCTION

The approach of testing a nonzero null hypothesis to establish equivalence/noninferiority between treatments was initially proposed by Dunnett and Gent (1977). The testing framework is used in designing a noninferiority trial without a placebo arm to show that a new treatment is not inferior to an active treatment (Blackwelder, 1982; 2002). In this framework, the prespecified amount in the nonzero null hypothesis can be written as $-\Delta_0$, where $\Delta_0$ is called a noninferiority margin. The noninferiority margin should be chosen in such a way that if we reject the null hypothesis of the noninferiority test, then we can conclude that the new treatment is superior to the placebo (Wiens, 2002).

To illustrate this in a parametric model setting, let $\mu_1$, $\mu_2$, and $\mu_0$ represent the means of the new treatment, the control treatment, and the placebo, respectively. We assume that lower values of the outcome measure correspond to favorable results. There are some historical placebo-controlled trials that showed the control treatment is superior to the placebo. Let $\Delta_2$ be a certain upper confidence limit of $\mu_2 - \mu_0$ derived from the historical data, where $\Delta_2 \le 0$, and suppose that $\mu_2 - \mu_0 < \Delta_2$. We can choose $\Delta_0 = \gamma\Delta_2$ $(0 < \gamma \le 1)$ as the noninferiority margin. So, the hypotheses for noninferiority testing in the parametric setting can be formulated as follows:

$$H_0: \mu_1 - \mu_2 \ge -\Delta_0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 < -\Delta_0.$$

If the alternative hypothesis $\mu_1 - \mu_2 < -\Delta_0 = -\gamma\Delta_2$ is claimed, we can conclude that $\mu_1 - \mu_0 < (1 - \gamma)\Delta_2 \le 0$, since $\mu_1 - \mu_0 = (\mu_1 - \mu_2) + (\mu_2 - \mu_0) < -\gamma\Delta_2 + \Delta_2$. This means that the new treatment is superior to the placebo. On the other hand, for the confidence interval approach, assuming a lower value of the outcome measure is favorable, the noninferiority of the new treatment as compared to the standard treatment will be established if the upper confidence limit of $\mu_1 - \mu_2$ is less than $-\Delta_0$.

So, for a noninferiority trial to compare two independent groups, we must choose a measure of treatment effect and its noninferiority margin. In the case of a binary endpoint, several measures of treatment effect are available, notably the difference of proportions, which is an absolute measure, and the relative risk and the odds ratio, which are relative measures. These measures and the corresponding noninferiority margins are accepted by biostatisticians and their medical colleagues (Hilton, 2010). For ordinal three-level categorical data, Brittain and Hu (2009) provided a measure of treatment effect for noninferiority trial design. There are some measures for general ordinal categorical data, but much work is needed to promote their usefulness (Newcombe, 2006).

Let the clinical outcome be denoted by $X_1$ for the new treatment and by $X_2$ for the control treatment. There is a relationship $P(X_1 < X_2) + P(X_1 > X_2) + P(X_1 = X_2) = 1$. If $X_1$ and $X_2$ are identically distributed, $P(X_1 < X_2) = P(X_1 > X_2)$, and it follows that

$$P(X_1 < X_2) = \frac{1}{2} - \frac{1}{2} P(X_1 = X_2).$$

The quantity $P(X_1 = X_2)$ must be accounted for in comparing the two treatments because ties occur with positive probability for ordinal categorical data. Agresti (1980) proposed a ratio $p_A = P(X_1 > X_2)/P(X_1 < X_2)$ to summarize the difference between treatments. It follows that $p_A = 1$ if $X_1$ and $X_2$ are identically distributed. Wellek and Hampel (1999) proposed a conditional probability $p_W = P(X_1 < X_2 \mid X_1 \ne X_2)$, and $p_W = 1/2$ if $X_1$ and $X_2$ are identically distributed. Note that $p_A$ and $p_W$ are equivalent since $p_W = 1/(1 + p_A)$. A more natural measure $p_1$ for the treatment difference for ordered categorical data is given by some authors (see, for example, Halperin, Hamdy and Thall, 1989; Brunner and Munzel, 2000) as

$$p_1 = P\{X_1 < X_2\} + \frac{1}{2} P\{X_1 = X_2\}. \tag{1}$$

If $X_1$ and $X_2$ are identically distributed, $p_1 = 1/2$; if $X_1$ tends to take smaller values than $X_2$, $p_1 > 1/2$, and in general, $p_1$ can be interpreted as the probability that $X_1$ has a tendency to take smaller values than $X_2$. As it reduces to $P(X_1 < X_2)$, which corresponds to the Wilcoxon–Mann–Whitney statistic in the continuous case, this parameter $p_1$ is a generalization of the classical Wilcoxon–Mann–Whitney effect. Note that when $X_1$ and $X_2$ follow binomial distributions with parameters $\pi_1$ and $\pi_2$, respectively, $p_A$ reduces to the odds ratio $\pi_1(1 - \pi_2)/[\pi_2(1 - \pi_1)]$, and it is immediately obtained that $p_1 = 1/2 + (\pi_1 - \pi_2)/2$.

For noninferiority with ordinal categorical data, Wellek and Hampel (1999) proposed a Wald-type test statistic based on the conditional probability $p_W$. Lui and Chang (2013) proposed a test statistic based on the general odds ratio $p_A$ and gave a formula for sample size determination. Munzel and Hauschke (2003) proposed a rank test statistic based on $p_1$ and gave a confidence interval for $p_1$. But in Munzel and Hauschke’s test statistic, the estimator of the asymptotic variance is derived under the alternative hypothesis and is biased. In this article, we try to improve the test statistic by modifying Munzel and Hauschke’s approach in two respects. First, we estimate the asymptotic variance more precisely by using U-statistics. Second, we estimate the variance under the shifted null hypothesis for noninferiority. Furthermore, we give a confidence interval for $p_1$ and provide a formula for sample size determination based on the proposed test statistic.

In the remainder of this article, Section 2 briefly formulates the testing procedure for noninferiority. In Sections 3 and 4, the estimator of the generalized Wilcoxon–Mann–Whitney effect and the estimators of its asymptotic variance are derived. In Section 5, the test statistic previously proposed by Munzel and Hauschke (2003) is reviewed, and two modified test statistics are proposed.
Section 6 gives a confidence interval for p1 and a formula for sample size determination based on the proposed test statistic. Section 7 uses a clinical example to compare our method with Munzel and Hauschke’s in terms of confidence interval. A simulation study is performed in Section 8 to compare the proposed statistic with previously proposed statistics. Section 9 concludes with some discussions.
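As a numerical illustration of the three effect measures discussed in this section, the following Python sketch computes $p_1$, $p_A$, and $p_W$ from two vectors of category probabilities. The function name `effect_measures` and the example probabilities are hypothetical and serve only to illustrate the definitions; they are not taken from the article.

```python
def effect_measures(pi1, pi2):
    """Return (p1, pA, pW) for two ordinal distributions on the same categories.

    pi1[k], pi2[k]: probabilities of category k+1 for X1 and X2 (assumed inputs).
    """
    a = len(pi1)
    p_less = sum(pi1[i] * pi2[j] for i in range(a) for j in range(a) if i < j)
    p_greater = sum(pi1[i] * pi2[j] for i in range(a) for j in range(a) if i > j)
    p_tie = sum(pi1[k] * pi2[k] for k in range(a))
    p1 = p_less + 0.5 * p_tie            # generalized Wilcoxon-Mann-Whitney effect (1)
    pA = p_greater / p_less              # Agresti's ratio
    pW = p_less / (p_less + p_greater)   # P(X1 < X2 | X1 != X2)
    return p1, pA, pW

# Hypothetical example: identical distributions for both treatments.
same = [0.2, 0.5, 0.3]
p1, pA, pW = effect_measures(same, same)

# Hypothetical example: X1 shifted toward smaller (more favorable) categories.
q1, qA, qW = effect_measures([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])
```

In the first call the two distributions coincide, so the measures take their neutral values; in the second call the relation $p_W = 1/(1 + p_A)$ can be checked directly.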

2. HYPOTHESIS FOR NONINFERIORITY TESTING

Let $X_1$ and $X_2$ be mutually independent ordered categorical variables with $a$ categories. Let $\pi_{1k} = P(X_1 = k)$ and $\pi_{2k} = P(X_2 = k)$ be the probabilities that category $k$, $k = 1, \ldots, a$, is observed in the new treatment group and in the control group, respectively. Let $X_{1i}$ and $X_{2j}$ denote the observed outcomes, which fall into the $a$ possible categories as in Table 1, where $i = 1, \ldots, n_1$; $j = 1, \ldots, n_2$.

Table 1 Ordered categorical data

Treatment   1     2     ···   a     Total
New         n11   n12   ···   n1a   n1
Control     n21   n22   ···   n2a   n2
Total       m1    m2    ···   ma    N

We use the probability $p_1$ in (1) as the measure of treatment difference for ordinal categorical data. Assuming that smaller values of the outcome measure correspond to favorable results, a negative value of $p_1 - 1/2$ implies that $X_1$ has a tendency to larger values than $X_2$ and that $X_1$ is inferior to $X_2$. Let $-\delta_0$ denote the extent of irrelevant inferiority; the nonparametric test problem for noninferiority is formulated as

$$H_0: p_1 \le p_{10} = \tfrac{1}{2} - \delta_0 \quad \text{vs.} \quad H_1: p_1 > p_{10} = \tfrac{1}{2} - \delta_0, \tag{2}$$

where $0 \le \delta_0 < \tfrac{1}{2}$ is the noninferiority margin. In Appendix A, we discuss the selection of the noninferiority margin that ensures that rejecting the null hypothesis of noninferiority testing implies that the new treatment is superior to the placebo.

3. PARAMETER ESTIMATION

Let $\phi$ be a function of two real variables given by

$$\phi(x, y) = \begin{cases} 1, & x < y, \\ \tfrac{1}{2}, & x = y, \\ 0, & x > y. \end{cases} \tag{3}$$

A well-known unbiased and consistent estimator of the relative effect $p_1$ in (1) is given by

$$\hat p_1 = \frac{\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} U_{ij}}{n_1 n_2}, \tag{4}$$

where $U_{ij} = \phi(X_{1i}, X_{2j})$. Let $\sigma_{11}^2$ be the variance of $U_{ij}$ and let $\sigma_{10}^2$ and $\sigma_{01}^2$ denote the covariances as follows:

$$\begin{cases} \sigma_{11}^2 = V(U_{ij}), \\ \sigma_{10}^2 = \mathrm{Cov}(U_{ij}, U_{il}), & j \ne l, \\ \sigma_{01}^2 = \mathrm{Cov}(U_{ij}, U_{kj}), & i \ne k. \end{cases} \tag{5}$$

The variance of $\hat p_1$ can be expressed as

$$V(\hat p_1) = \frac{1}{n_1 n_2} \left[ \sigma_{11}^2 + (n_2 - 1)\sigma_{10}^2 + (n_1 - 1)\sigma_{01}^2 \right]. \tag{6}$$

Notice that

$$\sigma_{10}^2 = p_2 - p_1^2, \quad \sigma_{01}^2 = p_3 - p_1^2, \tag{7}$$

where $p_2 = E(U_{ij} U_{il})$ for $j \ne l$; $p_3 = E(U_{ij} U_{kj})$ for $i \ne k$; $p_1 = E(U_{ij})$.

The following limiting distribution for $\hat p_1$ is well known (see, for example, Lehmann, 1998, p. 364, Theorem 9; Brunner and Munzel, 2000). Let $N = n_1 + n_2$ and $n_1/n_2 \to \lambda$ as $N \to \infty$, where $\lambda \ne 0$. Then

$$T = \sqrt{N}\,(\hat p_1 - p_1)$$

is asymptotically normally distributed with expectation 0 and variance

$$\sigma_N^2 = N \left( \frac{\sigma_{10}^2}{n_1} + \frac{\sigma_{01}^2}{n_2} \right). \tag{8}$$

Let $Z_{\sigma_N^2} = T / \sqrt{\sigma_N^2}$; then

$$Z_{\sigma_N^2} = \frac{\hat p_1 - p_1}{\sqrt{\sigma_N^2 / N}} = \frac{\hat p_1 - p_1}{\sqrt{\dfrac{\sigma_{10}^2}{n_1} + \dfrac{\sigma_{01}^2}{n_2}}} \tag{9}$$

is asymptotically normally distributed with expectation 0 and variance 1.

We now show that the two covariances in (7) can be expressed as variances of conditional expectations:

$$\sigma_{10}^2 = V[E(Y \mid X_1)], \quad \sigma_{01}^2 = V[E(Y \mid X_2)],$$

where $Y = \phi(X_1, X_2)$ and $\phi$ is the index function defined in (3). From the definition of $\sigma_{10}^2$ in (5), $\sigma_{10}^2 = \mathrm{Cov}[\phi(X_1, X_{21}), \phi(X_1, X_{22})]$. Since $X_{21}$ and $X_{22}$ are iid, $\phi(X_1, X_{21})$ and $\phi(X_1, X_{22})$ are conditionally independent and identically distributed given $X_1$; then

$$\mathrm{Cov}[\phi(X_1, X_{21}), \phi(X_1, X_{22})] = \mathrm{Cov}[E(\phi(X_1, X_{21}) \mid X_1), E(\phi(X_1, X_{22}) \mid X_1)] = V[E(Y \mid X_1)].$$

Therefore $\sigma_{10}^2 = V[E(Y \mid X_1)]$. Similarly $\sigma_{01}^2 = V[E(Y \mid X_2)]$. And also

$$p_2 = E\{[E(Y \mid X_1)]^2\}, \quad p_3 = E\{[E(Y \mid X_2)]^2\}.$$

Further, let $F^{*}(x) = \tfrac{1}{2}[P(X < x) + P(X \le x)]$. Straightforward computations show that the conditional expectations can be expressed by

$$E[Y \mid X_1] = 1 - F_2^{*}(X_1), \quad E[Y \mid X_2] = F_1^{*}(X_2).$$

Then, the variances of conditional expectations can be expressed as $\sigma_{10}^2 = V[1 - F_2^{*}(X_1)]$ and $\sigma_{01}^2 = V[F_1^{*}(X_2)]$. And also $p_2 = E\{[1 - F_2^{*}(X_1)]^2\}$, $p_3 = E\{[F_1^{*}(X_2)]^2\}$.

For the ordered categorical data with $a$ categories, let $F_g(k) = P(X_g \le k) = \sum_{l=1}^{k} \pi_{gl}$ for $k = 1, \ldots, a$ and assume $F_g(0) = 0$, $g = 1, 2$. It follows that $F_g^{*}(k) = [F_g(k-1) + F_g(k)]/2$, $g = 1, 2$; $k = 1, \ldots, a$. Then

$$\begin{cases} p_1 = \displaystyle\sum_{k=1}^{a} \pi_{2k} \left[ \frac{F_1(k-1) + F_1(k)}{2} \right], \\[2ex] p_2 = \displaystyle\sum_{k=1}^{a} \pi_{1k} \left[ 1 - \frac{F_2(k-1) + F_2(k)}{2} \right]^2, \\[2ex] p_3 = \displaystyle\sum_{k=1}^{a} \pi_{2k} \left[ \frac{F_1(k-1) + F_1(k)}{2} \right]^2. \end{cases} \tag{10}$$
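The expressions in (10) translate directly into a short computation. The following Python sketch evaluates $p_1$, $p_2$, $p_3$ and the covariances in (7) from two vectors of category probabilities; the function name `moments_from_probs` and the example probabilities are hypothetical choices made for illustration.

```python
def moments_from_probs(pi1, pi2):
    """Evaluate p1, p2, p3 of (10) and the covariances sigma_10^2, sigma_01^2 of (7)."""
    a = len(pi1)
    # Cumulative distribution functions with F_g(0) = 0.
    F1, F2 = [0.0], [0.0]
    for k in range(a):
        F1.append(F1[-1] + pi1[k])
        F2.append(F2[-1] + pi2[k])
    # Averages [F_g(k-1) + F_g(k)] / 2 for k = 1, ..., a.
    F1n = [(F1[k] + F1[k + 1]) / 2 for k in range(a)]
    F2n = [(F2[k] + F2[k + 1]) / 2 for k in range(a)]
    p1 = sum(pi2[k] * F1n[k] for k in range(a))
    p2 = sum(pi1[k] * (1 - F2n[k]) ** 2 for k in range(a))
    p3 = sum(pi2[k] * F1n[k] ** 2 for k in range(a))
    return p1, p2, p3, p2 - p1 ** 2, p3 - p1 ** 2

# Hypothetical three-category example.
p1, p2, p3, s10, s01 = moments_from_probs([1/3, 1/3, 1/3], [0.5, 0.3, 0.2])

# Identical distributions: both covariances coincide.
e1, e2, e3, e10, e01 = moments_from_probs([1/3, 1/3, 1/3], [1/3, 1/3, 1/3])
```

Replacing the probabilities by the observed proportions $n_{gk}/n_g$ gives plug-in estimates of the same quantities.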

So $\sigma_{10}^2 = p_2 - p_1^2$ and $\sigma_{01}^2 = p_3 - p_1^2$ can be expressed by the probabilities $\pi_{gk}$, $g = 1, 2$; $k = 1, \ldots, a$. Notice that when $p_1 = 1/2$, or $F_1 = F_2$, it can be derived that $\sigma_{10}^2 = \sigma_{01}^2 = \frac{1}{12}\left(1 - \sum_{k=1}^{a} \pi_k^3\right)$, where $\pi_k = P(X = k)$ is the probability that category $k$, $k = 1, \ldots, a$, is observed (Barnard, 1990).

4. ESTIMATION OF VARIANCES

4.1. Maximum Likelihood Estimator

In Table 1, $n_{gk}$ denotes the count for the $(g, k)$th cell with $g = 1, 2$ and $k = 1, \ldots, a$. Consider a product multinomial model where $(\pi_{11}, \ldots, \pi_{1a})$ and $(\pi_{21}, \ldots, \pi_{2a})$ are the parameters of the two multinomial distributions. Then $\hat\pi_{gk} = n_{gk}/n_g$ is the maximum likelihood estimator (MLE) of $\pi_{gk}$.

Let $\hat F_g(k) = \sum_{l=1}^{k} \hat\pi_{gl}$, and $\hat F_g(0) = 0$, $g = 1, 2$. The MLEs $\hat\sigma_{10}^2$ and $\hat\sigma_{01}^2$ can be obtained when $\pi_{gk}$ and $F_g(k)$ are replaced by their MLEs $\hat\pi_{gk}$ and $\hat F_g(k)$ in (10), respectively, $g = 1, 2$; $k = 1, \ldots, a$. It can be shown that the MLEs $\hat\sigma_{10}^2$ and $\hat\sigma_{01}^2$ can also be obtained using the delta method. The MLEs $\hat\sigma_{10}^2$ and $\hat\sigma_{01}^2$ are consistent but biased.

4.2. An Approximately Unbiased Estimator

In this subsection, approximately unbiased estimators of $\sigma_{10}^2 = p_2 - p_1^2$ and $\sigma_{01}^2 = p_3 - p_1^2$ defined in (7) are derived. For $p_1$ in (1), the estimator $\hat p_1$ in (4) is unbiased and consistent. The estimators of $p_2$ and $p_3$ are given by

$$\tilde p_2 = \frac{1}{n_1 n_2 (n_2 - 1)} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{l \ne j} U_{ij} U_{il} = \frac{1}{n_1 n_2 (n_2 - 1)} \left[ \sum_{i=1}^{n_1} U_{i\cdot}^2 - \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} U_{ij}^2 \right], \tag{11}$$

$$\tilde p_3 = \frac{1}{n_2 n_1 (n_1 - 1)} \sum_{j=1}^{n_2} \sum_{i=1}^{n_1} \sum_{k \ne i} U_{ij} U_{kj} = \frac{1}{n_2 n_1 (n_1 - 1)} \left[ \sum_{j=1}^{n_2} U_{\cdot j}^2 - \sum_{j=1}^{n_2} \sum_{i=1}^{n_1} U_{ij}^2 \right], \tag{12}$$

where

$$U_{i\cdot} = \sum_{j=1}^{n_2} U_{ij}, \quad U_{\cdot j} = \sum_{i=1}^{n_1} U_{ij}.$$

As U-statistics, $\tilde p_2$ and $\tilde p_3$ are unbiased and consistent.

We obtain the following relationship between the MLEs and the unbiased estimators for $p_2$ and $p_3$ in (11) and (12):

$$\begin{cases} \tilde p_2 = \hat p_2 - \dfrac{1}{n_2 - 1} (\hat p_1 - \hat p_2) + \dfrac{1}{4(n_2 - 1)} \hat p_0, \\[2ex] \tilde p_3 = \hat p_3 - \dfrac{1}{n_1 - 1} (\hat p_1 - \hat p_3) + \dfrac{1}{4(n_1 - 1)} \hat p_0, \end{cases} \tag{13}$$

where $p_0 = P(X_1 = X_2) = \sum_{k=1}^{a} \pi_{1k} \pi_{2k}$ is the probability of the two treatments being equal, and $\hat p_0 = 4 \sum_i \sum_j (U_{ij} - U_{ij}^2) / (n_1 n_2) = \sum_{k=1}^{a} \hat\pi_{1k} \hat\pi_{2k}$. Using (10) and (13), we can obtain the unbiased estimators of $p_2$ and $p_3$ for the ordered categorical data.

Now we seek an approximately unbiased estimator of $p_1^2$, or equivalently, of

$$\sigma_{00}^2 = p_1 (1 - p_1).$$

An approximately unbiased estimator of $\sigma_{00}^2$ is provided in Halperin et al. (1989). The quantity $\sigma_{11}^2$ in (6) is approximated by the quantity $\sigma_{00}^2$, since $\sigma_{11}^2 = p_1(1 - p_1) - p_0/4$ with a small $p_0$. So

$$E(\hat p_1^2) = V(\hat p_1) + (E\hat p_1)^2 \approx \frac{p_1(1 - p_1) + (n_2 - 1)(p_2 - p_1^2) + (n_1 - 1)(p_3 - p_1^2)}{n_1 n_2} + p_1^2.$$

An approximately unbiased estimator of $\sigma_{00}^2$ is provided by

$$\tilde\sigma_{00}^2 = \frac{n_1 n_2 (\hat p_1 - \hat p_1^2) - (n_2 - 1)(\hat p_1 - \tilde p_2) - (n_1 - 1)(\hat p_1 - \tilde p_3)}{(n_1 - 1)(n_2 - 1)}. \tag{14}$$

Since $\sigma_{00}^2 = p_1 - p_2 + \sigma_{10}^2 = p_1 - p_3 + \sigma_{01}^2$, the approximately unbiased estimators of $\sigma_{10}^2$ and $\sigma_{01}^2$ can be provided by

$$\tilde\sigma_{10}^2 = \frac{n_1 n_2 (\hat p_1 - \hat p_1^2) - n_1 (n_2 - 1)(\hat p_1 - \tilde p_2) - (n_1 - 1)(\hat p_1 - \tilde p_3)}{(n_1 - 1)(n_2 - 1)}, \tag{15}$$

$$\tilde\sigma_{01}^2 = \frac{n_1 n_2 (\hat p_1 - \hat p_1^2) - (n_2 - 1)(\hat p_1 - \tilde p_2) - (n_1 - 1) n_2 (\hat p_1 - \tilde p_3)}{(n_1 - 1)(n_2 - 1)}. \tag{16}$$

Since $\hat p_1$, $\tilde p_2$, and $\tilde p_3$ are unbiased and consistent, $\tilde\sigma_{10}^2$ and $\tilde\sigma_{01}^2$ are approximately unbiased and consistent. The expressions of $\tilde\sigma_{10}^2$ and $\tilde\sigma_{01}^2$ are a bit complex but provide more precise estimators, which could be preferable for small sample sizes.
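The estimators (4), (11), (12) and (14) to (16) can be computed directly from the matrix of kernel values $U_{ij}$. The following Python sketch does this by brute force for raw observations; the function names and the small data vectors are hypothetical, and the double loop is intended for clarity rather than efficiency.

```python
def phi(x, y):
    """Kernel (3): 1 if x < y, 1/2 if x == y, 0 if x > y."""
    return 1.0 if x < y else (0.5 if x == y else 0.0)

def u_estimators(x1, x2):
    """Return p1_hat, p2_tilde, p3_tilde and the estimators (14), (15), (16)."""
    n1, n2 = len(x1), len(x2)
    U = [[phi(a, b) for b in x2] for a in x1]
    sum_sq = sum(u * u for row in U for u in row)
    row_sums = [sum(row) for row in U]                                # U_{i.}
    col_sums = [sum(U[i][j] for i in range(n1)) for j in range(n2)]   # U_{.j}
    p1 = sum(row_sums) / (n1 * n2)                                          # (4)
    p2 = (sum(r * r for r in row_sums) - sum_sq) / (n1 * n2 * (n2 - 1))     # (11)
    p3 = (sum(c * c for c in col_sums) - sum_sq) / (n1 * n2 * (n1 - 1))     # (12)
    d = (n1 - 1) * (n2 - 1)
    base = n1 * n2 * (p1 - p1 ** 2)
    s00 = (base - (n2 - 1) * (p1 - p2) - (n1 - 1) * (p1 - p3)) / d          # (14)
    s10 = (base - n1 * (n2 - 1) * (p1 - p2) - (n1 - 1) * (p1 - p3)) / d     # (15)
    s01 = (base - (n2 - 1) * (p1 - p2) - (n1 - 1) * n2 * (p1 - p3)) / d     # (16)
    return p1, p2, p3, s00, s10, s01

# Hypothetical scores on a three-point scale (smaller is better).
x1 = [1, 1, 2, 2, 3, 1, 2]
x2 = [1, 2, 2, 3, 3, 3]
p1, p2, p3, s00, s10, s01 = u_estimators(x1, x2)
```

By construction, (15) and (16) equal (14) minus $\hat p_1 - \tilde p_2$ and $\hat p_1 - \tilde p_3$, respectively, which offers a quick consistency check on an implementation.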

5. CONSTRUCTION OF TEST STATISTICS

For testing the hypotheses (2) by using $Z_{\sigma_N^2}$ in (9), it is critical to estimate the variance $\sigma_N^2$. There are two main methods to estimate the variance $\sigma_N^2$: (i) under the alternative hypothesis; (ii) under the null hypothesis. For method (i), Munzel and Hauschke (2003) use the following test statistic,

$$Z_M = \frac{\hat p_1 - p_{10}}{\sqrt{\hat\sigma_N^2 / N}}, \tag{17}$$

where $\hat\sigma_N^2$ can be calculated using the MLEs $\hat\sigma_{10}^2$ and $\hat\sigma_{01}^2$. We consider a pivotal quantity

$$Z_P = \frac{\hat p_1 - p_1}{\sqrt{\dfrac{\hat\sigma_N^2 / N}{\hat\sigma_{00}^2}\, p_1 (1 - p_1)}}, \tag{18}$$

where $\hat\sigma_{00}^2 = \hat p_1 (1 - \hat p_1)$ and $\hat\sigma_N^2$ can be calculated using the MLEs $\hat\sigma_{10}^2$ and $\hat\sigma_{01}^2$. As described in Appendix B, $Z_P$ has an asymptotically standard normal distribution. Now consider a rejection region

$$\{\hat p_1 > c\}$$

for the composite hypothesis (2), where $c$ is a constant, $c > p_{10}$. Let

$$\hat\kappa_N = \frac{\hat\sigma_N^2 / N}{\hat\sigma_{00}^2} = \frac{\hat\sigma_{10}^2 / n_1 + \hat\sigma_{01}^2 / n_2}{\hat\sigma_{00}^2}. \tag{19}$$

Under the null hypothesis, the Type I error rate is

$$\alpha(p_1) = P\{\hat p_1 > c \mid p_1 \le p_{10}\} = P\left\{ Z_P > \frac{c - p_1}{\sqrt{\hat\kappa_N\, p_1 (1 - p_1)}} \,\Big|\, p_1 \le p_{10} \right\} = 1 - \Phi\left( \frac{c - p_1}{\sqrt{\hat\kappa_N\, p_1 (1 - p_1)}} \right),$$

where $\Phi(\cdot)$ is the standard normal distribution function. It is easy to show that

$$\frac{c - p_1}{\sqrt{\hat\kappa_N\, p_1 (1 - p_1)}}$$

is a monotonically decreasing function of $p_1$, so that $\alpha(p_1)$ is a monotonically increasing function of $p_1$. The maximum of the Type I error is obtained at $p_1 = p_{10}$. So the test statistic

$$Z_{PE} = \frac{\hat p_1 - p_{10}}{\sqrt{\dfrac{\hat\sigma_N^2 / N}{\hat\sigma_{00}^2}\, p_{10} (1 - p_{10})}} \tag{20}$$

can be used to test the hypothesis (2) with a significance level of $\alpha$. Using an approximately unbiased estimator on the basis of U-statistics, instead of the MLEs, for the variance, we propose another test statistic

$$Z_{PU} = \frac{\hat p_1 - p_{10}}{\sqrt{\dfrac{\tilde\sigma_N^2 / N}{\tilde\sigma_{00}^2}\, p_{10} (1 - p_{10})}}, \tag{21}$$

where $\tilde\sigma_{00}^2$ can be calculated using (14) and $\tilde\sigma_N^2$ can be calculated using $\tilde\sigma_{10}^2$ in (15) and $\tilde\sigma_{01}^2$ in (16).

Note that the variance is estimated under the shifted null hypothesis in (20) and (21). Because, as Lachin (2000) notes, method (ii) is preferable in that the Type I error is closer to the nominal level $\alpha$, it is expected that the Type I error of test statistics (20) and (21) is closer to the nominal level than that of Munzel and Hauschke’s test statistic (17). We will compare the performance of the two methods in Section 8.

If we adopt the standard error of $\hat p_1$ using the variance estimation in the Wilcoxon rank sum test under the hypothesis $p_1 = 1/2$, the following test statistic can be derived:

$$Z_W = \frac{\hat p_1 - p_{10}}{\sqrt{\dfrac{N}{n_1 n_2} \cdot \dfrac{1}{12} \left[ 1 - \displaystyle\sum_{k=1}^{a} \left( \frac{m_k}{N} \right)^3 \right]}}. \tag{22}$$
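To make the difference between (17) and the shifted-null statistics (20) and (21) concrete, the following Python sketch evaluates both forms from summary quantities. The function name `z_statistics`, its optional argument `s00`, and the numerical inputs are hypothetical; passing MLEs gives $Z_M$ and $Z_{PE}$, while passing the U-statistic estimators together with $\tilde\sigma_{00}^2$ gives the analogue of $Z_{PU}$.

```python
from math import sqrt

def z_statistics(p1_hat, s10, s01, n1, n2, p10, s00=None):
    """Return (Z with estimated variance, Z with shifted-null variance).

    s10, s01: estimates of the covariances in (7).
    s00: estimate of sigma_00^2; defaults to p1_hat * (1 - p1_hat) as in (18).
    """
    var_hat = s10 / n1 + s01 / n2                 # estimate of sigma_N^2 / N
    if s00 is None:
        s00 = p1_hat * (1 - p1_hat)
    z_alt = (p1_hat - p10) / sqrt(var_hat)                              # form (17)
    z_null = (p1_hat - p10) / sqrt(var_hat / s00 * p10 * (1 - p10))     # form (20)/(21)
    return z_alt, z_null

# Hypothetical inputs, for illustration only.
z_m, z_pe = z_statistics(p1_hat=0.52, s10=0.08, s01=0.07, n1=50, n2=50, p10=0.35)
```

The two statistics differ only by the factor $\sqrt{\hat\sigma_{00}^2 / [p_{10}(1 - p_{10})]}$, so the shifted-null form is larger in absolute value whenever $p_{10}$ lies farther from 1/2 than $\hat p_1$ does.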

6. CONFIDENCE INTERVAL FOR P1 AND SAMPLE SIZE DETERMINATION

We now derive a confidence interval for $p_1$ and a formula for sample size determination based on the proposed test statistics $Z_{PE}$ and $Z_{PU}$ in Section 5.

For the problem of obtaining a confidence interval for a binomial parameter $\pi$, the well-known Wilson’s score method performs much better in terms of how close the achieved coverage probability is to the nominal confidence level (Agresti and Coull, 1998; Newcombe, 1998). Wilson’s confidence interval is based on inverting the score test for the null hypothesis $H_0: \pi = \pi_0$, where the null, rather than estimated, standard error is used. Its limits are obtained by solving for $\pi_0$ the equation

$$\frac{\hat\pi - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}} = z_{\alpha/2}, \tag{23}$$

where $z_{\alpha/2}$ denotes the upper $\alpha/2$ point of the standard normal distribution. Similarly, for the parameter $p_1$, the procedure of inverting the approximately normal test with the shifted-null rather than the estimated standard error is desirable. The proposed test statistics $Z_{PE}$ and $Z_{PU}$ use the shifted-null standard error, and the confidence intervals based on inverting the proposed test statistics $Z_{PE}$ and $Z_{PU}$ are expected to be superior to the confidence interval based on Munzel and Hauschke’s test $Z_M$, which uses the estimated standard error.

A two-sided $100(1 - \alpha)\%$ confidence interval can be obtained by solving for $p_{10}$ the equation $Z_{PE} = z_{\alpha/2}$, that is,

$$\frac{\hat p_1 - p_{10}}{\sqrt{\dfrac{\hat\sigma_N^2 / N}{\hat\sigma_{00}^2}\, p_{10} (1 - p_{10})}} = z_{\alpha/2}.$$

It can be written as

$$\frac{\hat p_1 - p_{10}}{\sqrt{\hat\kappa_N\, p_{10} (1 - p_{10})}} = z_{\alpha/2}, \tag{24}$$

where $\hat\kappa_N$ is given in (19). Equation (24), once squared, is a quadratic in $p_{10}$ of the same form as Equation (23) in $\pi_0$; its two roots give the confidence interval for $p_1$:

$$\frac{\hat p_1 + \hat\kappa_N z_{\alpha/2}^2 / 2 \pm \sqrt{\hat\kappa_N z_{\alpha/2}^2\, \hat p_1 (1 - \hat p_1) + (\hat\kappa_N z_{\alpha/2}^2)^2 / 4}}{1 + \hat\kappa_N z_{\alpha/2}^2}. \tag{25}$$

It is also easy to confirm that (25) is proper in that it always falls in [0, 1]. An alternative can be obtained by using the test statistic $Z_{PU}$, replacing $\hat\kappa_N$ by $\tilde\kappa_N$ in Equation (24), where $\tilde\kappa_N = \dfrac{\tilde\sigma_N^2 / N}{\tilde\sigma_{00}^2}$.
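The interval (25) is simple to evaluate once $\hat p_1$ and the variance ratio in (19) are available. The following Python sketch is one way to do so; the function name `wilson_type_interval`, the argument name `kappa`, and the input values are hypothetical, and the normal quantile is taken from the standard library.

```python
from math import sqrt
from statistics import NormalDist

def wilson_type_interval(p1_hat, kappa, alpha=0.05):
    """Limits (25): roots in p10 of (p1_hat - p10)^2 = kappa * z^2 * p10 * (1 - p10)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # upper alpha/2 point
    kz2 = kappa * z * z
    center = p1_hat + kz2 / 2
    half = sqrt(kz2 * p1_hat * (1 - p1_hat) + kz2 ** 2 / 4)
    return (center - half) / (1 + kz2), (center + half) / (1 + kz2)

# Hypothetical inputs: p1_hat = 0.55 and a variance ratio of 0.006.
lower, upper = wilson_type_interval(0.55, 0.006)
```

As with Wilson’s interval for a binomial proportion, the limits are not symmetric about $\hat p_1$ unless $\hat p_1 = 1/2$.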

Now we consider the problem of sample size determination for the following hypotheses:

$$H_0: p_1 = p_{10}; \quad H_1: p_1 = p_{11},$$

where $p_{10} = 1/2 - \delta_0$ is determined by the margin $\delta_0$, and $p_{11}$ is the postulated efficacy of the new treatment. We seek how many subjects are needed in order to achieve a power $1 - \beta$ to reject the null hypothesis $H_0$ in favor of the alternative hypothesis $H_1$ with significance level $\alpha$. Based on the pivotal quantity $Z_P$ in (18), we obtain the following equation for sample size:

$$(p_{11} - p_{10})^2 = \frac{\sigma_{10}^2 / n_1 + \sigma_{01}^2 / n_2}{p_{11} (1 - p_{11})} \left[ z_\alpha \sqrt{p_{10} (1 - p_{10})} + z_\beta \sqrt{p_{11} (1 - p_{11})} \right]^2,$$

where $z_\alpha$ denotes the upper $\alpha$ point of the standard normal distribution.
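For equal group sizes the sample size equation can be solved explicitly for the common size. The following Python sketch assumes $n_1 = n_2 = n$ (equal allocation is an assumption of this illustration, not a requirement of the formula); the function name `sample_size_per_group` and the planning values are hypothetical.

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_group(p10, p11, s10, s01, alpha=0.025, power=0.80):
    """Solve the sample size equation for n1 = n2 = n (equal allocation assumed).

    s10, s01: planning values of sigma_10^2 and sigma_01^2 under the alternative.
    """
    z_a = NormalDist().inv_cdf(1 - alpha)     # upper alpha point
    z_b = NormalDist().inv_cdf(power)         # upper beta point, beta = 1 - power
    spread = z_a * sqrt(p10 * (1 - p10)) + z_b * sqrt(p11 * (1 - p11))
    return (s10 + s01) / (p11 * (1 - p11)) * spread ** 2 / (p11 - p10) ** 2

# Illustration: three equiprobable categories in both groups, so p11 = 1/2 and
# sigma_10^2 = sigma_01^2 = (1/12) * (1 - 3 * (1/3)**3) = 2/27.
n = sample_size_per_group(p10=0.35, p11=0.5, s10=2 / 27, s01=2 / 27)
n_required = ceil(n)                          # round up to whole subjects
```

For unequal allocation the same equation can be solved numerically after fixing the ratio $n_1/n_2$.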

7. A CLINICAL EXAMPLE

In this section, we apply the test statistics $Z_{PE}$ and $Z_{PU}$ to the clinical trial considered by Munzel and Hauschke (2003). The clinical outcome of each patient was assessed and classified as much improved (score −2), improved (score −1), no change (score 0), worse (score 1), or much worse (score 2). The results for the 219 patients are listed in Table 2.

Table 2 A randomized trial on acute rheumatoid arthritis

Treatment        Much improved   Improved   No change   Worse   Much worse   Total
New              24              37         21          19      6            107
Active control   11              51         22          21      7            112

The estimate of the parameter $p_1$ in (1) is $\hat p_1 = 0.54423$. The MLEs of the two covariances in (7) are $\hat\sigma_{10}^2 = 0.091952$ and $\hat\sigma_{01}^2 = 0.060760$, which are slightly different from the corresponding quantities $\hat\sigma_1^2 = 0.093$ and $\hat\sigma_2^2 = 0.061$ in Munzel and Hauschke (2003). In fact, there are relations $\hat\sigma_{10}^2\, n_1/(n_1 - 1) = \hat\sigma_1^2$ and $\hat\sigma_{01}^2\, n_2/(n_2 - 1) = \hat\sigma_2^2$. Therefore $\hat\sigma_N^2 = 0.30701$ from (8), while $\hat\sigma^2 = 0.310$ in Munzel and Hauschke (2003). At the same time, $\hat\sigma_{00}^2 = 0.24804$. Hence, for the selected noninferiority margin $\delta_0 = 0.20$, the value of the proposed test statistic (20) is $Z_{PE} = 7.08913$, against $Z_M = 6.52286$ for Munzel and Hauschke’s statistic (17). The value of the original test statistic is $W = 6.492$ in Munzel and Hauschke (2003). In addition, the value of the proposed statistic (21) is $Z_{PU} = 7.08987$, and $Z_W = 6.53487$ for the statistic (22).

From (25), we obtained a two-sided 95% confidence interval for $p_1$ of (0.47068, 0.61589), while Munzel and Hauschke’s method would give the interval (0.47084, 0.61761). Our interval is narrower than Munzel and Hauschke’s.
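The calculations in this example can be retraced from the counts in Table 2. The following Python sketch is an illustrative re-implementation of (10), (13) to (17), and (20) to (22), not the authors’ code; variable names are hypothetical. Under these formulas it should reproduce the values of $\hat p_1$, $Z_M$, $Z_{PE}$, $Z_{PU}$, and $Z_W$ reported above, up to rounding.

```python
from math import sqrt

new = [24, 37, 21, 19, 6]        # Table 2, new treatment (scores -2, ..., 2)
ctrl = [11, 51, 22, 21, 7]       # Table 2, active control
n1, n2 = sum(new), sum(ctrl)
N = n1 + n2
p10 = 0.5 - 0.20                 # margin delta_0 = 0.20

def mid_cdf(counts):
    """Empirical [F(k-1) + F(k)] / 2 for each category k."""
    n, cum, out = sum(counts), 0, []
    for c in counts:
        out.append((cum + c / 2) / n)
        cum += c
    return out

F1n, F2n = mid_cdf(new), mid_cdf(ctrl)

# MLEs obtained by plugging observed proportions into (10).
p1 = sum(c / n2 * f for c, f in zip(ctrl, F1n))
p2 = sum(c / n1 * (1 - f) ** 2 for c, f in zip(new, F2n))
p3 = sum(c / n2 * f ** 2 for c, f in zip(ctrl, F1n))
s10, s01, s00 = p2 - p1 ** 2, p3 - p1 ** 2, p1 * (1 - p1)
var_mle = s10 / n1 + s01 / n2                                  # sigma_N^2 / N
z_m = (p1 - p10) / sqrt(var_mle)                               # (17)
z_pe = (p1 - p10) / sqrt(var_mle / s00 * p10 * (1 - p10))      # (20)

# U-statistic versions via (13) and (14)-(16).
p0 = sum(a * b for a, b in zip(new, ctrl)) / (n1 * n2)
p2u = p2 - (p1 - p2) / (n2 - 1) + p0 / (4 * (n2 - 1))
p3u = p3 - (p1 - p3) / (n1 - 1) + p0 / (4 * (n1 - 1))
d = (n1 - 1) * (n2 - 1)
s00u = (n1 * n2 * (p1 - p1 ** 2) - (n2 - 1) * (p1 - p2u) - (n1 - 1) * (p1 - p3u)) / d
s10u, s01u = s00u - (p1 - p2u), s00u - (p1 - p3u)
z_pu = (p1 - p10) / sqrt((s10u / n1 + s01u / n2) / s00u * p10 * (1 - p10))   # (21)

# Wilcoxon-type statistic (22) with pooled category proportions m_k / N.
tie = 1 - sum(((a + b) / N) ** 3 for a, b in zip(new, ctrl))
z_w = (p1 - p10) / sqrt(N / (n1 * n2) * tie / 12)

print(round(p1, 5), round(z_m, 3), round(z_pe, 3), round(z_pu, 3), round(z_w, 3))
```

Working from the category counts rather than from the $n_1 n_2$ individual pairs keeps the computation linear in the number of categories.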

8. SIMULATION RESULTS

In this section, the performance of the noninferiority test statistics $Z_M$, $Z_W$, $Z_{PE}$, and $Z_{PU}$ in Section 5 is compared for ordered categorical data through simulation. The simulation is conducted in SAS® 8.2 with the function RANTBL for generating random numbers with given category probabilities. The test statistics are compared in terms of their actual Type I error and power for various sample sizes. We consider a one-sided test with a significance level of 0.025 for several noninferiority null hypotheses, $H_0: p_{10} = 0.45$, 0.40, 0.35, 0.30, corresponding to the noninferiority margins $\delta_0 = 0.05$, 0.10, 0.15, 0.20. The Type I error is calculated using random numbers generated under the noninferiority null hypotheses with two equal sample sizes ranging from 9 to 120 in increments of 3. The powers are calculated under the alternative hypotheses $p_1 = 0.5$ and $p_1 = 0.55$ with appropriate ranges for the two equal sample sizes.

We only give some results for a simple case with three categories. We consider $X_1$ to have three categories with equal probabilities, $\pi_{11} = \pi_{12} = \pi_{13} = 1/3$. To obtain the category probabilities $\pi_{21}$, $\pi_{22}$, and $\pi_{23}$ given $p_1$, we use the same method as Halperin et al. (1989). We consider categories that are discretizations of an exponential variable. Let $F_1(x) = 1 - \exp(-x)$, $F_2(x) = 1 - \exp(-\lambda x)$, and let $c_1$, $c_2$ be the cutoff values; then

$$\pi_{11} = F_1(c_1); \quad \pi_{11} + \pi_{12} = F_1(c_2); \quad \pi_{21} = F_2(c_1); \quad \pi_{21} + \pi_{22} = F_2(c_2).$$

So $c_1 = -\log(2/3)$, $c_2 = -\log(1/3)$, and

$$\pi_{21} = 1 - \left(\frac{2}{3}\right)^{\lambda}, \quad \pi_{22} = \left(\frac{2}{3}\right)^{\lambda} - \left(\frac{1}{3}\right)^{\lambda}, \quad \pi_{23} = \left(\frac{1}{3}\right)^{\lambda}.$$

At the same time, $\lambda$ is obtained by solving the following nonlinear equation in $\lambda$ for a given $p_1$:

$$3 p_1 - \frac{1}{2} = \left(\frac{2}{3}\right)^{\lambda} + \left(\frac{1}{3}\right)^{\lambda}.$$

The values of $\lambda$ and the probabilities of the three categories of $X_2$ for $p_1 = 0.45$, 0.40, 0.35, and 0.30 are listed in Table 3.
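The nonlinear equation for $\lambda$ has a left-hand side that is fixed by $p_1$ and a right-hand side that decreases monotonically in $\lambda$, so a simple bisection suffices. The following Python sketch is one possible solver (the article does not state which numerical method was used); the function names and the search bounds are assumptions of this illustration.

```python
def solve_lambda(p1, lo=1e-6, hi=50.0, tol=1e-12):
    """Bisection for (2/3)**lam + (1/3)**lam = 3*p1 - 1/2.

    The left-hand side decreases in lam, from about 2 near lam = 0 toward 0.
    """
    target = 3 * p1 - 0.5

    def g(lam):
        return (2 / 3) ** lam + (1 / 3) ** lam - target

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid          # sum still too large: lambda must increase
        else:
            hi = mid
    return (lo + hi) / 2

def control_probs(lam):
    """Category probabilities (pi_21, pi_22, pi_23) of X2 for a given lambda."""
    a, b = (2 / 3) ** lam, (1 / 3) ** lam
    return 1 - a, a - b, b

lam = solve_lambda(0.45)
pi2 = control_probs(lam)
```

For $p_1 = 1/2$ the solution is $\lambda = 1$, which recovers equal category probabilities for $X_2$.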

Table 3 The values of λ and the probabilities of the three categories of X2 for each p1

p1     λ          π21       π22       π23
0.45   1.26092    0.40026   0.34948   0.25026
0.40   1.58792    0.47473   0.35054   0.17473
0.35   2.018502   0.55888   0.33225   0.10888
0.30   2.62934    0.65565   0.28869   0.055653

Each case of the following results is based on 100,000 simulations. Figure 1 compares the actual Type I error of the statistics $Z_M$ and $Z_{PE}$ for four noninferiority null hypotheses $p_{10} = 0.45$, 0.40, 0.35, and 0.30. We do not provide the results for the statistic $Z_{PU}$ because they are very similar to those for $Z_{PE}$ on the simulated data. From Figure 1, the proposed statistic $Z_{PE}$ inflates the Type I error somewhat, but it is closer to the nominal significance level for $n_1 = n_2 \ge 30$. On the other hand, $Z_M$ is somewhat conservative. The proposed statistic is better than $Z_M$ in terms of deviation from the nominal level, especially for the cases of $p_{10} = 0.35$ and $p_{10} = 0.30$. For the case $p_{10} = 0.45$, these statistics show the same behavior. We found that $Z_W$ is the most conservative among $Z_W$, $Z_M$, $Z_{PE}$, and $Z_{PU}$.

[Figure 1: four panels, one for each null hypothesis p_10 = 0.45, 0.40, 0.35, and 0.30, plotting the Type I error rate of Z_M and Z_PE against the sample size per group.]

Figure 1 Comparison of the actual Type I error rates based on 100,000 simulations. Horizontal dotted line indicates the nominal level of 2.5%.

[Figure 2: six panels plotting the power of Z_M and Z_PE against the sample size per group, for p_1 = 0.5 and p_1 = 0.55, each with p_10 = 0.40, 0.35, and 0.30.]

Figure 2 Comparison of the powers based on 100,000 simulations. Horizontal dotted line indicates 80% power.

For the true parameter $p_1 = 0.5$, testing the null hypothesis $p_{10} = 0.45$ has lower power for these statistics. From Figure 2, the proposed statistic $Z_{PE}$ has higher power than $Z_M$ for testing the null hypotheses $p_{10} = 0.40$, 0.35, and 0.30 under the true parameter $p_1 = 0.5$. These findings also hold for the true parameter $p_1 = 0.55$. It is possible that the proposed method gains the improved power at the cost of the inflated Type I error rate shown in Figure 1. We found that $Z_W$ has the lowest power among $Z_W$, $Z_M$, $Z_{PE}$, and $Z_{PU}$.

9. DISCUSSION In this article, a measure of treatment effect p1 for ordinal categorical data is used and a method of specifying noninferiority margin for the measure is provided. The proposed pivotal quantity ZP in (18) is a modification of the Munzel and Hauschke’s method in such a way that it estimates the test statistic’s variance using σ 200 ¼ p1 ð1  p1 Þ 2 instead of σ^00 ¼^ p1 ð1  ^ p1 Þ. When the margin is small, p1 is close to the central value of 0.5 and the variance component of p1 ð1  p1 Þ is close to its maximum, and hence the proposed method and the Munzel and Hauschke’s method become similar.


We proposed two Z-type test statistics, $Z_{PE}$ and $Z_{PU}$, in which the variance is estimated using MLEs and U-statistics, respectively. The limited simulation study suggested that the proposed test statistics $Z_{PE}$ and $Z_{PU}$ are preferable to the statistic $Z_M$ previously proposed by Munzel and Hauschke (2003) in terms of deviation from the nominal level and power. However, the gain in power may be attributable to the inflated Type I error rate. Because the proposed statistics are slightly anticonservative, we recommend that study designers confirm that this inflation is negligible in the context of their own trials. $Z_{PE}$ and $Z_{PU}$ can be regarded as modifications of $Z_M$ that take into account the asymptotic variance under the noninferiority null hypothesis. Therefore, $Z_{PE}$ and $Z_{PU}$ should be considered an improved approach. For binomial distributions, Munzel and Hauschke's test statistic reduces to the well-known Wald-type statistic, which suffers from serious drawbacks that have been pointed out by many authors. The performance of the proposed confidence interval for $p_1$ and of the sample size determination method will be evaluated and compared with the method proposed by Munzel and Hauschke (2003) in a separate paper.

Noninferiority testing requires a noninferiority margin to be specified. If historical clinical data on the control treatment and placebo are available, the method described in Appendix A can be used. That method shows that the margin $\delta_0$ may be chosen as a fraction of the lower limit of a 95% confidence interval for $\int F_2\,dF_0 - \tfrac{1}{2} = P(X_2 < X_0) + \tfrac{1}{2}P(X_2 = X_0) - \tfrac{1}{2}$ estimated from the historical clinical trial data. Here $X_0$ denotes the clinical outcome for the placebo group. Notice that the margin $\delta_0$ for $p_1$ is half of the margin for the difference of proportions, $\pi_1 - \pi_2$, when $X_1$ and $X_2$ follow binomial distributions with parameters $\pi_1$ and $\pi_2$, respectively.
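The factor of one half for binary outcomes can be checked numerically. The sketch below computes $P(X_1 < X_2) + \tfrac{1}{2}P(X_1 = X_2)$ from category probabilities and applies it to two binary outcomes. The function names and the coding of a binary outcome as two ordered categories (0, then 1) are assumptions made for illustration.

```python
def relative_effect(probs1, probs2):
    """P(X1 < X2) + 0.5 * P(X1 = X2) for independent ordinal outcomes.

    probs1[k] and probs2[k] are the probabilities that X1 and X2 fall in
    the k-th ordered category.
    """
    if len(probs1) != len(probs2):
        raise ValueError("both outcomes must use the same categories")
    total = 0.0
    for j, q1 in enumerate(probs1):
        for k, q2 in enumerate(probs2):
            if j < k:
                total += q1 * q2          # X1 strictly smaller than X2
            elif j == k:
                total += 0.5 * q1 * q2    # ties count one half
    return total


def bernoulli_relative_effect(pi1, pi2):
    """Relative effect when X1 ~ Bernoulli(pi1) and X2 ~ Bernoulli(pi2)."""
    # Assumption: category 0 is "outcome = 0", category 1 is "outcome = 1".
    return relative_effect([1.0 - pi1, pi1], [1.0 - pi2, pi2])


if __name__ == "__main__":
    pi1, pi2 = 0.6, 0.7  # example values only
    p = bernoulli_relative_effect(pi1, pi2)
    print(f"relative effect    : {p:.3f}")
    print(f"0.5 + (pi2 - pi1)/2: {0.5 + (pi2 - pi1) / 2:.3f}")
```

With $\pi_1 = 0.6$ and $\pi_2 = 0.7$ the relative effect is 0.55, so its distance from 0.5 is 0.05, half of $|\pi_1 - \pi_2| = 0.10$.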
The magnitude of the margin can therefore be judged by reference to the already well-studied noninferiority problem for the difference between binomial proportions. As described in Munzel and Hauschke (2003, p. 32 left), in general, including the case where two cdfs cross, the measure $p_1$ is interpreted as the probability that $X_1$ tends to take smaller values than $X_2$. With this interpretation of $p_1$, both the proposed test statistics and Munzel and Hauschke's test statistic can be applied without assuming noncrossing cdfs.

In Appendix A, a method is discussed for determining the noninferiority margin in the same manner as in the parametric model setting. It is desirable that the chosen margin ensures that establishment of noninferiority implies the superiority of the new treatment to the placebo, that is, $\int F_1\,dF_0 > \tfrac{1}{2}$. However, unlike in the parametric model setting, only the weaker relationship $\int (F_0 - F_1)\,dF_2 < 0$ can be assured. To ensure $\int F_1\,dF_0 > \tfrac{1}{2}$, some conditions are needed in the nonparametric model setting. A sufficient condition is that the cdfs $F_1(x)$ and $F_0(x)$ do not cross. When we design a noninferiority study, we usually have some historical clinical data about the control treatment and placebo, and early-phase clinical trial data about the new treatment. So we can make an initial assessment of the assumption on the basis of the empirical cdfs $\hat{F}_1(x)$, $\hat{F}_2(x)$, and $\hat{F}_0(x)$ obtained so far.


APPENDIX A: METHOD FOR SPECIFYING THE NONINFERIORITY MARGIN

In this appendix, we provide a method to specify the noninferiority margin with some statistical justification in nonparametric models. We should specify the noninferiority margin to ensure that establishment of noninferiority implies the superiority of the new treatment to the placebo. Let $X_1$, $X_2$, and $X_0$ denote the clinical outcomes for the new treatment, the control treatment, and the placebo group, respectively. We assume that smaller values of the outcome measure correspond to favorable results. Suppose that $X_1$, $X_2$, and $X_0$ are mutually independent with distribution functions $F_1(x)$, $F_2(x)$, and $F_0(x)$, respectively. Let $F_i(x) = \tfrac{1}{2}[P(X_i \ldots$

$$\ldots \; \frac{1}{2} + \delta_0. \tag{A1}$$

For example, $\delta_0$ may be a fraction of the lower limit of a 95% confidence interval for $\int F_2\,dF_0 - \tfrac{1}{2}$ estimated from some historical clinical trial data. We hope to conclude that the new treatment is superior to placebo when noninferiority is established. That is, we hope to have

$$\int F_1\,dF_0 > \frac{1}{2}, \tag{A2}$$

when the null hypothesis $H_0: \int F_1\,dF_2 = \tfrac{1}{2} - \delta_0$ is rejected in favor of the following alternative hypothesis:

$$H_1: \int F_1\,dF_2 > \frac{1}{2} - \delta_0.$$

So we have

$$\int (F_2 + F_1 - F_2)\,dF_2 > \frac{1}{2} - \delta_0,$$

and then, since $\int F_2\,dF_2 = \tfrac{1}{2}$,

$$\int (F_1 - F_2)\,dF_2 > -\delta_0.$$
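The step from the alternative hypothesis to the last inequality rests on $\int F_2\,dF_2 = \tfrac{1}{2}$. A small numerical check might look like the sketch below. It assumes the normalized cdf $P(X < x) + \tfrac{1}{2}P(X = x)$, and the category probabilities, the margin, and all function names are hypothetical values chosen for illustration.

```python
def normalized_cdf(probs):
    """Normalized cdf per ordered category: P(X < k) + 0.5 * P(X = k)."""
    values = []
    below = 0.0
    for q in probs:
        values.append(below + 0.5 * q)
        below += q
    return values


def integral(probs_a, probs_b):
    """Integral of F_a dF_b, i.e. sum over k of F_a(k) * P(X_b = k)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both outcomes must use the same categories")
    return sum(f * q for f, q in zip(normalized_cdf(probs_a), probs_b))


def shifted_difference(probs1, probs2):
    """Integral of (F_1 - F_2) dF_2, computed term by term."""
    f1 = normalized_cdf(probs1)
    f2 = normalized_cdf(probs2)
    return sum((a - b) * q for a, b, q in zip(f1, f2, probs2))


if __name__ == "__main__":
    # Hypothetical three-category distributions and margin.
    new = [0.30, 0.40, 0.30]
    control = [0.35, 0.40, 0.25]
    delta0 = 0.10

    print(f"int F2 dF2        = {integral(control, control):.4f}")
    print(f"int F1 dF2        = {integral(new, control):.4f}")
    print(f"int (F1 - F2) dF2 = {shifted_difference(new, control):.4f}")
    # The two forms of the inequality agree because int F2 dF2 = 1/2.
    print(integral(new, control) > 0.5 - delta0,
          shifted_difference(new, control) > -delta0)
```

For these example distributions, $\int F_1\,dF_2 = 0.465$ and $\int (F_1 - F_2)\,dF_2 = -0.035$, so both forms of the inequality hold for $\delta_0 = 0.10$.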

Next, from (A1), we have

$$\int F_0\,dF_2 \; \ldots$$
