Comment


Non-inferiority trials: why oncologists must remain wary

See Online for appendix

Heightened interest in comparative effectiveness research has made head-to-head comparisons between anti-cancer drugs increasingly common. One strategy is to show non-inferiority of one therapy to another in terms of efficacy and to rely on improvements in safety (toxicity), quality of life, convenience, or cost to guide the choice. Many non-inferiority trials have followed this model, which raises questions regarding the use of non-inferiority designs in comparative effectiveness research. We address three issues that affect non-inferiority trials: the arbitrary nature of what constitutes a non-inferior therapy; the problem of censoring and its effect on outcomes; and the arbitrary choice of what represents an advantage if non-inferior efficacy is established.

As the US Food and Drug Administration (FDA) noted,1 the design and interpretation of non-inferiority trials “is a formidable challenge”. Sometimes erroneously thought of as equivalence trials, non-inferiority trials do not show superiority of the test drug, but “show the new treatment is not inferior to an unacceptable extent”. But as others have noted,2,3 and as summarised in the appendix, the margin of non-inferiority is often inadequately justified. The variable and often arbitrary thresholds show an absence of consensus on what constitutes “an unacceptable extent”, with hazard ratios (HR) for non-inferiority ranging from 1·049 to 1·43. Note that, unlike superiority trials that aim for statistically significant HRs of less than 1, non-inferiority trials choose HRs greater than 1: values that, with their confidence intervals, represent the limits of how much worse the experimental therapy can be, yet

still be considered non-inferior. The desired efficacy of the experimental drug relative to that of the reference drug also varies greatly (appendix). The wide variation in these two non-inferiority margins (ie, HRs and percentage of efficacy retained) underscores the arbitrary nature of the choices and, in our view, renders many choices questionable.

Consider, for example, capecitabine for metastatic breast cancer. Regulatory approval of adding capecitabine to docetaxel after failure of a prior anthracycline-containing regimen was based on the gains achieved in a superiority trial by adding standard-dose capecitabine (1250 mg/m² twice daily) to docetaxel. Although outcomes were statistically significant, with HRs of 0·652 (95% CI 0·545–0·780, p=0·0001) for time to progression and 0·775 (95% CI 0·634–0·947, p=0·0126) for overall survival, the absolute gains of 1·9 months in progression-free survival and 3 months in overall survival were modest. The non-inferiority trial summarised in the appendix compared the addition of a lower dose (825 mg/m² twice daily) or the standard dose of capecitabine to docetaxel, thereby seeking to improve tolerability with the lower capecitabine dose without compromising efficacy.4 Yet, as this and other trials show, a guarantee that “the test drug is not any (even a little) less effective than the control can only be demonstrated by showing the test drug is superior. What non-inferiority trials seek to show is that any difference between the two treatments is small enough that the new drug has at least some effect or, in many cases, an effect that is not too much smaller than the active control”.1 But when the gains of the reference drug are modest, as with the addition of capecitabine to docetaxel, the choice of non-inferiority boundaries becomes critical. The design of this particular trial envisioned that non-inferiority would be established if the lower dose retained 25% or more of the contribution of 1250 mg/m² capecitabine to the capecitabine plus docetaxel combination (ie, an HR of less than 1·35). HRs aside, how much less efficacy than 1·9 months in time to progression and 3 months in overall survival would a woman with breast cancer consider significant? When the trial was designed, the investigators probably knew that a comparison of the lower dose of capecitabine plus docetaxel against docetaxel alone would never be performed. In view of the modest gains with standard-dose capecitabine and the extent to which activity could be reduced yet remain non-inferior, we would not know if the addition of a lower, non-inferior dose of capecitabine to docetaxel is better than docetaxel alone. We would only know it was non-inferior to standard-dose capecitabine.

The appendix also shows how reference studies of efficacy are often not available. Eight (47%) of the 17 summarised non-inferiority trials did not use a previously established therapy as the benchmark and instead chose arbitrary differences. For example, in the comparison of bevacizumab plus capecitabine with bevacizumab plus paclitaxel, the non-inferiority margin (HR ≥1·33) was selected “according to medical judgment of a clinically appropriate and acceptable margin”, since a margin of non-inferiority “could not be selected on the basis of improvement in overall survival in previous trials of bevacizumab plus paclitaxel” because “no trial has shown a significant overall survival benefit”.5 The primary objective was to show that the experimental regimen, capecitabine plus bevacizumab, was non-inferior to paclitaxel plus bevacizumab, a regimen that has never been shown to be of clinical value.
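To make the dependence on the chosen margin concrete, the short Python sketch below applies the usual decision rule (non-inferiority is declared when the upper confidence bound of the HR falls below the margin) to a single hypothetical result. The upper bound of 1·25 is an assumption chosen purely for illustration and does not come from any trial cited here; the margins are those quoted in the text, and the function name is likewise illustrative.

```python
def is_non_inferior(hr_upper_ci: float, margin: float) -> bool:
    """Declare non-inferiority when the upper CI bound of the HR is below the margin."""
    return hr_upper_ci < margin


# Hypothetical upper 95% confidence bound for an observed HR (assumption, not trial data).
hr_upper_ci = 1.25

# Non-inferiority margins quoted in the text.
margins = {
    "lowest margin in the appendix": 1.049,
    "bevacizumab comparison": 1.33,
    "capecitabine dose comparison": 1.35,
    "highest margin in the appendix": 1.43,
}

verdicts = {label: is_non_inferior(hr_upper_ci, m) for label, m in margins.items()}

for label, verdict in verdicts.items():
    print(f"{label} (margin {margins[label]}): non-inferior = {verdict}")
```

Under these assumptions the same result passes at margins of 1·33, 1·35, and 1·43 but fails at 1·049, so the verdict is determined by the margin rather than by the data.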
As the FDA has noted, “similarity of test drug and active control can mean either that both drugs were effective or that neither was effective”.1

In several non-inferiority trials, we found high degrees of censoring that we believe can confound interpretations of results. For example, in a comparison of linifanib and sorafenib in hepatocellular carcinoma,6 high rates of censoring render conclusions regarding the secondary time-to-progression endpoint questionable (appendix). At study cut-off, disease progression had occurred in only 62% and

69% of patients in the linifanib and sorafenib groups, respectively. The balance was censored with regard to time to progression, primarily for adverse events and withdrawal of consent (30% and 20%, respectively). Neither Kaplan nor Meier envisioned that censoring, an adjustment to maximally harvest information, would be so greatly abused. And whereas high degrees of censoring can also affect superiority trials, censoring in non-inferiority trials, especially those with tolerability as an endpoint, can be especially problematic. In 11 non-inferiority trials in which published data allowed an estimate of censoring, we found a remarkable correlation between the relative rates of censoring and the relative rates of drug reduction and discontinuation between the two arms (R²=0·75). In other words, as therapies become more intolerable, censoring becomes more common. And to the extent that censoring is driven by toxicity, high rates of censoring can both improve outcomes by eliminating from calculations those patients who cannot tolerate the drug (often the more infirm) and mask the problem of tolerability. Thus, in non-inferiority trials, the extent of censoring must be very carefully examined.

To inform treatment decisions, non-inferiority trials often rely on attributes other than efficacy, including safety (toxicity), quality of life, convenience, or cost. Unfortunately, there are no guidelines, so these comparisons can arbitrarily focus on selected endpoints or draw conclusions despite limited participation and a short surveyed treatment period, and thus risk abusing patient-reported outcomes.7 For example, a non-inferiority trial8 compared pazopanib and sunitinib as first-line therapy for metastatic renal cell carcinoma. With progression-free survival as the primary endpoint, investigators concluded that “pazopanib and sunitinib have similar efficacy, but the safety and quality-of-life (QOL) profiles favor pazopanib”.
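The effect of toxicity-driven censoring described earlier can be illustrated with a toy cohort. The Python sketch below computes a Kaplan-Meier estimate of median time to progression twice: once with four patients censored when they withdraw at month 2, and once assuming those same patients had been followed and progressed at month 3. All patient data and function names are hypothetical, chosen only to show the direction of the bias.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if progression was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        n_events = sum(same_time)
        if n_events:
            # Exact fractions avoid rounding problems at the 0.5 threshold.
            survival *= Fraction(at_risk - n_events, at_risk)
            curve.append((t, survival))
        at_risk -= len(same_time)
        i += len(same_time)
    return curve


def median_time(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None


# Hypothetical cohort of 10 patients (months). Six tolerate the drug and progress later.
tolerant_times = [4, 6, 8, 10, 12, 14]

# Analysis A: four frail patients withdraw for toxicity at month 2 and are censored.
times_a = [2, 2, 2, 2] + tolerant_times
events_a = [0, 0, 0, 0] + [1] * 6

# Analysis B: the same four patients, assumed to progress at month 3 had they been followed.
times_b = [3, 3, 3, 3] + tolerant_times
events_b = [1] * 10

median_censored = median_time(kaplan_meier(times_a, events_a))
median_followed = median_time(kaplan_meier(times_b, events_b))
print(median_censored, median_followed)
```

In this invented cohort, censoring the early withdrawals doubles the apparent median time to progression (8 vs 4 months), although nothing about the drug's activity has changed.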
The Cancer Therapy Satisfaction Questionnaire asked two concordant questions (feelings about side-effects and satisfaction with therapy), whereas the supplementary quality-of-life analysis included five questions about hand-foot syndrome, thereby inflating the tolerability of pazopanib because this side-effect is well known to occur preferentially with sunitinib. Furthermore, as often seen, participation in quality-of-life analyses was

low and covered only the first 6 months of treatment, providing no insight into long-term tolerability. Low participation in quality-of-life analyses constitutes a censoring of data since those who do not feel well are less likely to participate, and discontinuation of treatment for any reason means other toxicities are never scored. For this reason, we suggest there should be a more objective measure of tolerability. Just as analyses of overall survival are less prone to informative censoring than those of progression-free survival,9,10 the percentage of patients discontinuing treatment is a better tolerability metric (appendix). Patients who find side-effects intolerable discontinue treatments, and this endpoint is not subject to censoring. By this criterion, sunitinib outperformed or at least equalled pazopanib (20% vs 24%). Although the authors argued higher pazopanib discontinuation rates were due to abnormalities in liver function, discontinuation for any reason is important. Diarrhoea, not queried in quality-of-life surveys, has been much higher with pazopanib, and median rates of drug discontinuation have been 19% with sunitinib and 20% with pazopanib, underscoring their comparable tolerability and ratifying values in a recent study.11

Thus, we have concerns about the use of non-inferiority trials in oncology. In particular, the margin of non-inferiority is often not clearly justified, and there is large variability in this metric. Additionally, the rates of censoring must be scrutinised because treatment choices are increasingly made on the basis of improvements in safety, quality of life, convenience, and cost. And while convenience and cost are often

assessed objectively, tolerability can be subject to bias. To provide the highest standard of patient care, we must remain critical of non-inferiority studies.

Mauricio Burotto, Vinay Prasad, *Tito Fojo
Medical Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
[email protected]

We declare no competing interests.

1 Food and Drug Administration. Guidance for industry: non-inferiority clinical trials. 2010. http://www.fda.gov/downloads/Drugs/Guidances/UCM202140.pdf (accessed March 12, 2015).
2 Riechelmann RP, Alex A, Cruz L, Bariani GM, Hoff PM. Non-inferiority cancer clinical trials: scope and purposes underlying their design. Ann Oncol 2013; 24: 1942–47.
3 Tanaka S, Kinjo Y, Kataoka Y, Yoshimura K, Teramukai S. Statistical issues and recommendations for noninferiority trials in oncology: a systematic review. Clin Cancer Res 2012; 18: 1837–47.
4 Buzdar AU, Xu B, Digumarti R, et al. Randomized phase II non-inferiority study (NO16853) of two different doses of capecitabine in combination with docetaxel for locally advanced/metastatic breast cancer. Ann Oncol 2012; 23: 589–97.
5 Lang I, Brodowicz T, Ryvo L, et al. Bevacizumab plus paclitaxel versus bevacizumab plus capecitabine as first-line treatment for HER2-negative metastatic breast cancer: interim efficacy results of the randomised, open-label, non-inferiority, phase 3 TURANDOT trial. Lancet Oncol 2013; 14: 125–33.
6 Cainap C, Qin S, Huang WT, et al. Linifanib versus Sorafenib in patients with advanced hepatocellular carcinoma: results of a randomized phase III trial. J Clin Oncol 2015; 33: 172–79.
7 Hao Y. Patient-reported outcomes in support of oncology product labeling claims: regulatory context and challenges. Expert Rev Pharmacoecon Outcomes Res 2010; 10: 407–20.
8 Motzer RJ, Hutson TE, Cella D, et al. Pazopanib versus sunitinib in metastatic renal-cell carcinoma. N Engl J Med 2013; 369: 722–31.
9 Booth CM, Eisenhauer EA. Progression-free survival: meaningful or simply measurable? J Clin Oncol 2012; 30: 1030–33.
10 Basler MH. Utility of the McNamara fallacy. BMJ 2009; 339: b3141.
11 Massey PR, Okman JS, Wilkerson J, Cowen EW. Tyrosine kinase inhibitors directed against the vascular endothelial growth factor receptor (VEGFR) have distinct cutaneous toxicity profiles: a meta-analysis and review of the literature. Support Care Cancer 2014; published online Dec 5. DOI:10.1007/s00520-014-2520-9.

www.thelancet.com/oncology Vol 16 April 2015
