Some current issues in the design of HIV noninferiority trials

Philippe Flandre a,b

Since the introduction of protease inhibitors and their combination with two nucleoside reverse transcriptase inhibitors in tri-therapy, there has been a continuous improvement in the efficacy of antiretroviral treatments. Such combinations have been rendered even more effective by the introduction of non-nucleoside reverse transcriptase inhibitors and, more recently, integrase inhibitors. This progress has led to a move away from superiority designs towards noninferiority designs for randomized clinical trials for HIV. Noninferiority trials aim to demonstrate that a new regimen is no worse than the current standard. The methodological issues associated with such designs have been discussed, but recent HIV trials provide us with an opportunity to consider the choice of hypotheses. Recent HIV trials have been overpowered, due to the assumption of lower success rates than observed and the enrollment of a large number of patients. The use of stratified statistical methods for primary endpoint analysis, with sample size calculated by classical methods (without stratification), also increases the statistical power. Some HIV trials have a statistical power close to 99%. Surprisingly, the results of some previous studies or phase II trials are not taken into account when designing the corresponding phase III trials. We discuss alternative hypotheses and designs.

© 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins

AIDS 2014, 28:1921–1929

Keywords: HIV trials, noninferiority design, statistical methods, statistical power

Introduction

The efficacy of HAART has led to a move towards the use of noninferiority designs for HIV drug development [1,2]. Indeed, with viral suppression rates of 80–90% in recent studies, it would be difficult to design a superiority trial showing any meaningful improvement. In addition, such superiority trials would necessarily require the use of very large samples. Classical superiority trials are, therefore, rarely carried out on HIV-infected patients. Noninferiority trials aim to demonstrate that an experimental treatment (EXP) is not unacceptably worse than the standard regimen (STD). Some clinical and statistical parameters are required for the design of a noninferiority trial, including the α level, statistical power, success rates in the two treatment groups and the noninferiority margin ΔL. In practice, the number of patients is calculated so as to attain a certain statistical power. One of the key issues is the choice of the noninferiority margin, which specifies the maximum loss of efficacy considered acceptable in the trial. Establishing this margin is not straightforward, although different approaches have been discussed. The choice of the margin should be justified on statistical and clinical grounds. In HIV trials, a margin of 10–12% is generally used, with or without a suitable rationale, and it has been suggested that this margin should be decreased in future trials [1]. No such decrease in this margin has, however, been observed in recent years. It has been argued that such a reduction would lead to a large increase in sample size [2]. Indeed, the width of the noninferiority margin drives the overall sample size (Fig. 1). However, sample size is also driven by power, and some recent trials have adopted a 95%-power design involving large sample sizes. The unacceptable increase in sample size to reduce the

a INSERM, and b Sorbonne Universités, UPMC Univ Paris 06, UMR-S 1136, Pierre Louis Institute of Epidemiology and Public Health, Paris, France.
Correspondence to Philippe Flandre, INSERM UMR-S 1136, 56 Boulevard Vincent Auriol, 75013 Paris, France. Tel: +33 14164275; e-mail: [email protected]
Received: 7 April 2014; revised: 26 May 2014; accepted: 27 May 2014.

DOI:10.1097/QAD.0000000000000369

ISSN 0269-9370 © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins


Copyright © Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.

AIDS 2014, Vol 28 No 13

Fig. 1. Relationship between power, sample size, common response rate and noninferiority margin. [Figure: sample size per group (0–800) plotted against common response rate (75–95%), with curves for 8% and 10% margins at powers of 80, 90 and 95%.]

noninferiority margin is thus seen as acceptable to increase statistical power. In terms of sample size, the cost of this increase in power is similar to that of a substantial reduction of the noninferiority margin. For example, the sample size required for an 8% margin in a study with 80% power is similar to that for a 10% margin with 95% power (Fig. 1). Given the pressure to demonstrate the noninferiority of new agents, it seems likely that study power may even increase in the next few years. Indeed, some recent trials have already been found to have a power close to 99%. A wide margin and a high power will tend to favor the development of ‘me-too’ drugs, leading to an additional marketable product, but not a substantial improvement in therapeutic care [3]. Recent HIV studies raise the question of sample size and decisions about study power in the design of efficient trials. Fleming [4] discussed a widely held myth that ‘noninferiority trials having scientifically rigorous margins always require very large sample sizes’. In this study, we discuss some of the hypotheses used in HIV noninferiority trials and investigate whether other design options are available. We focus specifically on the choice of success rates, power and noninferiority margin, and use six recent trials to illustrate these issues.
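The trade-off between margin and power described above can be checked with a short calculation. The sketch below uses the usual normal-approximation formula for a risk difference with a common response rate in both arms; the one-sided α of 0.025 and the function name `ni_sample_size` are assumptions made for illustration, and this is not necessarily the method used to draw Fig. 1.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size(rate, margin, power, alpha=0.025):
    """Patients per group for a noninferiority test on a risk difference.

    Normal approximation, same true response rate in both arms,
    one-sided type I error `alpha` (0.025 is an assumed value).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = 2 * rate * (1 - rate)  # sum of the two binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


if __name__ == "__main__":
    # Tabulate the kind of grid summarized in Fig. 1 for an 80% response rate.
    for margin in (0.08, 0.10, 0.12):
        for power in (0.80, 0.90, 0.95):
            n = ni_sample_size(rate=0.80, margin=margin, power=power)
            print(f"margin={margin:.0%} power={power:.0%} n/group={n}")
```

With an 80% common response rate, the 8% margin at 80% power and the 10% margin at 95% power both come out at roughly 400 patients per group under these assumptions, which is the equivalence the text points to.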

Design considerations

Success rates

One recent trial [Efficacy Comparison in treatment-naive HIV-infected subjects of TMC278 and Efavirenz (ECHO)] investigated the efficacy, safety and tolerability of rilpivirine (EXP), comparing this drug with efavirenz (EFV) (STD), each with a background regimen of tenofovir-disoproxil-fumarate and emtricitabine (TDF/FTC), in treatment-naive patients infected with HIV-1 [5]. The primary endpoint was a confirmed response at week 48, defined by a viral load below 50 copies/ml. Another trial [TMC278 against HIV in a once-daily regimen versus efavirenz (THRIVE)] with a similar design compared rilpivirine with EFV, but with different background regimens [TDF/FTC, zidovudine and lamivudine (ZDV/3TC) and abacavir and lamivudine (ABC/3TC)] [6].

In practice, there is generally information available concerning the STD response rate, but much less is known about the EXP response rate. However, a common response rate in the two groups is usually assumed [1,7], and success rates of 75% have been hypothesized. In the original publication, seven studies were referenced to justify this choice, but only four of these studies provide valuable information (three randomized trials and one observational study) [5]. Two studies used a limit of detection greater than 50 copies/ml and another study was undoubtedly identified erroneously, because EFV was not used. During the same period, three other trials were published in which EFV was investigated as part of a tri-therapy regimen. Overall, only two groups of patients can then be identified using EFV + TDF/FTC; the other groups used different background regimens (Table 1).

A belief that the response rate is at least as high as a certain threshold value, together with observations from previous clinical trials, leads to the use of simple Bayesian methods. The use of a beta distribution in the Markov chain Monte Carlo framework (WinBUGS) provides different estimates of the standard response rate, depending on the data used [8]. On the basis of the success rates from the four references mentioned above, the mean success rate was 73.12%, consistent with the hypothesized success rate of 75%. A more accurate estimate, however, can be obtained by considering only the TDF/FTC backbone allowed for the ECHO trial. Using the two EFV + TDF/FTC


Table 1. Rilpivirine versus efavirenz in HIV-1-infected patients results of both ECHO and THRIVE trials (primary endpoint HIV-1 RNA observed success rate

Fig. 3. Difference in percentage between the observed and hypothesized response rate for six HIV trials (STD for standard arm and EXP for experimental arm).


factor (baseline plasma viral load ≤100 000, >100 000 to 500 000, and >500 000 copies/ml). We conducted a small simulation to investigate the power of CMH and logistic regression methods, using the parameters of the QUAD and ECHO/THRIVE studies, respectively. In the ‘QUAD’ simulation, two strata were considered, with two different allocation ratios (50% in each viral load stratum or 60% in the lower viral load stratum and 40% in the upper stratum). The common response rate was 79.5%, with a difference in response rates between strata of 5, 10 and 15%. In the ‘ECHO/THRIVE’ simulation, three strata were investigated, with an allocation ratio of 50, 40 and 10%. The common response rate was 75%, with a difference in response rates between the viral load strata of 10, 15 and 20%. One thousand simulated trials with sample sizes of 340 and 350 patients per arm, respectively, were generated. For the ‘QUAD’ simulation, a nominal power of 98–99% was obtained with the CMH method, whereas, for the ‘ECHO/THRIVE’ simulation, based on adjusted logistic regression, the power achieved was 95–97%.
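A simulation of this kind can be sketched in a few lines. The version below is a simplified stand-in, not the author's code: it uses a stratum-weighted risk difference with Mantel-Haenszel weights and a Wald-type confidence limit rather than the CMH or logistic regression analyses named above. The function name, the fixed stratum sizes, the one-sided α of 0.025 and the way the between-strata difference is split around the common rate are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(n_per_arm, strata_weights, strata_rates, margin,
                   n_sims=1000, alpha=0.025, seed=1):
    """Estimate the power of a stratified noninferiority test by simulation.

    Both arms share the same true response rate within each stratum.
    Noninferiority is declared when the lower confidence limit of the
    stratum-weighted risk difference (EXP - STD) lies above -margin.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    # Fixed stratum sizes per arm (a simplification of random allocation).
    sizes = [round(n_per_arm * w) for w in strata_weights]
    successes = 0
    for _ in range(n_sims):
        num = den = var = 0.0
        for n, p in zip(sizes, strata_rates):
            x_exp = sum(rng.random() < p for _ in range(n))
            x_std = sum(rng.random() < p for _ in range(n))
            p_exp, p_std = x_exp / n, x_std / n
            w = n * n / (2 * n)  # Mantel-Haenszel weight n1*n0/(n1+n0)
            num += w * (p_exp - p_std)
            den += w
            var += w ** 2 * (p_exp * (1 - p_exp) + p_std * (1 - p_std)) / n
        diff = num / den
        se = sqrt(var) / den
        if diff - z * se > -margin:
            successes += 1
    return successes / n_sims


if __name__ == "__main__":
    # Hypothetical 'QUAD'-like setting: two equal strata whose rates differ
    # by 10 points around a 79.5% common rate, 12% margin, 350 per arm.
    print(simulate_power(350, [0.5, 0.5], [0.845, 0.745], margin=0.12))
```

Because the stratified variance excludes the between-strata component, the estimated power in such a setting tends to sit above the 95% targeted by an unstratified sample size calculation, which is the point the simulation in the text makes.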

Noninferiority margin

Most of the HIV trials used a margin of 10–12%, although it is recognized that these values are generally chosen for reasons of feasibility rather than on statistical and clinical grounds [19]. One recent equivalence trial used a smaller margin, although this is not clearly apparent. The AIDS Clinical Trials Group (ACTG) 5202 study compared ATV/r and EFV as part of a three-drug regimen for the initial treatment of HIV-1 infection. This study used a factorial design with two factors: ATV/r versus EFV and ABC/3TC versus TDF/FTC. The primary endpoint for efficacy was the time from randomization to virologic failure. The planned study duration was 96 weeks after the enrollment of the last patient. Due to the nature of the endpoint, Cox’s proportional-hazards model was used to estimate the hazard ratio for time to virologic failure between the various randomized groups. Regimens were considered equivalent if the two-tailed 95% CI for the hazard ratio was between 0.71 and 1.40 [20]. A planned sample size of 1800 patients (450 per group) was calculated, to give an 89.8% probability of equivalence being declared if two regimens were the same.

The supplemental material provided information about the hypotheses concerning sample size and accrual (supplemental appendix of [20]). In particular, a probability of virologic failure of 17.46% (31.89%) by week 48 (96) was assumed in one regimen and the hazard ratio for virologic failure for this regimen with respect to the other regimen was 1.4, implying a probability of failure by week 48 (96) for the other regimen of 23.57% (41.5%). This approach was adopted because the team felt that a difference in the probability of virologic failure by week 48 of approximately 5%, equivalent to a probability of virologic failure by week 96 of approximately 10%, was acceptable for equivalence. Thus, the hazard ratio margin of 1.40 corresponds to a 5% margin in terms of the difference in risk at week 48. A 10% margin in risk difference at week 48 would correspond to a probability of failure of 17.46% (hazard of 0.004) for one regimen, but a probability of failure of 27.46% (hazard of 0.0067) for the other regimen. This would result in an upper limit for the hazard ratio of 1.68 (0.0067/0.004) instead of 1.4. The final results indicated that, for patients randomly assigned to receive ATV/r or EFV, the hazard ratio (with EFV taken as the reference) for time to virologic failure was 1.13 (95% CI 0.82–1.56) for ABC/3TC and 1.10 (95% CI 0.70–1.46) for TDF/FTC [20]. With an upper limit of 1.40, neither of these comparisons met the criteria for equivalence defined at the outset of the study. However, the use of an upper limit of 1.68 would have led to a conclusion of equivalence for the two pair-wise comparisons.

Alternative hypotheses and designs

Design based on results from phase II trials

All the trials discussed in this study were conducted after promising results had been obtained in phase II trials. However, these results were not used fully in the design of the phase III trials. The results of the phase II studies are displayed at the top of Table 2. The use of rilpivirine, elvitegravir/cobicistat and dolutegravir, with a similar endpoint to that used in phase III trials, resulted in virologic success in 79.6, 89.6 and 90.2% of patients, respectively, whereas success rates of 75, 79.5 and 75%, respectively, were hypothesized in the phase III trials. The use of the success rates observed in phase II studies in the design of phase III trials would have led to either a decrease in sample size or to a decrease in the noninferiority margin (Table 2). For example, in the design of a phase III trial for rilpivirine, making use of the results of the NCT00110305 study, a common response rate of 80% could have been assumed, implying that a sample size of 175 and 290 patients per arm would have been sufficient to achieve powers of 80 and 95%, respectively. Such a design would correspond to decreases in sample size of 49 and 15%, respectively. The maintenance of a sample size of 340 would have led to the use of a margin of 8.5 and 11% for a power of 80 and 95%, respectively, rather than the margin of 12% actually used.

The results of phase II studies for elvitegravir/cobicistat and dolutegravir suggest that the experimental regimen may be of potential marginal benefit. This would lead to a hybrid design with a smaller sample size or a smaller noninferiority margin than a design based on the use of a common response rate [4,21]. Similar designs have been already discussed and implemented [21,22]. On the basis


Table 2. Results of phase II trials and potential design of corresponding phase III trials.

Virologic response in phase II trials

Phase II study   Experimental drug         Treatment group       n/N     %
NCT00110305      Rilpivirine               RPV + 2 NRTIs         74/93   79.6%
                                           EFV + 2 NRTIs         72/89   80.9%
NCT00869557      Elvitegravir/cobicistat   EVG/COBI + TDF/FTC    43/48   89.6%
                                           EFV + TDF/FTC         19/23   82.6%
NCT009500859     Dolutegravir              DTG + 2 NRTIs         46/51   90.2%
                                           EFV + 2 NRTIs         41/50   82.0%

Designing phase III trials: maintenance of the initial margin and reducing the sample size

Corresponding phase II   Arm (assumed response rate)   Margin   Power   N/group   Sample size reduction
NCT00110305              RPV (80%)–EFV (80%)           12%      80%     175       49%
                         RPV (80%)–EFV (80%)           12%      95%     290       15%
NCT00869557              EVG/COBI (85%)–EFV (82%)      12%      80%     96        73%
                         EVG/COBI (85%)–EFV (82%)      12%      95%     160       54%
NCT009500859             DTG (85%)–EFV (82%)           10%      80%     130       67%
                         DTG (85%)–EFV (82%)           10%      95%     212       46%

Designing phase III trials: maintenance of the initial sample size and reducing the margin

Corresponding phase II   Arm (assumed response rate)   Power   N/group   Margin
NCT00110305              RPV (80%)–EFV (80%)           80%     340       8.5%
                         RPV (80%)–EFV (80%)           95%     340       11%
NCT00869557              EVG/COBI (85%)–EFV (82%)      80%     350       5%
                         EVG/COBI (85%)–EFV (82%)      95%     350       7%
NCT009500859             DTG (85%)–EFV (82%)           80%     394       4.5%
                         DTG (85%)–EFV (82%)           95%     394       6.5%

COBI, cobicistat; DTG, dolutegravir; EFV, efavirenz; EVG, elvitegravir; RPV, rilpivirine.

of the 90% response rate for dolutegravir obtained in the phase II study, it seems reasonable to hypothesize an 85% response rate in the corresponding phase III trial. We can thus hypothesize success rates of 85 and 82% for the dolutegravir (EXP) and EFV (STD) arms, respectively. Keeping the original margin of 10% applied in the SINGLE study would give sample sizes of 130 and 212 for powers of 80 and 95%, respectively, corresponding to decreases in sample size of 67 and 46%, respectively. The maintenance of a sample size of 394 would give margins of 4.5 and 6.5% for powers of 80 and 95%, respectively. As discussed by Fleming [4], these situations highlight a common flaw: the perceived need to consider the EXP to be either truly much better than the STD, leading to a superiority trial, or essentially the same as the STD, leading to a noninferiority trial.
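The hybrid-design figures quoted above can be approximated with the normal-approximation formula for a risk difference when the two arms are assumed to have different response rates. This is a minimal sketch: the function names and the one-sided α of 0.025 are assumptions, and the article does not state which formula produced Table 2, so agreement to within a few patients or a fraction of a percentage point is the most that should be expected.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z_sum(power, alpha):
    """Sum of the normal quantiles for the type I error and the power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)


def hybrid_sample_size(p_exp, p_std, margin, power, alpha=0.025):
    """Patients per group when EXP is assumed to differ from STD."""
    variance = p_exp * (1 - p_exp) + p_std * (1 - p_std)
    distance = margin + (p_exp - p_std)  # distance from the null boundary
    return ceil(_z_sum(power, alpha) ** 2 * variance / distance ** 2)


def achievable_margin(p_exp, p_std, n_per_group, power, alpha=0.025):
    """Smallest margin giving the requested power at a fixed sample size."""
    variance = p_exp * (1 - p_exp) + p_std * (1 - p_std)
    return _z_sum(power, alpha) * sqrt(variance / n_per_group) - (p_exp - p_std)


if __name__ == "__main__":
    # Dolutegravir (85%) versus EFV (82%), as hypothesized in the text.
    for power in (0.80, 0.95):
        n = hybrid_sample_size(0.85, 0.82, margin=0.10, power=power)
        m = achievable_margin(0.85, 0.82, n_per_group=394, power=power)
        print(f"power={power:.0%}: n/group={n}, margin at n=394: {m:.1%}")
```

Setting `p_exp` equal to `p_std` recovers the common-response-rate design, so the same two functions can be used to explore the rilpivirine rows of Table 2.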

Other considerations

Unbalanced groups

Although unbalanced allocation can reduce the sample size requirement for noninferiority trials, none of the six HIV trials considered this potential advantage. A review of the noninferiority trials published in the New England Journal of Medicine showed that only two of the 21 trials using a binary outcome made use of unbalanced group allocation [7]. A similar situation applies to work in the field of HIV, in which the use of unbalanced sample sizes is highly infrequent. Several authors have suggested that unbalanced allocation ratios may be optimal for noninferiority designs [7,23]. When the outcome is a difference in risk, the decrease in sample size is greater when response rates are close to 0 or 100%, with the allocation of more patients to the experimental group (g = nEXP/nSTD and g > 1) [23]. We used the Farrington and Manning formulae, which give results similar to those obtained with Hilton’s approach [7,23]. We calculated the optimal sample sizes for unbalanced groups according to the initial set of hypotheses (common response rate, noninferiority margin and power) used in the ECHO/THRIVE, QUAD 102/103 and SPRING-2/SINGLE trials. The optimal overall sample size was found to be 673 (g = 1.22) for ECHO/THRIVE, 587 (g = 1.28) for QUAD 102/103 and 784 (g = 1.18) for SPRING-2/SINGLE. The potential decrease in sample size was substantial only for the QUAD trials, in which a decrease in sample size of 16% (587 versus 700 patients) would have been possible with this approach. However, with unbalanced allocation ratios, more patients would have been allocated to the experimental group in these trials, with nEXP = 370 vs. nSTD = 303 for ECHO/THRIVE, nEXP = 330 vs.


nSTD = 257 for QUAD 102/103, and nEXP = 424 vs. nSTD = 360 for SPRING-2/SINGLE. This would have increased the precision of the estimated response and adverse event rates for the experimental group.

Population sets

In superiority trials, the intention-to-treat (ITT) analysis is the primary analysis, although the per-protocol analysis provides useful information on efficacy. One difficulty is the rather vague definition of a per-protocol analysis as one in which patients’ data are included in the analysis dataset only if the patients adhered to the requirements of the protocol. Patients can then be excluded due to violation of some inclusion criteria. The difficulty arises from patients excluded on the basis of information obtained after randomization. Potential bias and effects on type I error probabilities have been described, and the inflation of the type I error probability is even greater with large sample sizes, even when the null hypothesis is true [24].

In a superiority design, ITT analysis corresponds to a conservative strategy because it often leads to smaller observed treatment effects than if all patients had adhered to the treatment. In noninferiority trials, recent comments suggest that the per-protocol analysis is more conservative than the ITT analysis [25]. Unfortunately, this is simply wrong, and the role of each population set should be considered very carefully, as it is difficult to claim that either the ITT or the per-protocol analysis is more conservative [26]. For example, in a recent trial comparing darunavir/ritonavir maintenance monotherapy with darunavir/ritonavir triple therapy, the ITT analysis did not demonstrate noninferiority (the null hypothesis could not be rejected) and was therefore more conservative than the per-protocol analysis, which did demonstrate noninferiority [27]. It is generally accepted that there is greater confidence in the results when the conclusions of both analyses are consistent [28]. Nevertheless, the ITT analysis can be considered the primary analysis, because power and sample size calculations are performed for the ITT population, as it is impossible to predict how many patients will be excluded from the per-protocol population.

Discussion

The success of HIV drug development has involved the use of the noninferiority design. It has been argued that conducting superiority trials or reducing the noninferiority margin would lead to the inclusion of large numbers of patients. However, we observed a different trend, with an increase in the numbers of patients enrolled in HIV studies to maintain a margin of 10–12% whilst ensuring a statistical power of at least 95%. Our aim here is not to criticize the results/findings of particular HIV trials, but to question the exactness of some of the


hypotheses used. First, the hypothetical success rates used are underestimated with respect to what could reasonably be expected on the basis of previous studies, including phase II trials. Second, large numbers of patients are being enrolled with the aim of achieving a statistical power of 95% or higher, and power is increased further when a stratified analysis is used. These two factors are actually resulting in trials being carried out with a true statistical power close to 99%. The issue of appropriate statistical power for noninferiority trials should be considered and discussed further. Indeed, the question of the maximum power targeted is also relevant beyond the field of HIV treatment. For example, a trial comparing telbivudine and lamivudine in patients with chronic hepatitis B declared a 99% power to demonstrate noninferiority [29].

As we have seen, most trials have used a 10–12% margin. The ACTG 5202 study, however, showed that a smaller margin, equivalent to a 5% margin for risk difference at week 48, could be used. We have shown that, in the ACTG 5202 trial, the use of a 10% margin would have led to a demonstration of noninferiority. The protocol-specified noninferiority margin is intended to reflect the maximum clinically acceptable difference between the therapeutic alternatives, as well as to prevent successive erosion of efficacy as new agents become the old standards in future trials [30–32]. In reality, however, the margin is mostly driven by logistical considerations around sample size and power calculations [2,19,31,33–35]. In addition, a wide margin increases the probability of ending up in a difficult situation in which the entire 95% CI lies between the noninferiority margin and 0 [30,36]. In this case, both noninferiority and inferiority are, in a sense, demonstrated.

High statistical power favors the development of ‘me-too’ drugs, especially if the drugs being compared are of the same class. Some authors consider that such a development is due to the real innovation crisis: pharmaceutical research and development turn out mostly minor variations on existing drugs [37]. There is an important concern about ‘me-too’ drugs that are more costly than the comparator, that are in reality slightly inferior to the comparator and that cost large amounts to develop, test and approve. Some authors consider that, in the HIV field, the new integrase inhibitors fall into this category [38]. Through noninferiority trials, we have seen that a new regimen may be approved even if it is less effective than the standard regimen. If the comparator for the next noninferiority trial is the new regimen, this raises the possibility that, after a series of noninferiority trials with each drug being a little worse than its predecessor, an ineffective therapy may falsely be deemed efficacious. Fleming [4] discussed such a phenomenon in antifungal therapy. The hazards of bio-creep are obviously higher in


case of large margins. In HIV trials, however, we observe almost the inverse of a bio-creep effect since, in the six trials discussed here, the new regimen was always better or slightly better than the comparator. This emphasizes the need for superiority trials to clearly establish new standard regimens. In certain cases, a single superiority trial may be conducted instead of two parallel trials with similar designs (ECHO and THRIVE, for instance).

Many efforts have been made to standardize clinical trials for HIV, with the use, for example, of a viral load below 50 copies/ml as the primary endpoint. Other design options are possible (Table 2), with either a smaller sample size or a smaller noninferiority margin. The most favorable situation occurs when a modest improvement in efficacy is expected from the new combination (hybrid design). Hybrid designs have already been described in statistical studies and have been implemented [4,21]. In such designs, rather than assuming a common success rate, the success rate of the new combination is considered to be slightly higher than that in the standard group. Unbalanced allocation ratios can also be used to optimize sample size. This would be particularly advantageous in HIV trials, because the treatment effect is often measured on a risk-difference scale, with success rates close to 90%. The use of appropriate hypotheses for success rates and of unbalanced allocation ratios should make it possible to enroll smaller numbers of patients than for some recent studies.

In addition to the need to standardize the design of noninferiority trials, there is also a need to highlight the potential advantages of new treatments. These advantages may include lower toxicity, better tolerability, higher genetic barriers, and greater ease of administration, improving compliance and reducing costs. The magnitude of benefit in terms of toxicity is not defined clearly enough in the rationale of HIV trials. New endpoints combining efficacy and benefits could be explored as alternative endpoints in future trials.
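The gain from unbalanced allocation mentioned in the Discussion can be explored numerically. The sketch below follows the general idea of a Farrington and Manning type calculation for a risk difference, in which the variance under the null hypothesis is evaluated at response rates constrained to differ by exactly the margin. It is an illustrative reimplementation under stated assumptions, not the author's code: the constrained rates are found by bisection rather than by a closed-form solution, the one-sided α is taken as 0.025, sample sizes are left unrounded, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def restricted_std_rate(p_exp, p_std, margin, g):
    """STD rate maximizing the expected likelihood under p_exp = p_std - margin."""
    def score(x):
        y = x - margin  # constrained EXP rate
        return (g * (p_exp / y - (1 - p_exp) / (1 - y))
                + p_std / x - (1 - p_std) / (1 - x))

    lo, hi = margin + 1e-9, 1 - 1e-9
    for _ in range(200):  # bisection: the score falls from positive to negative
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def total_sample_size(p_exp, p_std, margin, power, g, alpha=0.025):
    """Overall sample size for allocation ratio g = n_EXP / n_STD (unrounded)."""
    nd = NormalDist()
    z_a, z_b = nd.inv_cdf(1 - alpha), nd.inv_cdf(power)
    x = restricted_std_rate(p_exp, p_std, margin, g)
    y = x - margin
    sd_null = sqrt(y * (1 - y) / g + x * (1 - x))
    sd_alt = sqrt(p_exp * (1 - p_exp) / g + p_std * (1 - p_std))
    n_std = ((z_a * sd_null + z_b * sd_alt) / (p_exp - p_std + margin)) ** 2
    return n_std * (1 + g)


def best_ratio(p_exp, p_std, margin, power):
    """Grid search over g from 1.00 to 2.00 for the smallest overall size."""
    grid = [1 + k / 100 for k in range(101)]
    return min(grid, key=lambda g: total_sample_size(p_exp, p_std, margin, power, g))


if __name__ == "__main__":
    # ECHO/THRIVE-like hypotheses: 75% common rate, 12% margin, 95% power.
    g = best_ratio(0.75, 0.75, 0.12, 0.95)
    print(g, total_sample_size(0.75, 0.75, 0.12, 0.95, g))
    print(total_sample_size(0.75, 0.75, 0.12, 0.95, 1.0))
```

For hypotheses like those of ECHO/THRIVE the optimum should fall in the same region as the g = 1.22 reported in the text, with only a modest saving over balanced allocation; exact agreement with the published figures is not guaranteed, because rounding and the precise variant of the formula may differ.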

Acknowledgements

Conflicts of interest

There are no conflicts of interest.

References

1. Hill A, Sabin C. Designing and interpreting HIV noninferiority trials in naive and experienced patients. AIDS 2008; 22:913–921.
2. Mani N, Murray J, Gulick RM, Josephson F, Miller V, Miele P, et al. Novel clinical trial designs for the development of new antiretroviral agents. AIDS 2012; 26:899–907.
3. Pocock SJ. The pros and cons of noninferiority trials. Fundam Clin Pharmacol 2003; 17:483–490.
4. Fleming TR. Current issues in noninferiority trials. Stat Med 2008; 27:317–332.
5. Molina JM, Cahn P, Grinsztejn B, Lazzarin A, Mills A, Saag M, et al. Rilpivirine versus efavirenz with tenofovir and emtricitabine in treatment-naive adults infected with HIV-1 (ECHO): a phase 3 randomised double-blind active-controlled trial. Lancet 2011; 378:238–246.
6. Cohen CJ, Andrade-Villanueva J, Clotet B, Fourie J, Johnson MA, Ruxrungtham K, et al. Rilpivirine versus efavirenz with two background nucleoside or nucleotide reverse transcriptase inhibitors in treatment-naive adults infected with HIV-1 (THRIVE): a phase 3, randomised, noninferiority trial. Lancet 2011; 378:229–237.
7. Hilton JF. Noninferiority trial designs for odds ratios and risk differences. Stat Med 2010; 29:982–993.
8. Julious SA. The ABC of noninferiority margin setting from indirect comparisons. Pharm Stat 2011; 10:448–453.
9. Raffi F, Rachlis A, Stellbrink HJ, Hardy WD, Torti C, Orkin C, et al. Once-daily dolutegravir versus raltegravir in antiretroviral-naive adults with HIV-1 infection: 48 week results from the randomised, double-blind, noninferiority SPRING-2 study. Lancet 2013; 381:735–743.
10. Walmsley SL, Antela A, Clumeck N, Duiculescu D, Eberhard A, Gutierrez F, et al. Dolutegravir plus abacavir-lamivudine for the treatment of HIV-1 infection. N Engl J Med 2013; 369:1807–1818.
11. Lennox JL, DeJesus E, Lazzarin A, Pollard RB, Madruga JV, Berger DS, et al. Safety and efficacy of raltegravir-based versus efavirenz-based combination therapy in treatment-naive patients with HIV-1 infection: a multicentre, double-blind randomised controlled trial. Lancet 2009; 374:796–806.
12. Eron JJ Jr, Rockstroh JK, Reynes J, Andrade-Villanueva J, Ramalho-Madruga JV, Bekker LG, et al. Raltegravir once daily or twice daily in previously untreated patients with HIV-1: a randomised, active-controlled, phase 3 noninferiority trial. Lancet Infect Dis 2011; 11:907–915.
13. Sax PE, Tierney C, Collier AC, Fischl MA, Mollan K, Peeples L, et al. Abacavir-lamivudine versus tenofovir-emtricitabine for initial HIV-1 therapy. N Engl J Med 2009; 361:2230–2240.
14. DeJesus E, Rockstroh JK, Henry K, Molina JM, Gathe J, Ramanathan S, et al. Co-formulated elvitegravir, cobicistat, emtricitabine, and tenofovir disoproxil fumarate versus ritonavir-boosted atazanavir plus co-formulated emtricitabine and tenofovir disoproxil fumarate for initial treatment of HIV-1 infection: a randomised, double-blind, phase 3, noninferiority trial. Lancet 2012; 379:2429–2438.
15. Sax PE, DeJesus E, Mills A, Zolopa A, Cohen C, Wohl D, et al. Co-formulated elvitegravir, cobicistat, emtricitabine, and tenofovir versus co-formulated efavirenz, emtricitabine, and tenofovir for initial treatment of HIV-1 infection: a randomised, double-blind, phase 3 trial, analysis of results after 48 weeks. Lancet 2012; 379:2439–2448.
16. Newcombe RG. Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med 1998; 17:873–890.
17. Kernan WN, Viscoli CM, Makuch RW, Brass LM, Horwitz RI. Stratified randomization for clinical trials. J Clin Epidemiol 1999; 52:19–26.
18. Mohamed K, Embleton A, Cuffe RL. Adjusting for covariates in noninferiority studies with margins defined as risk differences. Pharm Stat 2011; 10:461–466.
19. Struble K. From the meeting ‘Emerging issues in clinical trials for new ARV development’. http://www.hivforum.org/storage/documents/2010ClinicalTrials/1_2_struble_justifyinganoninferioritymargin.pdf. [Accessed 2014]
20. Daar ES, Tierney C, Fischl MA, Sax PE, Mollan K, Budhathoki C, et al. Atazanavir plus ritonavir or efavirenz as part of a 3-drug regimen for initial treatment of HIV-1. Ann Intern Med 2011; 154:445–456.
21. Freidlin B, Korn EL, George SL, Gray R. Randomized clinical trial design for assessing noninferiority when superiority is expected. J Clin Oncol 2007; 25:5019–5023.
22. Gradishar WJ, Tjulandin S, Davidson N, Shaw H, Desai N, Bhar P, et al. Phase III trial of nanoparticle albumin-bound paclitaxel compared with polyethylated castor oil-based paclitaxel in women with breast cancer. J Clin Oncol 2005; 23:7794–7803.


23. Farrington CP, Manning G. Test statistics and sample size formulae for comparative binomial trials with null hypothesis of nonzero risk difference or nonunity relative risk. Stat Med 1990; 9:1447–1454.
24. Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials 2000; 21:167–189.
25. Schrijvers R, Desimmie BA, Debyser Z. Rilpivirine: a step forward in tailored HIV treatment. Lancet 2011; 378:201–203.
26. Brittain E, Lin D. A comparison of intent-to-treat and per-protocol results in antibiotic noninferiority trials. Stat Med 2005; 24:1–10.
27. Katlama C, Valantin MA, Algarte-Genin M, Duvivier C, Lambert-Niclot S, Girard PM, et al. Efficacy of darunavir/ritonavir maintenance monotherapy in patients with HIV-1 viral suppression: a randomized open-label, noninferiority trial, MONOI-ANRS 136. AIDS 2010; 24:2365–2374.
28. Piaggio G, Elbourne DR, Altman DG, Pocock SJ, Evans SJ, CONSORT Group. Reporting of noninferiority and equivalence randomized trials: an extension of the CONSORT statement. JAMA 2006; 295:1152–1160.
29. Lai CL, Gane E, Liaw YF, Hsu CW, Thongsawat S, Wang Y, et al. Telbivudine versus lamivudine in patients with chronic hepatitis B. N Engl J Med 2007; 357:2576–2588.
30. DiNubile MJ, Sklar P, Lupinacci RJ, Eron JJ. Paradoxical interpretations of noninferiority studies: violating the excluded middle. Future Virol 2012; 7:1055–1063.


31. Fleming TR, Powers JH. Issues in noninferiority trials: the evidence in community-acquired pneumonia. Clin Infect Dis 2008; 47 (Suppl 3):S108–120.
32. Powers JH, Ross DB, Brittain E, Albrecht R, Goldberger MJ. The United States Food and Drug Administration and noninferiority margins in clinical trials of antimicrobial agents. Clin Infect Dis 2002; 34:879–881.
33. Kaul S, Diamond GA. Good enough: a primer on the analysis and interpretation of noninferiority trials. Ann Intern Med 2006; 145:62–69.
34. Lange S, Freitag G. Choice of delta: requirements and reality: results of a systematic review. Biom J 2005; 47:12–27; discussion 99–107.
35. Spellberg B, Talbot GH, Boucher HW, Bradley JS, Gilbert D, Scheld WM, et al. Antimicrobial agents for complicated skin and skin-structure infections: justification of noninferiority margins in the absence of placebo-controlled trials. Clin Infect Dis 2009; 49:383–391.
36. Flandre P. Design of HIV noninferiority trials: where are we going? AIDS 2013; 27:653–657.
37. Light DW, Lexchin JR. Pharmaceutical research and development: what do we get for all that money? BMJ 2012; 345:e4348.
38. Serrao E, Odde S, Ramkumar K, Neamati N. Raltegravir, elvitegravir, and metoogravir: the birth of ‘me-too’ HIV-1 integrase inhibitors. Retrovirology 2009; 6:25.
