Urologic Oncology: Seminars and Original Investigations 33 (2015) 116–121

Seminar article

A primer on clinical trial design

Chad Ellimoottil, M.D., M.S.a,*, Sandeep Vijan, M.D., M.S.b, Robert C. Flanigan, M.D.a

a Loyola University Medical Center, Department of Urology, Maywood, IL
b Department of Internal Medicine, University of Michigan, Ann Arbor, MI

* Corresponding author. Tel.: +1-630-440-4458. E-mail address: [email protected] (C. Ellimoottil).

Received 4 September 2014; received in revised form 15 December 2014; accepted 17 December 2014

Abstract
A well-designed and executed clinical trial is the gold standard of evidence-based medicine. It is important for readers to understand the rationale for the study design, identify common pitfalls, and scrutinize limitations. Herein, we present a brief overview of types of designs used for clinical trials and discuss the use of appropriate end points, the selection of study participants, randomization, sample size calculation, blinding, and analysis of data. Finally, we emphasize the importance of accurate and transparent reporting. Our goal is to provide a primer for practicing urologists to enhance their understanding of the clinical trial literature. Published by Elsevier Inc.

Keywords: Clinical trial design; Randomized control study; Clinical trial; Study design; Urology

Introduction

A well-designed and executed clinical trial is the gold standard of evidence-based medicine. Each month, dozens of trials are published in top journals with the intent of guiding clinical practice with reliable evidence. Although it is assumed that by virtue of the peer-review process, a published trial has met certain standards, the reader is ultimately responsible for critical appraisal of the quality and applicability of the trial. Namely, the reader should be able to understand the rationale for the study design, identify common pitfalls, and scrutinize limitations. Herein, we present a brief overview of clinical trials to provide the practicing urologist with a basic methodological understanding of the clinical trial literature.

What is a clinical trial?

A clinical trial is a prospective study design that is aimed at understanding the effect of an intervention (e.g., a surgical procedure, drug, or medical device) in a predetermined population over a defined period of time. High-quality clinical trials adhere to stringent design and reporting guidelines [1]. However, not all research questions are appropriate for a clinical trial. The benefits of a clinical trial are outweighed if (1) the intervention poses undue harm or risk to participants (e.g., a study that exposes participants to cigarette smoke), (2) the control group is denied a treatment that is known to work, or (3) the trial is economically infeasible (e.g., a study that requires 15 y of follow-up). In these situations, a well-designed observational or simulation study can replace a clinical trial.

What are clinical trial phases?

Because many clinical trials involve the use of drugs, it is important to review the phases of clinical trials (Fig. 1). Phase I trials are designed to establish the safety of a new drug, to determine dosing, and to better understand the pharmacokinetics of the treatment. During phase I trials, a small number of healthy volunteers are used to determine the "maximum tolerated dose." The maximum tolerated dose is established by giving slowly increasing doses of the experimental drug to groups of 2 to 3 volunteers and observing for toxicity (i.e., dose escalation). As an example, investigators recently published a phase I trial using the liquid formulation of imiquimod in patients with non–muscle-invasive bladder cancer [2]. The investigators gave volunteers small doses of the intravesical agent and observed them for side effects.
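To make the dose-escalation logic concrete, the following is a minimal sketch, in Python, of a generic "3+3"-style escalation rule. The decision thresholds and cohort sizes are common conventions and are assumptions for illustration only; they are not the specific design used in the imiquimod trial [2].

```python
# Sketch of a generic "3+3"-style dose-escalation rule (illustrative only).
def next_action(toxicities: int, cohort_size: int) -> str:
    """Decide the next step after observing dose-limiting toxicities (DLTs)
    in the current cohort at the current dose level."""
    if cohort_size == 3:
        if toxicities == 0:
            return "escalate"   # 0/3 DLTs: move to the next dose level
        if toxicities == 1:
            return "expand"     # 1/3 DLTs: treat 3 more patients at the same dose
        return "stop"           # >=2/3 DLTs: the current dose exceeds the MTD
    if cohort_size == 6:
        return "escalate" if toxicities <= 1 else "stop"
    raise ValueError("rule is defined for cohorts of 3 or 6 only")

# Example: a cohort of 3 with 1 DLT leads to expansion before any escalation.
print(next_action(toxicities=1, cohort_size=3))  # "expand"
```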


Fig. 1. Clinical trial phases: phase I, establish safety; phase II, determine biological activity; phase III, determine clinical effectiveness; phase IV, establish long-term efficacy and/or safety. (Color version of the figure is available online.)

Once the safety of the treatment has been established, drug trials typically move into phase II. During phase II trials, investigators use a larger sample of volunteers to determine whether the experimental drug has biological activity. For example, investigators recently published results of a phase II trial examining the efficacy of intravesical nanoparticle albumin-bound (nab-)paclitaxel for the treatment of non–muscle-invasive bladder cancer [3]. The authors of this study used the most common phase II design (i.e., the "2-stage design"). In this design, patient exposure to ineffective interventions is minimized by first giving a safe dose (established in phase I) of the drug to a small number of participants. These participants are observed for a predetermined minimum level of biological activity before additional participants are invited to join the trial. In the aforementioned study, investigators enrolled only 10 patients initially. Once the efficacy threshold was met, an additional 18 patients were enrolled. At the end of their phase II study, the authors found that 37.5% of enrollees responded to the treatment. Toxicity in these patients was documented, and they were followed up for a year to determine the durability of the response. Once efficacy is established, investigators then determine clinical effectiveness using a phase III trial. The elements of the phase III trial are the subject of the remainder of this article. Finally, once the drug is determined to be effective and enters clinical practice, investigators may conduct a phase IV trial, which is often focused on the long-term efficacy of the intervention, its long-term safety, or both.

What are the primary and secondary end points?

The primary research question is typically a test of the effect of a specific intervention on a primary end point. Clinical trials typically have a single primary end point and several secondary end points. Secondary end points may be tests of additional outcomes related to the primary research question; for example, if the primary end point is all-cause mortality, secondary end points could include serious cardiac events or specific causes of death. Investigators may also report exploratory end points or post hoc analyses. It is important to note that exploratory end points are different from primary and secondary end points because the trial was not designed to test them. Exploratory end points are typically used to drive future research.

Investigators may choose to examine the effect of the intervention in specific subgroups (e.g., men, patients older than 65 years, or patients at varying baseline risk of outcomes). All subgroup analyses should be determined a priori. In a study on nephrectomy followed by administration of interferon alfa-2b compared with interferon alfa-2b alone for the treatment of metastatic renal cell carcinoma, subgroup analyses were performed on patients stratified by measurable disease (yes/no), performance status (0/1), and location of metastases (lung/other) [4]. Although subgroup analyses may prove to be clinically helpful, it is important for the reader to be cautious when interpreting the results. For example, if the findings are unremarkable, this may simply be a result of inadequate power, as the trial was designed around the primary question. On the contrary, if the findings are remarkable, one has to consider that in the process of testing many subgroups, some findings would be positive owing to chance alone.
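The hazard of testing many subgroups can be illustrated with a short simulation. The following is a minimal sketch in Python that assumes 10 subgroups, 100 patients per arm in each subgroup, and a 5% significance level; all numbers are hypothetical, and the simulated intervention has no true effect in any subgroup.

```python
# Simulation: how often at least one "significant" subgroup finding arises by
# chance when the intervention truly has no effect. All parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subgroups, n_per_arm, alpha, n_trials = 10, 100, 0.05, 2000

false_positive_trials = 0
for _ in range(n_trials):
    any_significant = False
    for _ in range(n_subgroups):
        control = rng.normal(0.0, 1.0, n_per_arm)   # no true treatment effect
        treated = rng.normal(0.0, 1.0, n_per_arm)
        _, p = stats.ttest_ind(treated, control)
        any_significant |= p < alpha
    false_positive_trials += any_significant

print(f"P(>=1 spurious subgroup finding) ~ {false_positive_trials / n_trials:.2f}")
# With 10 independent subgroup tests, this is roughly 1 - 0.95**10, or about 0.40.
```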

What are appropriate end points?

The selection of appropriate end points is critical for a successful clinical trial. Typically, a single end point is selected for the primary question. However, in select circumstances, the primary question may be best answered through a "composite" end point. For example, in a prospective study on the need for imaging in patients with low-risk prostate cancer, the investigators created a composite end point of negative imaging findings, which included results of bone scan, computed tomography, and magnetic resonance imaging [5]. The use of composite end points allows investigators to perform their trial with a smaller sample size.

"Surrogate end points" are often used in the interest of cost and time. For example, although survival is an important end point, it is time intensive and expensive to measure in a clinical trial setting. For men with castration-resistant prostate cancer, biomarkers have been suggested as appropriate end points to assess the treatment potential of new therapeutic agents [6]. Readers should assess whether a surrogate marker truly reflects the clinical end point of interest, that is, whether the association between the surrogate and the clinical end point is strong and causal in the patients included in the study. For example, prostate-specific antigen (PSA) is not likely to be a good surrogate marker for prostate cancer in a general screening population, but it may be adequate in a population of patients being treated for advanced cancer. Regardless of the end point selected, appropriate end points should be predetermined, assessed in all patients by reviewers blinded to treatment group, and measured in an unbiased fashion [7].

How are patients selected?

Patient selection is an important component of clinical trial design and interpretation. Investigators should document, in a highly transparent manner, specific inclusion and exclusion criteria and the rationale for these criteria.

Table 1
Methods of sampling

Convenience sampling
  Description: A sample is generated by asking participants who are easily accessible (e.g., all patients who arrive in clinic today will be asked to enroll).
  Advantages: Inexpensive and quick.
  Disadvantages: Prone to bias.

Simple random sampling
  Description: A sample is randomly generated from the entire population of interest (e.g., patients are selected from a list of all patients who underwent prostatectomy in the United States).
  Advantages: Unbiased and highly representative.
  Disadvantages: Expensive.

Stratified random sampling
  Description: Subgroups of interest (e.g., race, ethnicity, and age) are determined a priori and random samples are drawn from within these groups.
  Advantages: Study can be powered to evaluate subgroups of interest.
  Disadvantages: Hard to implement.

Cluster sampling
  Description: All patients within a cluster are selected (e.g., at teaching hospitals, all patients in Friday clinics are enrolled).
  Advantages: Convenient.
  Disadvantages: Prone to cluster-based bias.

Systematic sampling
  Description: Begin at a random starting point and systematically select participants (e.g., select every fifth patient who arrives in clinic).
  Advantages: Simple to implement.
  Disadvantages: If there is a pattern in the population, some subgroups may be overrepresented.

Inclusion and exclusion criteria are important for the reader to note because they affect the generalizability of the results (i.e., external validity). Investigators often exclude patients who are not likely to benefit from the intervention, who may have severe side effects from the intervention (e.g., pregnant women), and who are unlikely to follow up or have competing risks (e.g., patients with cancer may be excluded from a noncancer trial). Once the selection criteria are established, patients are recruited using a predetermined sampling scheme. Sampling schemes include convenience sampling, simple random sampling, stratified random sampling, systematic sampling, and cluster sampling (Table 1) [7]. The choice of sampling scheme depends on the research question.

There are many common pitfalls that may occur during the patient selection and recruitment process. Readers should be critical of inclusion and exclusion criteria that threaten the generalizability of the study or introduce bias. For example, in many PSA screening trials, prostate biopsies are only performed on patients with a PSA level greater than 4 ng/ml [8]. However, this practice is based on the false assumption that no patient with a PSA level below 4 ng/ml has prostate cancer, and it results in biased estimates of the value of PSA testing [9]. Moreover, readers should be aware that a widely known threat to the validity of any clinical trial is the "healthy volunteer effect," whereby individuals who agree to participate in a clinical trial may be fundamentally different from nonparticipants. The "healthy volunteer effect" may also bias results toward the null. For example, the "healthy volunteer effect" was seen in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, where the study cohort had lower rates of overall mortality compared with the general population [10].
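The sampling schemes summarized in Table 1 can be illustrated with a brief sketch. The patient roster, strata, and sample sizes below are synthetic and purely for illustration.

```python
# Sketch of three sampling schemes from Table 1 applied to a synthetic roster.
import random

random.seed(42)
roster = [{"id": i, "race": random.choice(["white", "black", "other"])}
          for i in range(1000)]

# Simple random sampling: every patient has an equal chance of selection.
simple_sample = random.sample(roster, k=50)

# Systematic sampling: random starting point, then every kth patient.
k = 20
start = random.randrange(k)
systematic_sample = roster[start::k]

# Stratified random sampling: draw a fixed number from each prespecified stratum.
stratified_sample = []
for stratum in ["white", "black", "other"]:
    members = [p for p in roster if p["race"] == stratum]
    stratified_sample.extend(random.sample(members, k=15))

print(len(simple_sample), len(systematic_sample), len(stratified_sample))
```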

How are clinical trials designed?

Investigators may choose from a number of different trial designs predicated on their specific research question. Most trials are designed to show that the intervention being studied is superior to a standard of care or to no intervention (i.e., placebo). For simplicity, clinical trial designs can be categorized into several groups based on key features (Table 2). The control group can receive no intervention, a placebo, the standard of care, or the same intervention at a different time (e.g., crossover).

Table 2
Typical clinical trial designs

Parallel design: The intervention and control groups are compared simultaneously.
Crossover: All patients eventually receive the treatment; however, treatment occurs at different times (patients "cross over" from one treatment group to another at a specified time point).
Historical control: The intervention group is compared with historical outcomes. This is generally considered a weak design compared with the others.

The ethics of placebo-controlled trials have been widely debated [11]. Advocates of placebo-controlled trials state that new treatments that are no better than existing therapies may still be beneficial if they have a favorable side effect profile or are effective in subgroups; therefore, studying the intervention against placebo alone is appropriate. On the contrary, critics state that if an effective therapy is available, it is unethical to withhold treatment to conduct a placebo-controlled trial. Investigators who choose to perform a placebo-controlled trial should report a compelling reason why the intervention was not compared with a standard of care.

Most trials are designed to show that the intervention is superior to the control. As an alternative, investigators may design a "noninferiority" trial. In this trial design, the new intervention is not expected to be better than the standard of care. Noninferiority trials are appropriate if the side effect


profile of the intervention is superior to the standard of care. For example, several studies have used the noninferiority design to show the cancer-control equivalence of intermittent and continuous hormone therapy [12]. Because intermittent hormone therapy has a lower rate of side effects, the noninferiority design is appropriate in this situation.
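In practice, a noninferiority claim is usually assessed by checking whether the confidence interval for the between-arm difference excludes a prespecified margin. The following is a minimal sketch using hypothetical event counts and an assumed 10% absolute margin; it is not the analysis used in the intermittent hormone therapy studies [12].

```python
# Hypothetical noninferiority check on a difference in event proportions.
# Event counts and the 10% margin are illustrative assumptions only.
from math import sqrt

events_new, n_new = 42, 200   # events on the new (e.g., intermittent) arm
events_std, n_std = 40, 200   # events on the standard (continuous) arm
margin = 0.10                 # prespecified noninferiority margin

p_new, p_std = events_new / n_new, events_std / n_std
diff = p_new - p_std
se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
upper_95 = diff + 1.96 * se   # how much worse could the new arm plausibly be?

print(f"difference = {diff:.3f}, 95% CI upper bound = {upper_95:.3f}")
print("noninferior" if upper_95 < margin else "noninferiority not shown")
```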

How is sample size determined?

Although the statistical calculations used to determine sample size are beyond the scope of this review, it is important to emphasize that sample size should be determined a priori and reported transparently. Sample size is determined by the primary end point variable type (e.g., continuous, dichotomous, or event rates), the planned analysis (e.g., logistic regression or survival analysis), the significance level (alpha), the power (1 - beta), and the desired magnitude of the difference in treatment response. Clinical trials should have sufficient statistical power to detect differences in the primary end point. In some instances, interim analyses are performed, and if they show a difference before the predetermined sample size is reached, the trial may be stopped. For example, in a recent trial comparing enzalutamide and placebo in patients with metastatic castration-resistant prostate cancer without prior chemotherapy, the primary end points were overall survival and progression-free survival. In this study, the sample size was determined to be 1,680 patients (assuming an alpha of 0.05). Investigators planned an interim analysis of overall survival after 67% of the required events (i.e., deaths) had occurred. At the time of the interim analysis, the study demonstrated that enzalutamide had significantly better rates of progression-free survival and overall survival [13]. Accordingly, the study was stopped after the interim analysis results were reported.
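To illustrate how these inputs drive the calculation, the sketch below computes a per-arm sample size for comparing two proportions using the standard normal-approximation formula. The assumed response rates, alpha, and power are illustrative and are not the parameters of the enzalutamide trial [13].

```python
# Per-arm sample size for comparing two proportions (normal approximation).
# The assumed control and intervention response rates are illustrative only.
from math import ceil, sqrt
from scipy.stats import norm

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_beta = norm.ppf(power)            # power = 1 - beta
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                 z_beta * sqrt(p_control * (1 - p_control) +
                               p_treat * (1 - p_treat))) ** 2
    return ceil(numerator / (p_control - p_treat) ** 2)

# Example: detecting an improvement in response rate from 30% to 40%.
print(n_per_arm(0.30, 0.40))   # about 356 patients per arm under these assumptions
```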

How are patients randomized?

Proper randomization is essential to prevent biased results [14,15]. In theory, randomization produces intervention and control groups that are similar on all important characteristics that may bias the study findings (i.e., confounders). Although there are many nuances to consider in the randomization process, for purposes of this overview, we discuss the 3 common randomization methods (Fig. 2).

Fig. 2. Randomization schemes: simple, block, and stratified randomization. C = control group; I = intervention group. (Color version of the figure is available online.)

Simple randomization involves randomizing a patient based on a coin flip, random number allocation, or any other technique where there is no prespecified method to ensure that the randomization is balanced. Subjects may be randomized using a 1:1 allocation ratio or an alternative ratio (e.g., 2:1 or 3:1). With a large group of participants, balance between the numbers of participants in the intervention and control groups is usually achieved. In contrast to simple randomization, blocked randomization involves assigning participants to groups (e.g., groups of 8) and then equally randomizing participants within each group. Blocked randomization ensures that the total numbers of patients in the control and intervention groups are balanced. Stratified randomization involves developing specific strata (e.g., age groups, race categories, or practice) and randomizing within these strata [16]. For example, in their study of decision aids for prostate cancer screening, investigators randomized patients in a 1:1:1 ratio to a web decision aid, a print decision aid, or usual care. In this study, randomization was stratified by practice and race (white, African American, or other) [17]. There are 2 main advantages of stratified randomization. First, it ensures that balance is achieved for the prespecified variables that are potentially important confounders. Second, it may allow analysis of subgroups, as long as the study has a large enough sample size within these subgroups.
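The three randomization schemes shown in Fig. 2 can be sketched in a few lines of code. The block size, strata, and participant identifiers below are hypothetical, and the roster is assumed to divide evenly into blocks.

```python
# Minimal sketches of simple, block, and stratified randomization.
import random

random.seed(7)
participants = list(range(16))

# Simple randomization: an independent "coin flip" for each participant.
simple = {pid: random.choice(["I", "C"]) for pid in participants}

# Block randomization: within each block of 8, exactly 4 go to each arm.
def block_randomize(ids, block_size=8):
    assignment = {}
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]
        arms = ["I"] * (block_size // 2) + ["C"] * (block_size // 2)
        random.shuffle(arms)
        assignment.update(zip(block, arms))
    return assignment

blocked = block_randomize(participants)

# Stratified randomization: a separate block randomization within each stratum.
strata = {pid: ("clinic 1" if pid < 8 else "clinic 2") for pid in participants}
stratified = {}
for clinic in ("clinic 1", "clinic 2"):
    ids = [pid for pid in participants if strata[pid] == clinic]
    stratified.update(block_randomize(ids, block_size=8))

print("simple:  ", sum(a == "I" for a in simple.values()), "of 16 in the intervention arm")
print("blocked: ", sum(a == "I" for a in blocked.values()), "of 16 in the intervention arm (exactly half)")
for clinic in ("clinic 1", "clinic 2"):
    n_i = sum(stratified[p] == "I" for p in participants if strata[p] == clinic)
    print(f"stratified, {clinic}: {n_i} of 8 in the intervention arm")
```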

How is blinding performed?

Along with randomization, proper blinding is important to prevent the introduction of bias into the study results. In a single-blinded trial, only the investigator is aware of the intervention that the participant received. Single-blinded trials are necessary in some circumstances (e.g., procedure trials). For example, when investigators sought to understand the influence of bladder neck preservation on urinary continence after radical prostatectomy, they randomized 208 men to 2 different approaches to bladder neck preservation (complete or none) [18]. The study was single-blinded because the investigators were aware of the type of intervention performed, but the patients were not. In a double-blinded study, neither the investigator nor the study participant is aware of the intervention received. Although less common, a triple-blinded study involves the use of a separate analytic group that is also unaware of the intervention received. Trials that are not blinded may be subject to biased outcome assessment, particularly for self-reported outcomes such as pain or quality of life; this is a common problem with surgical trials because sham surgeries are difficult to conduct.

How are data in clinical trials analyzed?

The analysis of data in clinical trials is as important as the design. The principle of "intention to treat" (ITT) is the notion that, after randomization, patients are analyzed based on the group to which they were initially assigned. Accordingly, if a patient switches to another group, the patient's outcome is still credited toward the originally assigned group. This analytic principle is in contrast to "per-protocol" and "as-treated" designs, where only patients who complete the study using the protocol to which they were initially assigned are included in the final analysis. Per-protocol designs may influence effect size estimates if the excluded patients are systematically different from the included patients (e.g., excluded patients are less healthy than included patients). For example, in their randomized controlled trial comparing monopolar with bipolar transurethral resection of bladder tumors, investigators performed both an ITT and a per-protocol analysis. In the per-protocol analysis, patients who inadvertently received general anesthesia (i.e., broke protocol) were excluded [19]. In this particular study, the results were similar for the ITT and per-protocol analyses.

Although considered the standard method of analyzing randomized controlled trials, ITT analysis does not account for patients who are noncompliant with their assigned treatment. For example, if patients do not adhere to the study protocol or if they cross over to the other intervention group, so-called treatment contamination is said to occur. There are several statistical methods available to adjust for bias from treatment contamination. In one such method, called "contamination-adjusted intention to treat" (CA ITT), investigators use a statistical technique called instrumental variable analysis to adjust for dropout and crossover [20]. Because contamination can lead to a substantial distortion of effect size, contamination-adjusted ITT may lead to better estimates of the benefits and harms of an intervention. For example, investigators report that in the PLCO screening trial, contamination may have inflated mortality risk in the intervention arm by as much as 28% [21].

In addition to contamination bias, some randomized controlled trials have additional limitations, such as short-term follow-up and a limited number of comparison groups. In these situations, simulation models can be used to enhance the interpretation of clinical trial data. For example, many argue that the results of the European Randomized Study of Screening for Prostate Cancer and the PLCO cancer screening trial did not account for the long-term population effect of PSA screening [22]. Through simulation modeling, investigators have found that the benefit of PSA screening is greater in the long run than what is reported with clinical trial data. Of course, readers should always be aware of the underlying assumptions of simulation models as they assess their validity. Standard modeling guidelines have been disseminated, and investigators should adhere to these practices [23].
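A small simulation can make the distinction between ITT and per-protocol estimates concrete. In the hypothetical trial below, sicker patients assigned to the intervention are more likely to break protocol; every parameter is invented for illustration and does not correspond to any trial cited above.

```python
# Simulated trial showing how a per-protocol analysis can shift the estimated
# effect when protocol breakers differ systematically from completers.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
assigned = rng.integers(0, 2, n)            # 1 = intervention, 0 = control
frailty = rng.normal(0, 1, n)               # unmeasured illness severity

# Sicker intervention patients are more likely to stop the intervention.
breaks_protocol = (assigned == 1) & (rng.random(n) < 0.15 + 0.10 * (frailty > 1))
received = assigned & ~breaks_protocol

true_effect = -0.10                          # intervention lowers event risk by 10 points
p_event = np.clip(0.40 + 0.10 * frailty + true_effect * received, 0, 1)
event = rng.random(n) < p_event

# ITT: compare by assignment; the estimate is diluted by the crossovers.
itt = event[assigned == 1].mean() - event[assigned == 0].mean()

# Per protocol: drop protocol breakers; the remaining intervention arm is
# healthier on average, which shifts the estimate.
pp_mask = ~breaks_protocol
pp = event[pp_mask & (assigned == 1)].mean() - event[pp_mask & (assigned == 0)].mean()

print(f"ITT estimate:          {itt:+.3f}")
print(f"Per-protocol estimate: {pp:+.3f}")
```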

How should clinical trial results be reported?

The importance of accurate and transparent reporting of clinical trial data cannot be overstated. An article published in Lancet reported that up to 50% of published studies had major reporting deficiencies [24]. These deficiencies included, but were not limited to, failures to describe how participants were randomized, to report sample size calculations, and to define primary end points. To improve the reporting of clinical trials, the Consolidated Standards of Reporting Trials (CONSORT) guidelines were developed [1] (Table 3). Although all of the elements of these guidelines are beyond the scope of this overview, it is important for readers to be familiar with them.

Table 3
Selected examples of CONSORT guidelines for reporting clinical trials

Introduction
  (1b) Structured summary of trial design, methods, results, and conclusions
  (2a) Scientific background and explanation of rationale
Methods
  (3a) Description of trial design (such as parallel, factorial) including allocation ratio
  (6a) Completely defined prespecified primary and secondary outcome measures, including how and when they were assessed
  (7b) When applicable, explanation of any interim analyses and stopping guidelines
  (8a) Method used to generate the random allocation sequence
Results
  (13a) For each group, the numbers of participants who were randomly assigned, received intended treatment, and were analyzed for the primary outcome
  (19) All important harms or unintended effects in each group

Adapted from Moher et al. [1].

Authors should report clinical trial data in a balanced manner. Both efficacy and safety parameters should be transparently reported. It is good practice to register trials on publicly available databases such as clinicaltrials.gov so that the results are made available whether or not the findings support the authors' hypothesis. Although all prespecified end points should be reported, primary end points should be reported before secondary end points, as the trial was designed around the primary end points. Moreover, the manuscript should not emphasize exploratory end points and post hoc analyses before the primary and secondary end points are reported.

Clinical trials are at the heart of medical practice. A properly executed clinical trial can provide the evidence needed to change clinical practice, create guidelines for patient management, aid in reimbursement and policy decisions, and above all, improve patient care. This


overview is merely a primer for the practicing urologist interested in enhancing his or her understanding of the wealth of knowledge provided in our medical journals and how individual clinical trials may be better evaluated.

References

[1] Moher D, Schulz KF, Altman D. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. J Am Med Assoc 2001;285:1987–91.
[2] Falke J, Lammers RJM, Arentsen HC, et al. Results of a phase 1 dose escalation study of intravesical TMX-101 in patients with non-muscle invasive bladder cancer. J Urol 2013;189:2077–82.
[3] McKiernan JM, Holder DD, Ghandour RA, et al. Phase II trial of intravesical nanoparticle albumin-bound (nab-)paclitaxel for the treatment of non-muscle invasive urothelial carcinoma of the bladder after BCG treatment failure. J Urol 2014;192:1633–8.
[4] Flanigan RC, Salmon SE, Blumenstein BA, et al. Nephrectomy followed by interferon alfa-2b compared with interferon alfa-2b alone for metastatic renal-cell cancer. N Engl J Med 2001;345:1655–9.
[5] Zacho HD, Barsi T, Mortensen JC, et al. Prospective multicenter study of bone scintigraphy in consecutive patients with newly diagnosed prostate cancer. Clin Nucl Med 2014;39:26–31.
[6] Armstrong AJ, Ferrari AC, Quinn DI. The role of surrogate markers in the management of men with metastatic castration-resistant prostate cancer. Clin Adv Hematol Oncol 2011;9:1–14 [quiz 15–6].
[7] Friedman L, Furberg CD, DeMets D. Fundamentals of clinical trials. 4th ed. Springer; 2010.
[8] Gupta A, Roehrborn CG. Verification and incorporation biases in studies assessing screening tests: prostate-specific antigen as an example. Urology 2004;64:106–11.
[9] Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–30.
[10] Pinsky PF, Miller A, Kramer BS, et al. Evidence of a healthy volunteer effect in the prostate, lung, colorectal, and ovarian cancer screening trial. Am J Epidemiol 2007;165:874–81.
[11] Emanuel EJ, Miller FG. The ethics of placebo-controlled trials—a middle ground. N Engl J Med 2001;345:915–9.


[12] Niraula S, Le LW, Tannock IF. Treatment of prostate cancer with intermittent versus continuous androgen deprivation: a systematic review of randomized trials. J Clin Oncol 2013;31:2029–36.
[13] Beer TM, Armstrong AJ, Rathkopf DE, et al. Enzalutamide in metastatic prostate cancer before chemotherapy. N Engl J Med 2014;371:424–33.
[14] Zelen M. The randomization and stratification of patients to clinical trials. J Chronic Dis 1974;27:365–75.
[15] Kernan WN, Viscoli CM, Makuch RW, et al. Stratified randomization for clinical trials. J Clin Epidemiol 1999;52:19–26.
[16] Irani J, Salomon L, Oba R, et al. Efficacy of venlafaxine, medroxyprogesterone acetate, and cyproterone acetate for the treatment of vasomotor hot flushes in men taking gonadotropin-releasing hormone analogues for prostate cancer: a double-blind, randomised trial. Lancet Oncol 2010;11:147–54.
[17] Taylor KL, Williams RM, Davis K, et al. Decision making in prostate cancer screening using decision aids vs usual care: a randomized clinical trial. J Am Med Assoc Intern Med 2013;173:1704–12.
[18] Nyarangi-Dix JN, Radtke JP, Hadaschik B, et al. Impact of complete bladder neck preservation on urinary continence, quality of life and surgical margins after radical prostatectomy: a randomized, controlled, single blind trial. J Urol 2013;189:891–8.
[19] Burnett AL, Anele UA, Trueheart IN, et al. Randomized controlled trial of sildenafil for preventing recurrent ischemic priapism in sickle cell disease. Am J Med 2014;127:664–8.
[20] Sussman JB, Hayward RA. An IV for the RCT: using instrumental variables to adjust for treatment contamination in randomised controlled trials. Br Med J 2010;340:c2073.
[21] Gulati R, Tsodikov A, Wever EM, et al. The impact of PLCO control arm contamination on perceived PSA screening efficacy. Cancer Causes Control 2012;23:827–35.
[22] Etzioni R, Gulati R, Cooperberg MR, et al. Limitations of basing screening policies on screening trials: the US Preventive Services Task Force and prostate cancer screening. Med Care 2013;51:295–300.
[23] Caro JJ, Briggs AH, Siebert U, et al. Modeling good research practices—overview: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-1. Med Decis Making 2012;32:667–77.
[24] Chan A-W, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet 2005;365:1159–62.
