An Overview of Methods for Comparative Effectiveness Research

Anne-Marie Meyer, PhD,*,†,‡ Stephanie B. Wheeler, PhD, MPH,†,‡,§ Morris Weinberger, PhD,§,‖ Ronald C. Chen, MD, MPH,†,‡,¶ and William R. Carpenter, PhD, MHA†,‡,§

Comparative effectiveness research (CER) is a broad category of outcomes research encompassing many different methods employed by researchers and clinicians from numerous disciplines. The goal of cancer-focused CER is to generate new knowledge to assist cancer stakeholders in making informed decisions that will improve health care and outcomes of both individuals and populations. There are numerous CER methods that may be used to examine specific questions, including randomized controlled trials, observational studies, systematic literature reviews, and decision sciences modeling. Each has its strengths and weaknesses. To both inform and serve as a reference for readers of this issue of Seminars in Radiation Oncology as well as the broader oncology community, we describe CER and several of the more commonly used approaches and analytical methods. Semin Radiat Oncol 24:5-13 © 2014 Elsevier Inc. All rights reserved.

*Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC. †Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC. ‡Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, NC. §Department of Health Policy and Management, University of North Carolina at Chapel Hill, Chapel Hill, NC. ‖Center for Health Services Research, Durham VA Medical Center, Durham, NC. ¶Department of Radiation Oncology, University of North Carolina, Chapel Hill, NC.

The authors declare no conflict of interest.

Address reprint requests to Anne-Marie Meyer, 101 E. Weaver Street, CB 7293, University of North Carolina at Chapel Hill, Carrboro, NC 27599-7293. E-mail: [email protected]

1053-4296/13/$-see front matter © 2014 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.semradonc.2013.09.002

Introduction

Rapid scientific advances have led to the development of numerous alternative options for cancer prevention, early detection, treatment, and control.1 This pace of discovery has contributed to challenges in the feasibility and timeliness of conducting traditional prospective randomized controlled trials (RCTs) to comparatively examine the numerous alternatives in the context of diverse patient populations. Thanks to advances in statistical methodology and research systems, several other research approaches in addition to RCTs have been employed to develop new information regarding intervention effectiveness, estimate economic inputs and outcomes associated with each of these interventions, and inform clinical practice and policy, while meeting the high evidentiary standards of numerous cancer care stakeholders. Collectively, these approaches comprise comparative effectiveness research (CER), a broad category of outcomes research encompassing many different methods employed by researchers and clinicians from numerous disciplines. Emphasis is placed on identifying the effectiveness of a treatment or intervention, defined as the evidence of benefit or harm expected in ordinary nontrial settings within heterogeneous populations, and on understanding heterogeneity of treatment effects (Table 1).3-6

In recent years, CER—also referred to as patient-centered outcomes research—has garnered the advocacy and support of the Institute of Medicine,2 numerous government centers and agencies, and other national organizations representing numerous cancer stakeholders. The ultimate goal of CER is to translate and disseminate evidence into clinical practice and encourage evaluation and feedback through a "learning health care system" that continuously improves health care and outcomes.2,7 Achieving this goal necessitates the synthesis of evidence developed through research using diverse methods, including RCTs, observational studies, systematic literature reviews, and decision sciences modeling.


Table 1 Efficacy and Effectiveness: Differences and Important Methodologic Considerations

Efficacy
  Definition: "[D]etermine whether an intervention produces the expected result under ideal circumstances"*
  Important methodologic considerations:
  - Primarily from "explanatory trials," which test the effect of treatment under ideal, highly controlled ("research") conditions
  - Treatment is often compared against a placebo
  - Isolates the causal effect of treatment based on prespecified outcomes
  - Typically enroll homogeneous, least complicated, younger, or healthier patients
  - Designed to maximize internal validity
  Attempts to answer the question: Does the treatment work in ideal patients in a controlled setting?

Effectiveness
  Definition: "[M]easure the degree of beneficial effect under 'real world' clinical settings"*
  Important methodologic considerations:
  - Primarily generated from "pragmatic trials" and observational studies designed to inform clinical decision making
  - Examine treatment under "usual" or "real-world" clinical practice or conditions
  - Comparisons are made between 2 or more active treatments (not a placebo)
  - Usually collects a wider range of outcomes than explanatory trials
  - Includes heterogeneous patient groups with clinically relevant subgroups
  Attempts to answer the question: Does the treatment work in "real-world" practice, in different types of patients with a condition (complicated and less complicated patients), and does treatment benefit outweigh potential risk?

*Gartlehner G, Hansen R, Nissman D, Lohr K, Carey T. Criteria for Distinguishing Effectiveness From Efficacy Trials in Systematic Reviews. Technical Review 12. Rockville, MD: Agency for Healthcare Research and Quality; 2006.

The ability of CER researchers to expeditiously conduct much of this research is predicated on the recent rapid advances in computing, data availability and storage, and the collection of data from diverse sources, such as electronic health records, patient registries, administrative sources, and other sources.8 Each of these data sources has well-known strengths and limitations for use in research.8,9 In the end, CER researchers identify which of these diverse methodologies and data sources allow them to optimally examine specific research questions and expeditiously generate timely, high-quality research findings reflective of "real-world" patient care. In this article, we present many of the different approaches and analytical methods used in CER—several of which are employed in empirical studies presented elsewhere in this issue of Seminars in Radiation Oncology—as well as their relative strengths and weaknesses (Table 2).

Randomized Trials

RCTs are considered the gold standard for establishing intervention efficacy and are most commonly applied to the evaluation of alternative treatments. By definition, RCTs randomly assign the study population to different treatment groups, follow them up prospectively, and gather data on their health care utilization and outcomes (Fig. 1). Done well, the only difference between study groups is the intervention itself. Therapeutic or explanatory RCTs typically focus on efficacy, or the effect of a treatment among highly selected patients under controlled conditions, which are intended to strengthen researchers' ability to identify and measure the effect of the treatment itself. However, because of stringent protocols and eligibility criteria, the generalizability of study results may be limited. In contrast, effectiveness trials increase the generalizability of findings to broader subpopulations by enrolling more heterogeneous patients, often from more diverse settings (Table 1). However, even with well-designed effectiveness trials, it is often difficult to drill down into questions of subpopulation differences, typically because of limitations in sample size and thus analytical power. Therefore, all RCTs exist on a continuum with regard to their "explanatory" vs "pragmatic" design, and their subsequent ability to answer questions of efficacy vs effectiveness.10

To be relevant to CER, RCTs often make direct comparisons between 2 active treatments (both of which have the potential to be best practice), as opposed to a placebo or "do-nothing" control arm. They must also represent study populations and settings reflecting "typical" clinical practice and ideally include heterogeneous patient groups between which clinically relevant subgroup differences in outcomes can be detected.11,12 These considerations require larger sample sizes (which increase costs) and raise logistical considerations (which may challenge feasibility).4 However, there are several specific types of RCTs that are well adapted for studies of CER, including pragmatic trials, cluster-randomized trials, and adaptive trials.

Figure 1 Randomized trial. (A source population yields eligible cases, who are randomized to Treatment A or Treatment B and followed to observe whether or not they develop the outcome.)

Table 2 Brief Overview of Key CER Study Designs

Randomized trials

Pragmatic trials
  Strengths:
  - Represents "real-world" practice by including diverse patients, clinical settings, and outcomes
  - Comparison of 2 active treatments
  - Better able to determine treatment "effectiveness"
  - Potentially able to detect treatment heterogeneity
  Weaknesses:
  - Identifying individual causal components of treatment is challenging
  - Difficult to "blind"
  - May be sparse regarding intermediate outcomes
  - Often requires larger sample sizes than traditional randomized trials

Cluster-randomized trials
  Strengths:
  - Avoids potential "contamination" that may occur in trials randomized at the patient level
  - Logistically easier to implement
  - Strong design for studies of organizational changes, interventions, and natural variation in practice patterns
  Weaknesses:
  - Significant design and analysis considerations are required to account for clustering
  - Requires larger sample sizes than traditional trials
  - Blinding may be impractical
  - Potential ethical considerations regarding autonomous patient decision making

Adaptive trials
  Strengths:
  - Can incorporate new treatments, changing technology, or even changing doses over the course of the trial
  - Can adjust sample size to avoid being underpowered
  - Flexible with regard to withdrawal, lack of compliance, or lack of benefit
  Weaknesses:
  - Need to be cautious of bias introduced when changing design based on interim study results (unblinding or operational bias)
  - Large effort required for planning and design, and more complicated analyses
  - Ethical considerations can increase in complexity as the trial evolves

Observational studies

Case reports and case series
  Strengths:
  - Descriptive studies, useful for hypothesis generation
  - Mechanism to discover new, previously unobserved findings
  Weaknesses:
  - Can only describe occurrence of disease in a limited sample, thus limited usefulness for CER other than hypothesis generation
  - Not generalizable, subject to bias

Cross-sectional study
  Strengths:
  - Using existing data can reduce cost of study and provide results quickly
  - Can be used to provide prevalence measures on large population-based samples
  Weaknesses:
  - Cannot measure incidence or risk
  - Unable to establish temporality or causality
  - Challenging for rare diseases or exposures

Case-control study
  Strengths:
  - Good for studies of rare diseases or outcomes
  - Can examine multiple exposures
  - Can be timely, easy to conduct, and less expensive than prospective studies
  Weaknesses:
  - Cannot directly estimate risk
  - Unable to determine temporality or causality
  - Susceptible to biases (eg, selection and recall)
  - Needs careful selection of cases and controls to be valid and generalizable

Cohort study
  Strengths:
  - Allows for assessment of temporality of exposure and outcome
  - Can directly measure incidence and risk
  - Can study multiple outcomes
  - Well-designed, carefully controlled studies can provide estimates of causal inference using advanced modeling
  - Population-based cohort studies represent the CER study design with the best generalizability
  Weaknesses:
  - Requires large sample sizes, lengthy follow-up, and can be expensive
  - Unable to control exposure, which can change over time
  - Careful control of biases required (both measured and unmeasured)

Systematic reviews
  Strengths:
  - Can help to translate evidence into policy or recommendations
  - Aid generalizability by including many, diverse studies
  - Helps establish strength, consistency, and generalizability across a population
  Weaknesses:
  - Dependent on quality and validity of published data
  - Can be challenging to identify all relevant studies
  - Must be carefully designed and conducted within established frameworks and standards

Meta-analysis
  Strengths:
  - Able to pool heterogeneous samples or data
  - Can be timely and relatively inexpensive compared with conducting prospective studies
  - More objective than other types of reviews
  - Can provide summary estimates or explain between-study heterogeneity and conflicting results
  Weaknesses:
  - Analysis must consider differences in study design (eg, inconsistent exposure, outcome, and covariates)
  - Dependent on quality and validity of published data
  - Hard to control biases
  - Adequate power may be difficult to obtain from available data

Decision analysis
  Strengths:
  - Enumerates risks, benefits, and costs within a single model
  - Combines evidence across diverse sources (subjective and objective), providing an overall assessment of comparative outcomes based on the "best" available data at the present time
  - Can synthesize data into efficient and valid mathematical models
  - Can model outcomes for diverse patient groups
  Weaknesses:
  - Caution required regarding quality of input data, assumptions, and generalizability of findings

Pragmatic Trials

Pragmatic trials are relevant to CER as an RCT design because they represent real-world practice.3 Their characteristics include the following: (1) inclusion of diverse patients drawn from diverse practices; (2) flexibility regarding which intervention a patient receives (ie, all patients may not receive the same treatment); (3) recognition that blinding is not always possible; and (4) consideration of usual practice, rather than placebo or another treatment, in control arms.10 In other words, pragmatic trials are RCTs that are designed to inform clinical practice and policy by testing interventions in the real world, which includes heterogeneous patients and variable and changing settings, situations, and outcomes. Although pragmatic trials offer a substantial improvement toward understanding effectiveness rather than efficacy, many pragmatic trials collect a minimal amount of data and focus only on key end points, rather than secondary outcomes of interest to patients, health care providers, and policy makers. As described in the introductory article of this issue of Seminars, stakeholder involvement in the design of trials can greatly enhance the ability of trials to produce outcomes relevant to these stakeholders.13

Cluster-Randomized Trials

Cluster-randomized trials randomize at the group or practice level rather than at the level of individual patients; for example, organizations 1 through 5 may use "intervention A" exclusively, whereas organizations 6 through 10 use "intervention B."14 Cluster-randomized trials are ideal for questions related to the organization, delivery, and payment of health care services,15 and the Consolidated Standards of Reporting Trials statement has been adapted for this design.16,17 One rationale for conducting a cluster-randomized trial rather than a traditional trial is to avoid contamination between treatment arms. A recent cluster-randomized trial18 demonstrates this study design: clinics were randomized to provide or not provide patient navigation for patients who had an abnormal breast or colon screening test. A traditional RCT (randomized by patient) could be at risk of "contamination" because patient navigation services may improve care for all patients in clinics that offer them, thereby masking the potential benefit of the intervention. In contrast, randomization at the clinic level avoids this risk of contamination. Cluster-randomized trials are also logistically easier to manage because there is only 1 treatment option within each cluster or site, which makes it easier to enroll patients, obtain informed consent, and communicate with patients regarding choices. However, important design and analysis considerations must be addressed: patients need to be comparable or exchangeable between sites such that different sites do not have systematically biased populations; if patients within sites are systematically different from patients within other sites in ways that are unmeasured and related to the outcome, then confounding may be introduced. Lastly, patients within clustered sites can no longer be considered independent observations and, as a result, appropriate statistical models need to be applied to account for patient clustering within sites. Consequently, a larger sample size is required in cluster-randomized trials than in traditional trials.
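To give a sense of how much larger, the sketch below applies the standard design-effect calculation, DEFF = 1 + (m − 1) × ICC; the cluster size, intraclass correlation, and baseline sample size are invented for illustration and are not from the article.

```python
# Design effect for a cluster-randomized trial: the factor by which the
# individually randomized sample size must be inflated to account for
# within-cluster correlation. All numbers are illustrative only.

def design_effect(cluster_size: float, icc: float) -> float:
    """DEFF = 1 + (m - 1) * ICC, for average cluster size m."""
    return 1.0 + (cluster_size - 1.0) * icc

n_individual = 400        # patients needed under individual randomization
avg_cluster_size = 25     # average patients enrolled per clinic
icc = 0.05                # intraclass correlation of the outcome within clinics

deff = design_effect(avg_cluster_size, icc)
n_cluster_trial = n_individual * deff

print(f"Design effect: {deff:.2f}")                                   # 2.20
print(f"Patients needed with clinic-level randomization: {n_cluster_trial:.0f}")  # 880
```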


Adaptive Trials

There are numerous specific types of adaptive trials, but in general, adaptive trials incorporate change or adapt in response to information generated during the trial.19,20 They can incorporate information from other areas of CER, including systematic reviews and observational studies. Instead of assuming the typical RCT hypothesis that treatments are statistically equivalent, adaptive trials allow researchers to ask questions about the probability that a specific therapy will be the best, given current knowledge.4 They also have the flexibility to incorporate new treatments into an ongoing trial if evidence emerges that an alternative treatment may be better than a poorly performing treatment under consideration (which can then be dropped from the protocol). Some adaptive trials have used a Bayesian approach and applied probabilistic statements of uncertainty based on information from both within and outside of the study. Such a design requires a priori decisions about how information is used and rules that govern how the protocol design will change, while controlling the probability of false-positive and false-negative conclusions.
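As one illustration of the Bayesian reasoning described above, the following minimal sketch uses hypothetical interim response counts and a simple Beta-Binomial model to estimate the probability that therapy A is better than therapy B given the data so far—the kind of quantity an adaptive protocol might prespecify rules around. All counts and priors are assumptions for illustration, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interim data: responses / patients on each arm (illustrative only).
responses_a, n_a = 18, 40
responses_b, n_b = 12, 40

# Beta(1, 1) priors updated with the observed counts give Beta posteriors
# for each arm's response probability.
post_a = rng.beta(1 + responses_a, 1 + n_a - responses_a, size=100_000)
post_b = rng.beta(1 + responses_b, 1 + n_b - responses_b, size=100_000)

# Posterior probability that therapy A has the higher response rate, given
# current knowledge; an adaptive design could key allocation or stopping
# rules to this quantity.
prob_a_better = np.mean(post_a > post_b)
print(f"P(A better than B | data so far) = {prob_a_better:.2f}")
```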

Observational Studies

Unlike RCTs, observational studies often include large, diverse populations and allow for examination of subgroups and heterogeneity of treatment effects (ie, where different patient subgroups may experience different outcomes related to the treatment). They can be used to compare more than 1 treatment at a time and often include extensive follow-up in which the natural history of disease, including intermediate outcomes, can be observed. However, observational studies do not use randomization to assign exposure or treatment, which is the key difference from RCTs. Instead, they observe patients during the course of disease development and "actual" clinical care. Although RCTs balance patient characteristics across treatment groups through randomization, the lack of randomization in observational studies necessitates the explicit measurement of numerous characteristics that may be associated with treatment selection and patient outcomes.21-24 Even then, elements of uncertainty vis-à-vis unmeasured confounders will remain and may affect observed associations between treatment alternatives and outcomes.25 In other words, the observed associations may be a function of unmeasured patient characteristics that are related to the exposure or treatment and to the outcome of interest; if such relationships exist and are not accounted for in analyses, estimates of the association between treatment and outcome will be biased. As a result, analysis of observational studies must be conducted very carefully, with special attention to potential differences in patient characteristics between groups. Several study designs are commonly used for observational data, including case series, cross-sectional, case-control, and cohort studies.

Case Reports and Case Series

Case reports are singular observations, usually published in medical journals, that focus on an observed exposure-outcome association. Sometimes a group of cases is summarized and presented together as a case series. A case series can also constitute a group of patients with a specific attribute in common, such as a treatment (new therapy or device) or outcome. Such studies are not generalizable to wider groups because they represent the limited experience of one case or a group of cases. For CER, these types of studies have limited usefulness, but they may be helpful in hypothesis generation and in informing elements of design in subsequent studies.

Cross-Sectional Study

In cross-sectional studies, a population is defined at 1 period in time and the prevalence of the exposure and outcome is simultaneously estimated (Fig. 2). Because such studies only include prevalent exposures and cases who are alive and contactable at a particular point in time, these studies are limited in their generalizability to all cases and do not represent the entire exposure history or changes in exposure status over time. Consequently, it is impossible to determine the temporality of the exposure-outcome relationship, and the associations observed are purely correlational, which means that we are unable to provide evidence regarding etiology or causality. Nevertheless, many cross-sectional studies provide excellent population-based estimates of disease, health outcomes, or exposures, which can be useful for generating CER hypotheses. These studies are predominantly from governmental health and surveillance programs and include the National Health and Nutrition Examination Survey and the Nationwide Inpatient Sample.

Figure 2 Cross-sectional study. (A population defined within a specific time period is classified as exposed or unexposed and as having or not having the disease.)

Figure 3 Case-control study. (Selected cases and matched controls are drawn from a source population, and each group is classified as exposed or unexposed.)

Figure 4 Cohort study. (Exposed and unexposed members of a source population are followed to observe who does or does not develop the outcome.)

Case-Control Study

Case-control studies define and enroll a case population based on a specific outcome or event (Fig. 3). An important example of this study design is the Carolina Breast Cancer Study, in which a population-based set of cancer cases was matched to a control population without breast cancer. Both groups are then compared with regard to their exposure histories or specific biological risk factors. The comparison between the groups is made within a certain time period to determine what proportion of cases vs controls was exposed or unexposed to the risk factor or intervention; this comparison generates odds ratios as a measure of the association between the exposure and outcome. Case-control studies are particularly well suited for rare outcomes, particularly those with long latency periods, because investigators can obtain a larger sample size by starting recruitment with cases without needing to follow up a population for a long time and wait for an outcome of interest to occur. For this study design to be validly used in CER, several important design factors must be considered, including selection of both cases and controls, and recall bias. Cases and controls must be carefully selected based on the study goals, design, and generalizability to an external population so that they are truly representative of the intended target population. Depending on the question, it may be important to distinguish between incident vs prevalent cases. Similarly, controls need to be selected with a great deal of planning and understanding of the underlying general population (specifically with regard to exposures). In addition, the extent to which controls match cases in terms of personal attributes, and how representative these groups are of the general population, must be considered. Finally, recall bias may be a problem because both exposure and outcome have already occurred and may be differentially recalled by study participants.
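To make the odds ratio calculation concrete, here is a minimal sketch using a hypothetical 2 × 2 table of exposure among cases and controls; the counts are invented for illustration and are not from the Carolina Breast Cancer Study.

```python
import math

# Hypothetical case-control counts (illustrative only).
exposed_cases, unexposed_cases = 60, 40
exposed_controls, unexposed_controls = 45, 55

# Odds ratio: odds of exposure among cases divided by odds of exposure among controls.
odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Approximate 95% confidence interval on the log-odds scale (Woolf method).
se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases +
                      1 / exposed_controls + 1 / unexposed_controls)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```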

Cohort Study

Cohort studies are arguably the strongest and most useful observational study design for CER. In cohort studies, exposed and unexposed individuals are enrolled and then followed up through time to observe and compare the occurrence or incidence of the outcomes of interest between the 2 groups (Fig. 4). Cohort studies can be designed as either prospective or retrospective. Prospective cohort studies enroll individuals who have a condition for which they may receive different interventions, who have not yet had the research outcome of interest, and who agree to be followed up into the future to observe for the occurrence of these outcomes.

Nested case-control studies can be done within these cohort studies to compare treatment outcomes among subpopulations of the cohort. Prospective cohort studies are a powerful study design because they include rich amounts of data on patients specifically designed for the study aims. However, like RCTs, they can be expensive and time-consuming. Prospective cohort studies can be single- or multi-institutional, or even population-based. The latter is an especially powerful study design for CER because of the diversity of patients and broad representation of health care institutions. This diversity maximizes generalizability of the findings to inform the effectiveness of different interventions, which is a central goal of CER. Examples of prospective cohort studies (some of which have included nested case-control studies) include the North Carolina-Louisiana Prostate Cancer Study,25,26 the Cancer Care Outcomes Research and Surveillance study,27,28 and the currently ongoing North Carolina Prostate Cancer Comparative Effectiveness and Survivorship Study. In contrast, retrospective cohort studies in CER are designed by leveraging existing data and looking back in time to create a patient cohort based on eligibility or receipt of specific interventions. This cohort of patients is then followed up toward the present to observe for specific outcomes. Retrospective cohorts can produce large, powerful sample sizes with extensive follow-up at a fraction of the cost of prospective studies, but they sacrifice the quality and availability of important variables. Large administrative databases, such as the Surveillance, Epidemiology and End Results (SEER)-Medicare–linked dataset and MarketScan data, provide good opportunities for creating retrospective cohort studies.29
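Because cohort studies follow exposed and unexposed groups forward in time, incidence and relative risk can be estimated directly (one of the cohort strengths noted in Table 2). A minimal sketch with hypothetical counts, invented for illustration:

```python
# Hypothetical cohort counts (illustrative only): events observed during follow-up.
events_exposed, n_exposed = 30, 500
events_unexposed, n_unexposed = 15, 500

# Cumulative incidence (risk) in each group over the follow-up period.
risk_exposed = events_exposed / n_exposed
risk_unexposed = events_unexposed / n_unexposed

# Measures a cohort design can estimate directly, unlike a case-control study.
risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed

print(f"Risk (exposed)   = {risk_exposed:.3f}")
print(f"Risk (unexposed) = {risk_unexposed:.3f}")
print(f"Risk ratio       = {risk_ratio:.2f}")
print(f"Risk difference  = {risk_difference:.3f}")
```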

Analytical Considerations for Using Observational Study Designs

Most observational datasets used in contemporary CER were not intended for research (eg, insurance claims data) or designed to answer questions regarding specific treatment or exposure comparisons. Consequently, these data may be associated with systematic error, or bias. Bias can be introduced at many points during the research process, including at study recruitment, data collection, study design, and analysis. Fortunately, the increasing emphasis on CER has resulted in continued development of methodology particularly well suited to using observational data and minimizing the risk of bias. In fact, carefully conducted analyses using observational data can yield study results regarding treatment effectiveness that are nearly equivalent to those demonstrated in RCTs.28,30

Many contemporary CER methods for examining observational data focus on factors influencing treatment decisions and their relative independence from factors associated with the outcome, otherwise known as confounding. A common type of confounding occurs when a particular variable is associated with both the exposure or intervention selection and the outcome of interest. Failure to control for these simultaneous associations may lead to biased study results and incorrect conclusions. A common criticism of many observational studies concerns their ability to control for "confounding by indication," in which the risk of a particular outcome or the severity of symptoms directly influences treatment selection.31 Because many elements of the underlying risk profile are not measured (and sometimes are not measurable), there is a risk of imbalance in the risk profiles of the comparison groups, and the analysis results can be biased.32 In many epidemiologic studies, control of confounding is incorporated into the analysis by restricting, matching, or using multivariate regression methods (also called "outcome modeling"). However, 2 key CER methodologies have emerged that concentrate on understanding and balancing the 2 treatment groups based on these patient characteristics or potential confounders: propensity score (PS) analysis and instrumental variables (IVs).

PS Analysis

PSs are implemented as a regression model of measured covariates predicting the propensity for treatment or exposure to a specific intervention.23 They are specifically focused on treatment or intervention allocation, as PSs seek to balance the treated and untreated samples based on observed covariates. This emulates randomization in that it essentially creates 2 cohorts of patients who are functionally identical with regard to all measured covariates except their treatment or intervention exposure.33 In this way, confounding by these covariates is removed much in the same way randomization works. A recent example from the CER literature is an analysis of SEER-Medicare data by Sheets et al,29 which compared patient outcomes after different radiation techniques for prostate cancer. This study demonstrates how PS methods can be applied to balance treatment arms on a large number of baseline measured covariates. However, the limitation of PS methodology is that it is unable to fully account for unmeasured covariates. For example, as smoking status is not captured in SEER-Medicare, PS methods are unable to ensure that different patient groups have similar smoking rates. PS methods can be used to demonstrate how well the analytical model controls for measured confounding by assessing balance across treatment groups through stratification, matching, or weighting. Therefore, PS methods allow us to better observe treatment effect heterogeneity or identify barriers to treatment or exposure to the intervention. Unmeasured confounding can also be detected by observing the outlying areas (tails) of the PS distribution, where patients are being treated contrary to prediction, for example, in the instances of "treatment as last resort" or "treatment withheld."23,34,35 Once PSs have been calculated, they are typically applied in the statistical analysis through matching, standardized mortality or morbidity ratio weights, or inverse probability of treatment weights. Each of these approaches has specific strengths and limitations for questions of treatment comparative effectiveness.22
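A minimal sketch of the PS workflow described above—estimate the probability of treatment from measured covariates with logistic regression, then apply inverse probability of treatment weighting (one of the weighting options mentioned in the text)—using simulated data rather than SEER-Medicare; the covariates, coefficients, and effect size are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000

# Simulated measured covariates (eg, age, comorbidity score) and a treatment
# whose assignment depends on them: confounding by indication in miniature.
age = rng.normal(70, 8, n)
comorbidity = rng.poisson(1.5, n)
logit_treat = -8 + 0.10 * age + 0.30 * comorbidity
treated = rng.binomial(1, 1 / (1 + np.exp(-logit_treat)))

# Outcome depends on the covariates and on a true treatment effect of -0.5
# on the log-odds scale.
logit_outcome = -2 + 0.03 * age + 0.40 * comorbidity - 0.5 * treated
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit_outcome)))

# Step 1: the propensity score is the predicted probability of treatment
# given the measured covariates.
X = np.column_stack([age, comorbidity])
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: inverse probability of treatment weights (matching or SMR weighting
# are alternatives).
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))

# Step 3: weighted outcome comparison; the weighted pseudo-population is
# balanced on the measured covariates, but not on anything unmeasured.
risk_treated = np.average(outcome[treated == 1], weights=weights[treated == 1])
risk_untreated = np.average(outcome[treated == 0], weights=weights[treated == 0])
print(f"IPTW risk difference: {risk_treated - risk_untreated:.3f}")
```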

IV Analysis

IV methods are another important methodology for CER and are able to address unmeasured or uncontrolled confounding.36 An IV is a variable that is believed to be related to treatment or intervention exposure but not directly related to the outcome. There are 3 key assumptions that need to be made when using IV methods: (1) the IV should affect or be associated with treatment or intervention exposure because of a "common cause"; (2) the IV is independent of other patient characteristics; and (3) the IV should be associated with the outcome only through its association with treatment or intervention exposure (ie, no direct association with the outcome).36,37 Several types of instruments have been successfully used in CER, including distance to care, calendar time associated with receipt of treatment or intervention, and the amount of copayment or coinsurance required for certain treatments or interventions.37 As with PSs, there are different ways to apply IV methods, though a standard CER approach is known as 2-stage least squares.37 This method uses a system of equations in which the first stage predicts the treatment or intervention exposure as a function of the IV and other covariates, and the second stage models the outcome by replacing the exposure with the predicted values estimated in the first stage. A recent example of this approach is included in the study of radiation therapy in patients with prostate cancer by Sheets et al.29 Because of concerns related to unmeasured confounding, an institutional variable (a "preference-based" instrument) was identified that was strongly predictive of treatment, met all the additional requirements of an IV, and under further testing was found to be a strong instrument. The findings from this study, which were consistent using both PS and IV methods, strengthen the conclusions of this observational study.
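A minimal sketch of the 2-stage least squares procedure described above, written out with ordinary least squares at each stage on simulated data; the instrument, unmeasured confounder, and effect sizes are invented for illustration (and a real analysis would also compute IV-appropriate standard errors).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated data: an unmeasured confounder drives both treatment and outcome;
# the instrument (eg, lives near a treating facility) affects treatment only.
unmeasured = rng.normal(size=n)
instrument = rng.binomial(1, 0.5, n)
treatment = rng.binomial(
    1, 1 / (1 + np.exp(-(-0.5 + 1.0 * instrument + 1.0 * unmeasured))))
outcome = 1.0 + 0.5 * treatment + 1.0 * unmeasured + rng.normal(size=n)

def ols(x, y):
    """Ordinary least squares coefficients, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Stage 1: predict treatment from the instrument.
b1 = ols(instrument, treatment)
treatment_hat = b1[0] + b1[1] * instrument

# Stage 2: regress the outcome on the *predicted* treatment.
b2 = ols(treatment_hat, outcome)

# Naive regression is biased by the unmeasured confounder; 2SLS is not.
naive = ols(treatment, outcome)
print(f"Naive OLS treatment effect: {naive[1]:.2f}")   # biased away from 0.5
print(f"2SLS treatment effect:      {b2[1]:.2f}")      # close to the true 0.5
```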

Research Synthesis

Systematic Reviews

Systematic reviews have been at the forefront of CER since the 2003 Medicare Modernization Act legislation, which highlighted the work being done through the Evidence-based Practice Centers. For many governmental and medical decision makers (eg, Institute of Medicine and United States Preventive Services Task Force), systematic reviews are an essential research method required for making any practice recommendation. Consequently, there is a rich literature guiding, describing, and contrasting the different methodologies for conducting systematic reviews.6,38–43


Despite any differences in the prescribed approaches, all methodologies include several key components and are focused on summarizing available evidence through a systematic, structured, unbiased, and transparent process. To be considered systematic, reviews must start with an a priori framework or methodology, which guides the entire process.11,38,41 There also needs to be agreement regarding key clinical questions or decisions that need to be answered. How the question is defined can influence the entire scope and interpretation of the systematic review—including considerations for alternatives or treatments that are being compared, and the study populations and settings.44 Literature (including non–peer reviewed literature) needs to be systematically searched using predefined criteria to minimize bias and to identify all possible studies. Next, all the studies are critically evaluated based on their quality, strengths, and weaknesses using preestablished criteria (eg, Consolidated Standards of Reporting Trials45 and Strengthening the Reporting of Observational Studies in Epidemiology46). Studies that meet the inclusion criteria are then each evaluated and compared or contrasted. The evidence is then qualitatively summarized and interpreted regarding the net benefits or effectiveness of the key clinical question that was posed.

Meta-Analysis

Meta-analysis is a quantitative method of combining the results of studies identified in a systematic review. This is accomplished by applying specific statistical methods that use weighted averages to synthesize the data and obtain a single summary estimate of effect.47 By combining data from similar studies, meta-analyses can provide greater statistical power than individual studies and enable generalization of results to larger populations. However, the quality and validity of any estimate from a meta-analysis depend on the quality and validity of the individual studies.11,42,47,48 Careful consideration must be given to differences in study design, population, quality and measures of exposure and outcome, choice of comparator, and quality and inclusion of confounding or moderating variables.11,42 Well-conducted meta-analyses and advances in methods will continue to be in demand as CER continues to evolve and produce more numerous and diverse comparative effectiveness studies.
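A minimal sketch of the weighted-average pooling that underlies a fixed-effect meta-analysis, using hypothetical study-level log hazard ratios and standard errors with inverse-variance weights; a real analysis would also assess heterogeneity and consider random-effects models. All numbers are invented for illustration.

```python
import math

# Hypothetical study estimates: log hazard ratios and their standard errors.
log_hrs = [-0.22, -0.10, -0.35, -0.05]
ses = [0.10, 0.15, 0.20, 0.12]

# Fixed-effect pooling: each study is weighted by the inverse of its variance,
# so larger, more precise studies contribute more to the summary estimate.
weights = [1 / se**2 for se in ses]
pooled_log_hr = sum(w * est for w, est in zip(weights, log_hrs)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

hr = math.exp(pooled_log_hr)
lower = math.exp(pooled_log_hr - 1.96 * pooled_se)
upper = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"Pooled HR = {hr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```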

Decision Analysis Decision analysis is yet another set of tools that can help researchers and policy makers synthesize evidence. Most often, decision analysis involves developing, parameterizing, and validating mathematical models that represent decision alternatives, probabilistic states of nature or events subject to chance, as well as the economic and health-related implications of those decision pathways.49,50 All of these mathematical models require simplifying real-world data with all its complexity and heterogeneity into conceptual structures and corresponding mathematical inputs that represent health and organizational systems efficiently, but with fidelity. These approaches are extremely useful for enumerating risks, benefits, and costs associated with treatment or intervention

pathways; as a result, cost-effectiveness analyses are natural applications of these models.51,52 Inputs to decision models are drawn from the published systematic review and empirical literature, ongoing RCTs and other study-specific data, large observational databases, and as a last resort, expert opinion where actual data are not available. Integrating data from a diversity of sources into mathematical models allows researchers to optimize available evidence; however, care must be taken to ensure that input data leveraged from diverse study populations are indeed representative of the target population of interest in the decision model. Sensitivity analyses allow investigators to examine the robustness of the model's findings across different ranges and distributions of multiple input parameters that may be observed in the real world. As a result, these models are well suited for projecting health and economic outcomes years into the future for specific subpopulations of interest to CER. A detailed description of decision analysis can be found in the article written by Sher and Punglia in this issue.53
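A minimal sketch of a two-strategy decision model of the kind described above, with invented probabilities, costs, and quality-adjusted life-year (QALY) values; a real analysis would draw these inputs from the literature and vary them in sensitivity analyses.

```python
# Hypothetical decision-tree inputs for two treatment strategies (illustrative only).
strategies = {
    "Standard therapy": {"p_success": 0.70, "cost": 20_000, "qaly_success": 8.0, "qaly_failure": 4.0},
    "New therapy":      {"p_success": 0.78, "cost": 35_000, "qaly_success": 8.0, "qaly_failure": 4.0},
}

def expected_outcomes(s):
    """Expected QALYs (averaged over the chance node) and cost for a strategy."""
    qalys = s["p_success"] * s["qaly_success"] + (1 - s["p_success"]) * s["qaly_failure"]
    return qalys, s["cost"]

results = {name: expected_outcomes(s) for name, s in strategies.items()}
for name, (qalys, cost) in results.items():
    print(f"{name}: {qalys:.2f} expected QALYs, ${cost:,} expected cost")

# Incremental cost-effectiveness ratio (ICER): extra cost per extra QALY gained
# by the new therapy relative to the standard.
d_qaly = results["New therapy"][0] - results["Standard therapy"][0]
d_cost = results["New therapy"][1] - results["Standard therapy"][1]
print(f"ICER = ${d_cost / d_qaly:,.0f} per QALY gained")
```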

Conclusion

CER is a broad field of outcomes research that includes diverse research approaches and analytical methodologies. The goal of CER is to develop new knowledge regarding broad populations, and to translate and disseminate it to aid stakeholders in making informed decisions to improve health care. Despite rapid advances over the last decade, CER needs to continue developing and adopting new and improved methodologies to address questions of treatment effectiveness in real-world populations. No single methodology or study design is a panacea, but each can be identified and tailored to specific CER study questions to optimize the design and the value of study findings for improving population health. By briefly introducing these methodologies and their relative strengths and weaknesses, we hope readers will gain an appreciation for the role each method or design plays in CER and in informing clinical practice and policy.

Acknowledgments

Work on this study was supported by the Integrated Cancer Information and Surveillance System (ICISS), UNC Lineberger Comprehensive Cancer Center, with funding provided by the University Cancer Research Fund (UCRF) via the State of North Carolina.

References

1. Minasian LM, Carpenter WR, Weiner BJ, et al: Translating research into evidence-based practice: The National Cancer Institute Community Clinical Oncology Program. Cancer 116(19):4440-4449, 2010
2. Institute of Medicine: Initial National Priorities for Comparative Effectiveness Research. Washington, DC: National Academies Press, 2009
3. Lohr KN: Emerging methods in comparative effectiveness and safety—Symposium overview and summary. Med Care 45(10):S5-S8, 2007
4. Luce BR, Kramer JM, Goodman SN, et al: Rethinking randomized clinical trials for comparative effectiveness research: The need for transformational change. Ann Intern Med 151(3):206-W245, 2009
5. Basch E, Aronson N, Berg A, et al: Methodological standards and patient-centeredness in comparative effectiveness research: The PCORI perspective. J Am Med Assoc 307(15):1636-1640, 2012
6. Chang SM: The Agency for Healthcare Research and Quality (AHRQ) Effective Health Care (EHC) program methods guide for comparative effectiveness reviews: Keeping up-to-date in a rapidly evolving field. J Clin Epidemiol 64(11):1166-1167, 2011
7. Abernethy AP, Etheredge LM, Ganz PA, et al: Rapid-learning system for cancer care. J Clin Oncol 28(27):4268-4274, 2010
8. Meyer AM, Carpenter WR, Abernethy AP, et al: Data for cancer comparative effectiveness research. Cancer 118(21):5186-5197, 2012
9. Hershman DL, Wright JD: Comparative effectiveness research in oncology methodology: Observational data. J Clin Oncol 30(34):4215-4222, 2012
10. Thorpe KE, Zwarenstein M, Oxman AD, et al: A pragmatic-explanatory continuum indicator summary (PRECIS): A tool to help trial designers. J Clin Epidemiol 62(5):464-475, 2009
11. Sox HC, Goodman SN: The methods of comparative effectiveness research. Ann Rev Public Health 33:425-445, 2012
12. Institute of Medicine: Initial National Priorities for Comparative Effectiveness Research. Washington, DC: National Academies Press, 2009
13. Chen RC: Comparative effectiveness research in oncology: The promise, challenges, and opportunities. Semin Radiat Oncol 24(1):1-4, 2014
14. Campbell MK, Elbourne DR, Altman DG: CONSORT statement: Extension to cluster randomised trials. BMJ 328(7441):702-708, 2004
15. Ukoumunne OC, Gulliford MC, Chinn S, et al: Methods for evaluating area-wide and organisation-based interventions in health and health care: A systematic review. Health Technol Assess 3(5):iii-92, 1999
16. Campbell MK, Elbourne DR, Altman DG, et al: CONSORT statement: Extension to cluster randomised trials. BMJ 328(7441):702-708, 2004
17. Campbell MK, Piaggio G, Elbourne DR, et al: Consort 2010 statement: Extension to cluster randomised trials. BMJ 5:e5661, 2012
18. Wells KJ, Lee JH, Calcano ER, et al: A cluster randomized trial evaluating the efficacy of patient navigation in improving quality of diagnostic care for patients with breast or colorectal cancer abnormalities. Cancer Epidemiol Biomarkers Prev 21(10):1664-1672, 2012
19. Hutson S: A change is in the wind as "adaptive" clinical trials catch on. Nat Med 15(9):977, 2009
20. Berry DA: Adaptive clinical trials: The promise and the caution. J Clin Oncol 29(6):606-609, 2011
21. Carpenter WR, Meyer AM, Abernethy AP, et al: A framework for understanding cancer comparative effectiveness research data needs. J Clin Epidemiol 65(11):1150-1158, 2012
22. Brookhart MA, Wyss R, Layton JB, Sturmer T: Propensity score methods for confounding control in nonexperimental research. Circ Cardiovasc Qual Outcomes 6(5):604-611, 2013
23. Sturmer T, Rothman KJ, Glynn RJ: Insights into different results from different causal contrasts in the presence of effect-measure modification. Pharmacoepidemiol Drug Saf 15(10):698-709, 2006
24. Rosenbaum PR, Rubin DB: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41-55, 1983
25. Sturmer T, Schneeweiss S, Avorn J, et al: Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration. Am J Epidemiol 162(3):279-289, 2005
26. Schroeder JC, Bensen JT, Su LJ, et al: The North Carolina-Louisiana Prostate Cancer Project (PCaP): Methods and design of a multidisciplinary population-based cohort study of racial differences in prostate cancer outcomes. Prostate 66(11):1162-1176, 2006
27. Malin JL, Ko C, Ayanian JZ, et al: Understanding cancer patients' experience and outcomes: Development and pilot study of the Cancer Care Outcomes Research and Surveillance patient survey. Support Care Cancer 14(8):837-848, 2006
28. Sanoff HK, Carpenter WR, Martin CF, et al: Comparative effectiveness of oxaliplatin vs non-oxaliplatin-containing adjuvant chemotherapy for stage III colon cancer. J Natl Cancer Inst 104(3):211-227, 2012
29. Sheets NC, Goldin GH, Meyer AM, et al: Intensity-modulated radiation therapy, proton therapy, or conformal radiation therapy and morbidity and disease control in localized prostate cancer. J Am Med Assoc 307(15):1611-1620, 2012
30. Hernan MA, Alonso A, Logan R, et al: Observational studies analyzed like randomized experiments: An application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 19(6):766-779, 2008
31. Walker AM: Confounding by indication. Epidemiology 7(4):335-336, 1996
32. Signorello LB, McLaughlin JK, Lipworth L, et al: Confounding by indication in epidemiologic studies of commonly used analgesics. Am J Ther 9(3):199-205, 2002
33. D'Agostino RB Jr: Stat Med 17(19):2265-2281, 1998
34. Sturmer T, Rothman KJ, Avorn J, et al: Treatment effects in the presence of unmeasured confounding: Dealing with observations in the tails of the propensity score distribution—A simulation study. Am J Epidemiol 172(7):843-854, 2010
35. Glynn RJ, Schneeweiss S, Sturmer T: Indications for propensity scores and review of their use in pharmacoepidemiology. Basic Clin Pharmacol Toxicol 98(3):253-259, 2006
36. Newhouse JP, McClellan M: Econometrics in outcomes research: The use of instrumental variables. Ann Rev Public Health 19:17-34, 1998
37. Brookhart MA, Rassen JA, Schneeweiss S: Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf 19(6):537-554, 2010
38. Brouwers MC, Thabane L, Moher D, et al: Comparative effectiveness research paradigm: Implications for systematic reviews and clinical practice guidelines. J Clin Oncol 30(34):4202-4207, 2012
39. Institute of Medicine: Clinical Practice Guidelines We Can Trust. Graham R, Mancher M, Wolman D (eds). Washington, DC: National Academies Press, 2011
40. Relevo R, Balshem H: Finding evidence for comparing medical interventions: AHRQ and the Effective Health Care Program. J Clin Epidemiol 64(11):1168-1177, 2011
41. Harris RP, Helfand M, Woolf SH, et al: Current methods of the US Preventive Services Task Force: A review of the process. Am J Prev Med 20(suppl 3):21-35, 2001
42. Higgins J, Green S: Cochrane Handbook for Systematic Reviews of Interventions, version 5.0.2. Hoboken, NJ: Wiley & Sons, 2009
43. Institute of Medicine: Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: The National Academies Press, 2011
44. Armstrong K: Methods in comparative effectiveness research. J Clin Oncol 30(34):4208-4214, 2012
45. Altman DG, Schulz KF, Moher D, et al: The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Ann Intern Med 134(8):663-694, 2001
46. Vandenbroucke JP, von Elm E, Altman DG, et al: Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and elaboration. Ann Intern Med 147(8):W163-W194, 2007
47. Egger M, Smith GD, Phillips AN: Meta-analysis: Principles and procedures. BMJ 315(7121):1533-1537, 1997
48. Egger M, Smith GD: Meta-analysis: Potentials and promise. BMJ 315(7119):1371-1374, 1997
49. Ragsdale C: Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science, ed 6. Mason, OH: Cengage Learning, 2010
50. Chumney E, Simpson K: Methods and Designs for Outcomes Research. Bethesda, MD: American Society of Health System Pharmacists, 2006
51. Drummond M, Sculpher M, Torrance G, et al: Methods for the Economic Evaluation of Health Care Programmes, ed 3. New York: Oxford University Press, 2005
52. Muennig P: Cost-Effectiveness Analysis in Health: A Practical Approach. San Francisco: Jossey-Bass, 2008
53. Sher DJ, Punglia RS: Decision analysis and cost-effectiveness analysis for comparative effectiveness research—A primer. Semin Radiat Oncol 24(1):14-24, 2014
