Editorial

Statistical considerations in the design, analysis and interpretation of clinical studies that use patient-reported outcomes*

Statistical Methods in Medical Research 2014, Vol. 23(5) 393–397. © The Author(s) 2013. Reprints and permissions: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/0962280213498013. smm.sagepub.com

Lisa A Kammerman and Stella Grosser

Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA

*This editorial reflects the views of the authors and should not be construed to represent the views or policies of the US Food and Drug Administration.

Corresponding author: Lisa A Kammerman, Center for Drug Evaluation and Research, Food and Drug Administration, One Medimmune Way, Gaithersburg, MD, USA. Email: [email protected]

This special issue of Statistical Methods in Medical Research emphasizes statistical considerations that are unique to a clinical study whose endpoint is based on a patient-reported outcome (PRO), which is any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else.1 The papers in this issue address the design, analysis and interpretation of results from clinical studies that use PROs to support the efficacy of an investigational medical product. Many of the viewpoints in this issue also apply to clinician-reported outcomes and observer-reported outcomes, in addition to PROs. As Julious and Walters describe in this issue, these three types of outcomes can simply be called 'person-reported outcomes'.2 Although the concerns and techniques discussed in the papers may pertain to the use of PROs as diagnostic tools in clinical or rehabilitation settings, these settings are not explicitly covered.

The selection of a PRO instrument appropriate for the clinical study population and study objective is crucial to the successful outcome of a clinical study. In this issue, Izem et al. discuss statistical considerations regarding the choice between single-item and multi-item PRO instruments.3 They also describe challenges that must be addressed when existing instruments are adapted for use in clinical studies. A second approach is to develop a new PRO instrument for the study; some statistical considerations important to the development of new instruments are described next in this editorial.

Increasingly, modern psychometric theory (e.g. item response theory and Rasch models) and classical test theory are being used to develop new PRO instruments. In this issue, Massof provides an overview of these methods and their implications for the validation and scoring of instruments.4 This overview should be helpful to statisticians who need to understand these approaches and their roles in the development of new, validated instruments for use in clinical studies.
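For readers new to these models, the Rasch model is the simplest instance of item response theory: the probability that a person endorses an item depends only on the difference between the person's position on the latent trait and the item's difficulty. The sketch below simply evaluates that standard formula; it is a generic illustration, not code from Massof's paper, and the ability and difficulty values are invented.

```python
import numpy as np

def rasch_probability(theta, beta):
    """Probability of endorsing an item under the one-parameter (Rasch)
    model: depends only on person ability theta minus item difficulty beta."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

# Invented example: one person (theta = 0.5) facing three items of
# increasing difficulty; easier items are more likely to be endorsed.
difficulties = np.array([-1.0, 0.0, 1.5])
print(rasch_probability(0.5, difficulties))  # approx [0.82, 0.62, 0.27]
```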


The potential for differential item functioning (DIF) and, more specifically, intervention-specific DIF needs to be considered when an instrument is being selected for a clinical study.4 Massof alerts us to problems associated with 'intervention-specific DIF', which arises when a medical intervention affects specific symptoms (i.e. specific items) measured by an instrument rather than all symptoms (i.e. all items) measured by the instrument. Intervention-specific DIF is especially important in studies that use computer adaptive testing (CAT), as discussed in the next paragraph.

The use of CAT in clinical studies has not yet gained acceptance in regulatory submissions for several reasons. Using modern psychometric theory, CAT relies on a fixed set of items (i.e. an item bank) to generate a questionnaire specific to the health status of an individual subject. Thus, within a clinical study, not only will different subjects be asked different items, but the items for a given subject may change throughout the study as the subject's health status changes. For instance, in a randomized controlled study whose primary outcome is change from baseline in a PRO determined by CAT, the items asked at the end of the study may differ from those asked at baseline. The use of different items between subjects and within subjects is concerning because, as Massof discusses, CAT is sensitive to intervention-specific DIF, and 'the magnitude of the treatment effect can be manipulated by controlling the numbers of responsive relative to unresponsive items in the instrument'.4

After a PRO instrument has been selected for use in a clinical study, an important decision is how to transform the instrument outcome into a study endpoint.3 The endpoint can take many forms, for example, a continuous variable or a dichotomized variable. All of the papers address the types of PRO endpoints that can be used in clinical studies. In addition, Julious and Walters, and Cappelleri and Bushmakin, specifically address problems associated with the loss of information that results from dichotomizing subjects as either responders or non-responders.2,5 Nevertheless, responder analyses can be well suited to interpreting the results of a clinical study and to communicating those results.

Sample size calculations require an understanding of treatment effect sizes but, because there is limited experience with new PRO endpoints, identifying a difference between treatment groups that is deemed clinically important is very challenging. In this issue, Julious and Walters present an anchor-based approach to estimating treatment effect sizes that can be used to calculate sample sizes for clinical studies that use PRO endpoints or other endpoints.2 This approach, which requires data from a study that uses both a 'gold standard' and the PRO of interest, is illustrated by several examples. In addition, Massof cautions that different instruments will give different estimates of the size of the treatment effect because of differences in item content, even when the instruments purportedly measure the same latent variable,4 suggesting that estimates of treatment effect sizes for one instrument may not be applicable to another instrument designed to measure the same latent variable.
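To make concrete how an estimated standardized effect size feeds into a sample size calculation, the sketch below applies the standard two-sample formula for comparing means. This is generic textbook machinery rather than Julious and Walters' anchor-based method itself, and the effect size of 0.3 is invented for illustration.

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(effect_size, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided, two-sample comparison of means,
    given a standardized effect size (mean difference / common SD)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Illustrative only: a standardized effect of 0.3 needs roughly 234
# subjects per arm for 90% power at a two-sided 5% significance level.
print(n_per_arm(0.3))
```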
Missing data that are unique to PRO endpoints – missing items and missing questionnaires – have important implications for statistical analysis plans. In this issue, Bell and Fairclough provide a rich source of information and guidance on this topic,6 which should be of interest to statisticians involved in any type of clinical study, not just studies that rely on PRO endpoints.

Consistent with the National Academy of Sciences (NAS) report on missing data,7 which emphasizes the need to minimize missing data through study design and conduct, Bell and Fairclough recommend study design features that help minimize missing data, along with the collection of auxiliary data to help assess the nature of the missing data and to improve model estimates. For missing items, they point out that the commonly used half-mean imputation approach (i.e. impute the mean of the completed items when at least half of the items are completed) is generally valid because items are typically correlated with each other; however, the approach may be inappropriate for items that are ordered, for example, from little difficulty to a high degree of difficulty.
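As a concrete rendering of that half rule, the sketch below scores a single multi-item scale under an assumed convention (a sum score over the scale's items); real instruments have their own scoring manuals, which take precedence, and the example values are invented.

```python
import numpy as np

def half_rule_score(item_responses):
    """Score a multi-item scale with the half rule: if at least half of the
    items were answered, impute each missing item with the mean of the
    answered items and return the resulting sum score; otherwise treat the
    whole scale score as missing."""
    responses = np.asarray(item_responses, dtype=float)
    n_answered = np.count_nonzero(~np.isnan(responses))
    if n_answered < responses.size / 2.0:
        return np.nan  # too few answered items to impute credibly
    return np.nanmean(responses) * responses.size

# Invented 4-item scale (items scored 0-4), one item missing: imputable.
print(half_rule_score([3, np.nan, 2, 4]))            # 12.0
# Three of four items missing: the scale score is set to missing.
print(half_rule_score([3, np.nan, np.nan, np.nan]))  # nan
```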


The analysis and interpretation of PRO endpoints are more adversely affected by missing questionnaires than by missing items. To assist with these challenges, Bell and Fairclough describe approaches that help elucidate the mechanisms of missing data, discuss imputation methods and recommend inferential analyses that use all PRO data collected throughout the clinical study.6 Their preferred methods of analysis use models that assume data are missing at random, for example, mixed models for repeated measures (a minimal sketch of such a model appears below). Importantly, they point out that the assumptions for analysis with such models are more relaxed than the assumptions for 'simple' methods that assume data are missing completely at random, for example, last observation carried forward, summary measures and repeating univariate analyses at each time point. They recommend conducting sensitivity analyses using models with different assumptions, a recommendation consistent with the NAS report,7 and prefer sensitivity models that assume data are missing not at random. Izem et al. present a regulatory case study that illustrates missing PRO data that could have been avoided with a different study design, and shows how missing data for one PRO endpoint were informative, serving as an indicator of poor treatment benefit.3 In this issue, Cappelleri and Bushmakin also address some general missing data considerations.5

Data arising from PRO instruments, particularly instruments that contain multiple domains, present unique multiplicity challenges that need to be considered in a statistical analysis plan.3,8 Moreover, statistical hypotheses need to be stated clearly in order to help determine the type of statistical tests and analyses that will be used to help establish efficacy as assessed by a PRO endpoint or endpoints. Much of this will depend on what a drug developer seeks to include in product labeling.

Best practices and methods for interpreting and communicating the results of PRO endpoints are evolving. Two widely used approaches for interpreting PRO results – anchor-based methods and distribution-based methods – and their variations are reviewed by Cappelleri and Bushmakin, who discuss important limitations of these methods and illustrate them with numerous examples.5 The authors introduce a third approach, mediation models, whose goal is to explain how much of the effect on the PRO (e.g. sleep quality) is directly due to the medical intervention and how much is indirectly due to the effect of the medical intervention on another variable (e.g. pain). This approach accounts for a medical product's mechanism of action.

Also evolving are methods for presenting PRO results in product labeling, study reports, the literature and other communications so that the results are easily understood by health care providers and patients. Regulatory guidance documents recommend reporting the percentages of responders or displaying cumulative distribution functions (see the plotting sketch below).1,8 Other types of graphical displays could also be considered. Cappelleri and Bushmakin discuss these methods and their limitations.5 Additionally, Bell and Fairclough recommend reporting simple descriptions of missing data.6
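The first sketch promised above illustrates the kind of missing-at-random analysis Bell and Fairclough prefer, on simulated data with invented variable names and effect sizes. For simplicity it fits a random-intercept mixed model with statsmodels, a stand-in for a full mixed model for repeated measures, which would instead specify an unstructured covariance across visits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a hypothetical two-arm trial: 100 subjects, three post-baseline
# visits, a subject-level random intercept and a treatment benefit that
# grows over time. All names and numbers are illustrative.
rng = np.random.default_rng(0)
n_subjects, visits = 100, [1, 2, 3]
subject = np.repeat(np.arange(n_subjects), len(visits))
arm = np.repeat(rng.integers(0, 2, n_subjects), len(visits))
visit = np.tile(visits, n_subjects)
subject_effect = np.repeat(rng.normal(0, 1, n_subjects), len(visits))
score = 50 + 1.5 * arm * visit + subject_effect + rng.normal(0, 2, subject.size)
df = pd.DataFrame({"subject": subject, "arm": arm, "visit": visit, "score": score})

# Drop ~15% of visits at random to mimic missing questionnaires (missing
# completely at random here, purely for illustration); the mixed model
# uses every row that remains rather than discarding incomplete subjects.
df = df.sample(frac=0.85, random_state=0).sort_values(["subject", "visit"])

# Random-intercept model; the arm:visit interaction captures how the
# treatment effect evolves across visits.
result = smf.mixedlm("score ~ arm * visit", data=df, groups="subject").fit()
print(result.summary())
```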
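And to illustrate the cumulative distribution function display recommended in the guidance documents, here is a minimal plotting sketch; the two arms' change-from-baseline scores are simulated, and all numbers are invented.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated change-from-baseline PRO scores (negative = improvement);
# arms, means and spreads are invented for illustration.
rng = np.random.default_rng(1)
arms = {"Active": rng.normal(-6, 8, 150), "Placebo": rng.normal(-2, 8, 150)}

# One empirical cumulative distribution function per arm, so readers can
# compare the full distribution of responses at any chosen threshold.
for label, scores in arms.items():
    x = np.sort(scores)
    y = np.arange(1, x.size + 1) / x.size
    plt.step(x, y, where="post", label=label)

plt.xlabel("Change from baseline in PRO score")
plt.ylabel("Cumulative proportion of subjects")
plt.title("Cumulative distribution of PRO responses by treatment arm")
plt.legend()
plt.show()
```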
We strongly recommend that authors and investigators define the terms they use to describe changes within a subject, average group changes within a treatment arm and average differences between treatment arms. The lack of consensus in the literature, as noted by Cappelleri and Bushmakin, creates confusion. In our view, Cappelleri and Bushmakin adopt a best practice by defining the terminology they use in their paper: 'clinically important change' (i.e. within-person change) and 'clinically important difference' (i.e. differences between groups).5 To illustrate the potential for confusion, other terms used in this issue and in the peer-reviewed literature include 'clinically important difference', 'minimally clinical important difference', 'minimum important difference (MID)', 'minimum clinically important difference (MCID)', 'clinically meaningful difference', 'minimal important difference', 'clinical meaningful difference' and 'minimal clinically important difference',2,5,9 and not all of these terms are used consistently.


Although the Food and Drug Administration (FDA) draft guidance on PROs used the term MID, the term does not appear in the final guidance, owing in part to the lack of consensus on the definition and role of the MID.10–13

The increased use of PROs in clinical studies is likely to continue as pharmaceutical companies, academia and others create consortia, public–private partnerships and other entities to develop PROs that can be submitted to the FDA's and the European Medicines Agency's (EMA) qualification programs.14,15 A goal of these programs is to speed drug development through early and frequent interactions with the regulatory bodies so that a PRO or biomarker can be qualified for a specific use in clinical studies. Once a tool is qualified, it becomes available for others to use in their drug development programs, thereby avoiding challenges currently faced when using and adapting existing instruments for clinical studies.3 The development and approval of these drug development tools typically take place outside a specific drug development program. The PRO Consortium, a program of the Critical Path Institute (http://c-path.org/PRO.cfm), is an example of a public–private partnership that is developing PRO instruments for use in several medical areas.16

This issue was motivated by the need for statistical planning and innovation to address the increasing use of PROs in regulatory submissions, and by the desire to know the effect of medical interventions on how clinical trial subjects function, feel and survive. The increase in regulatory submissions has resulted in guidance documents issued by both the United States FDA and the EMA.1,17 Innovative statistical approaches continue to be developed for the development and analysis of PRO instruments.

Note

Lisa A Kammerman's current affiliation is AstraZeneca, but all of the work was done while she was employed by the US Food and Drug Administration.

References

1. United States Food and Drug Administration. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims, http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf (2009, accessed 25 February 2013).
2. Julious SA and Walters SJ. Estimating effect sizes for health-related quality of life outcomes. Stat Methods Med Res 2014; 23: 430–439.
3. Izem R, Kammerman LA and Komo S. Statistical challenges in drug approval trials that use patient-reported outcomes. Stat Methods Med Res 2014; 23: 398–408.
4. Massof R. A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses. Stat Methods Med Res 2014; 23: 409–429.
5. Cappelleri JC and Bushmakin AG. Interpretation of patient-reported outcomes. Stat Methods Med Res 2014; 23: 460–483.
6. Bell ML and Fairclough DL. Practical and statistical issues in missing data for longitudinal patient reported outcomes. Stat Methods Med Res 2014; 23: 440–459.
7. Panel on Handling Missing Data in Clinical Trials, National Research Council. The prevention and treatment of missing data in clinical trials. Washington, DC: National Academies Press, 2010.
8. United States Food and Drug Administration. Guidance for industry: clinical studies section of labeling for human prescription drug and biological products – content and format, http://www.fda.gov/downloads/RegulatoryInformation/Guidances/ucm127534.pdf (2006, accessed 28 February 2013).
9. Redelmeier DA, Guyatt GH and Goldstein RS. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol 1996; 49: 1215–1219.
10. United States Food and Drug Administration. Draft guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims, http://www.fda.gov/OHRMS/DOCKETS/98fr/06d-0044-gdl0001.pdf (2006, accessed 26 February 2013).
11. International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Comments on the FDA draft guidance for industry – patient-reported outcome measures: use in medical product development to support labeling claims, http://www.ispor.org/workpaper/ISPOR%20Response%20to%20FDA%20PRO%20Guidance.pdf (2006, accessed 26 February 2013).
12. Bjorner JB, Gandek B, Cole J, et al. Response to the FDA draft guidance for industry document: patient-reported outcome measures: use in medical product development to support labeling claims (Docket 2006D-0044), http://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=1610&context=qhs_pp (2006, accessed 26 February 2013).
13. UCB, Inc. Comment on draft guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims, http://www.fda.gov/ohrms/dockets/dockets/06d0044/06d-0044c000004-01-vol1.pdf (2006, accessed 25 February 2013).
14. United States Food and Drug Administration. Draft guidance: qualification process for drug development tools, http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM230597.pdf (2012, accessed 25 February 2013).
15. European Medicines Agency. Qualification of novel methodologies for drug development, http://www.emea.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2009/10/WC500004201.pdf (2009, accessed 25 February 2013).
16. Coons SJ, Kothari S, Monz BU, et al. The Patient-Reported Outcome (PRO) Consortium: filling measurement gaps for PRO endpoints to support labeling claims. Clin Pharmacol Ther 2011; 90: 743–748.
17. European Medicines Agency, Committee for Medicinal Products for Human Use (CHMP). Reflection paper on the regulatory use of health-related quality of life (HRQL) measures in the evaluation of medicinal products, http://www.emea.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003637.pdf (2009, accessed 25 February 2013).
