EDITORIAL

Informatics grand challenges in multi-institutional comparative effectiveness research

Dean F Sittig*1,2 & Brian L Hazlehurst3
1 School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA
2 UT-Memorial Hermann Center for Healthcare Quality & Safety, 6410 Fannin Street, UTPB 1100.43, Houston, TX 77030, USA
3 Kaiser Permanente Center for Health Research, Portland, OR, USA
*Author for correspondence: Tel.: +1 713 500 7977

J. Compar. Effect. Res. 1(5), 373–376 (2012)
10.2217/CER.12.48 © 2012 Future Medicine Ltd
ISSN 2042-6305

Comparative effectiveness research (CER) has the potential to radically transform the healthcare delivery system by identifying those therapies, procedures, preventive tests, and healthcare processes and devices that are most effective from the standpoints of cost, quality and safety. The widespread adoption and conduct of CER will place enormous and growing demands on the existing clinical informatics research infrastructure [1]. Many attempts have been made to use existing, large-scale healthcare claims data for CER, with varying levels of success. In an attempt to improve the accuracy of these studies, and with the increasing availability of data generated by electronic health records (EHRs), researchers are beginning to explore the use of large databases of clinical information derived from EHRs.

In a recent article in the journal Medical Care, we described the results from ‘A survey of informatics platforms that enable distributed comparative effectiveness research using multi-institutional heterogeneous clinical data’ [2]. For that study, we used an eight-dimension sociotechnical model [3] to compare and contrast informatics platforms under development or in use in six large, federally funded CER efforts. Briefly, we identified six generic steps as necessary for any distributed, multi-institutional CER project, namely: data identification, extraction, modeling, aggregation, analysis and dissemination (Figure 1). In addition, we identified several key challenges that all CER researchers and their informatics partners must address as we move forward. While each of these challenges was being addressed in at least one of the projects we reviewed, none was being addressed in a comprehensive manner by all of the projects.

Figure 1. Overview of data flow in comparative effectiveness research projects. Source systems shown: Pharm, EHR, Labs, Billing. Steps shown, in order:
Identification of applicable raw data within healthcare transaction systems
Data extraction: research data warehouse, standard format
Data modeling: data normalized, standard meaning
Data aggregation: data combined across patients and sites
Data analysis: data used to answer study questions
Dissemination: results distributed to others

After careful consideration, we realized that solutions to each of these challenges are crucial for continued progress in this difficult, yet critically important, field of research. As such, we have labeled these as ‘grand challenges’ in the hope of inspiring new and existing informatics researchers to begin (or continue) working on them in earnest. The following sections briefly describe each of these grand challenges.

Development of methods to establish unique patient identifiers for all patients: to allow correct matching of patients & de-duplication of patient information across healthcare organizations

CER requires aggregation and analysis of data describing the same patient that are held by different institutions. Therefore, a grand challenge for CER that collects data across healthcare provider organizations, especially those located in the same geographic region, is the need to merge data from the same patient who has received healthcare services at multiple institutions. Since the US federal government has refused to take on the task of assigning and disseminating a unique patient identifier for all US citizens [4,101], CER efforts require creation and maintenance of a community-wide master patient index that identifies patients based on probabilistic matches using multiple demographic data components (e.g., first name, last name, date of birth, gender, social security or telephone numbers). Currently, only CER projects that have evolved out of community-based health information exchanges have begun to tackle this difficult problem through integration of data collected during care delivery operations. Without the ability to properly link all of a patient’s data, CER studies will vastly underestimate the ‘numerators’, or the number of unique patients who have received a specific treatment or experienced a particular outcome, and overestimate the ‘denominator’, or the total number of patients involved in the study.
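To make the idea of probabilistic matching on demographic components more concrete, the following is a minimal sketch in the general style of agreement-weight record linkage. It is not the method used by any of the projects surveyed. The record fields, the per-field probabilities in FIELD_PROBS and the MATCH_THRESHOLD value are all hypothetical, chosen only for illustration; a real master patient index would estimate such values from data and would typically use approximate string comparison rather than exact agreement.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical demographic record held by one institution."""
    first_name: str
    last_name: str
    date_of_birth: str  # ISO format, e.g. "1970-01-31"
    gender: str
    phone: str


# Hypothetical (m, u) probabilities per field (assumed values, not from the article):
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PROBS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.97, 0.001),
    "gender": (0.99, 0.5),
    "phone": (0.80, 0.0005),
}

# Hypothetical decision threshold on the summed log2 weights.
MATCH_THRESHOLD = 15.0


def normalize(value: str) -> str:
    """Lower-case and keep only letters and digits so formatting differences are ignored."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_score(a: PatientRecord, b: PatientRecord) -> float:
    """Sum a log2 agreement or disagreement weight for each demographic field."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if normalize(getattr(a, field)) == normalize(getattr(b, field)):
            score += math.log2(m / u)  # agreement adds evidence for a match
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return score


def is_probable_match(a: PatientRecord, b: PatientRecord) -> bool:
    """Treat two records as the same patient when the score reaches the threshold."""
    return match_score(a, b) >= MATCH_THRESHOLD


# Example records (fabricated for illustration only).
site_a = PatientRecord("Maria", "Garcia", "1970-01-31", "F", "(713) 555-0100")
site_b = PatientRecord("MARIA", "Garcia", "1970-01-31", "F", "713-555-0100")
new_phone = PatientRecord("Maria", "Garcia", "1970-01-31", "F", "713-555-0142")
other = PatientRecord("Maria", "Lopez", "1982-06-02", "F", "503-555-0199")
```

In this sketch, site_a and site_b agree on every field after normalization and are linked, new_phone still clears the threshold despite one disagreeing field, and other does not. Fields that rarely agree by chance (such as date of birth) carry more weight than fields that often do (such as gender), which is the reason for combining multiple components rather than relying on any single one.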

Development of a comprehensive data model: to facilitate storage & retrieval of key information

A data model describes how data elements are stored and the relationships between these elements [5]. CER data analysis requires the collection, storage and analysis of diverse data elements. To analyze these data, investigators must construct complex data queries. The ability of these queries to identify important events, the sequence of these events and their relationships to each other is heavily dependent on the data model chosen. Therefore, to conduct high-quality CER we must develop, and agree to use and share, a comprehensive, standard data model.

Utilization of standard clinical vocabularies & development of shared data definitions across organizations: to facilitate data aggregation & comparisons

Data used for CER stand for clinical events in the real world, and this ‘standing for’ relationship is what gives the data their meaning. When the same events generate different data, or when the data fail to distinguish two events that are importantly different, the utility of these data for answering CER questions is impaired. As clinical practices become more standardized, the opportunity for accurately representing similar events across settings and contexts increases, but only if common clinical vocabularies and terminologies are employed in data capture. When the same events are represented by the same symbolic data, aggregation and comparison of events across settings is enhanced. Therefore, high-quality CER will require widespread use of standard clinical vocabularies along with common definitions of the clinical concepts represented.

Natural language processing capabilities: to provide access to all the information contained within the free-text portions of the patient’s record

Despite the importance of utilizing standardized vocabularies to represent clinical events, there remain reasons (both theoretical and practical) why these will fail to adequately represent all of the clinical events needed for CER [6]. Clinical situations can be complex, diverse and often unique, and technology for assisting in standardizing meaning (e.g., by encouraging standardized coding of clinical events) can sometimes create distortion. As a result, approximately half of all clinical information contained in current, state-of-the-art EHRs is in the form of free-text data. To carry out high-quality CER, investigators must be able to incorporate these data into their analyses. Existing natural language processing technologies can accurately identify and code isolated clinical concepts (for example, medications or clinical problems) and more complicated concepts, such as evidence that a patient has received adequate smoking cessation counseling, but they are not capable of accurately coding an entire free-text note on all dimensions of interest at once (a problem shared by ‘human coders’ as well) [7,8].

Data governance structures & processes: to facilitate the aggregation, organization, reduction &/or transformation, interpretation & synthesis of high-quality data across organizations

CER requires investigators to aggregate data from multiple organizations to answer their research questions [9]. To perform such aggregation, one must address the social, legal, ethical and political challenges involved in inter-institutional research. Friedman et al. stated that ‘organizations are understandably reluctant to move data beyond their own boundaries absent a clear and specific need to do so, and patients will be less likely to consent to allow this to happen’ [10]. Therefore, CER researchers must accomplish data aggregation while conforming to local organizations’ internal data governance rules as well as existing state and federal guidelines, not to mention some patients’ preferences to remain anonymous. Informatics platforms have been designed and developed to address and accommodate these constraints, including retaining local physical control of raw data while providing the means to aggregate limited data sets to answer specific research questions. Nonetheless, before CER can reach its fullest potential, we must develop new models of data governance that facilitate and streamline processes that better enable the analysis of inter-institutional data sets.
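The pattern of retaining local physical control of raw data while sharing only limited, aggregated results can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the design of any platform reviewed: the SiteNode class, the aggregate_counts function, the record layout and the MIN_CELL_SIZE suppression rule are all hypothetical, and real governance rules are considerably richer than a single threshold.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical governance rule: a site does not release counts below this size.
MIN_CELL_SIZE = 5


class SiteNode:
    """Hypothetical site that keeps row-level records behind its own boundary."""

    def __init__(self, name: str, records: List[dict]) -> None:
        self.name = name
        self._records = records  # raw data; never returned to the coordinator

    def count(self, predicate: Callable[[dict], bool]) -> Optional[int]:
        """Run the query locally and release only an aggregate count.

        Returns None (suppressed) when the count is below MIN_CELL_SIZE,
        which in this sketch also covers a count of zero.
        """
        n = sum(1 for record in self._records if predicate(record))
        return n if n >= MIN_CELL_SIZE else None


def aggregate_counts(
    sites: List[SiteNode], predicate: Callable[[dict], bool]
) -> Dict[str, object]:
    """Combine the per-site aggregates; the coordinator never sees raw records."""
    per_site = {site.name: site.count(predicate) for site in sites}
    return {
        "total": sum(n for n in per_site.values() if n is not None),
        "suppressed_sites": sorted(
            name for name, n in per_site.items() if n is None
        ),
    }


# Fabricated example data: site_a has 6 matching records, site_b has only 2.
site_a = SiteNode("site_a", [{"drug": "statin"}] * 6 + [{"drug": "other"}] * 4)
site_b = SiteNode("site_b", [{"drug": "statin"}] * 2 + [{"drug": "other"}] * 8)
result = aggregate_counts([site_a, site_b], lambda r: r["drug"] == "statin")
```

Here the coordinator receives a total of 6 and a note that site_b suppressed its result. The design choice worth noting is that the query travels to the data rather than the data to the query, so each organization can apply its own release rules before anything crosses its boundary.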

Health information exchange-like capabilities: to ensure the availability of complete, accurate data describing not only treatments given to patients, but also their outcomes

CER requires enormous amounts and many different types of data to create complete, patient-centered representations of patients’ diagnostic and treatment history. To be specific, CER investigators need to collect data from both inpatient and outpatient EHRs (including free-text notes describing clinical encounters), and from billing and ancillary systems such as laboratory, pharmacy and radiology departments across multiple organizations. In addition, investigators must collect data that document the treatments that patients actually received; therefore, investigators need to collect pharmacy dispensing records, patient-reported data and death status, for example. To date, the only CER platforms that have successfully navigated these considerable challenges are those that began as health information exchanges (e.g., Indiana Network for Patient Care) or are from large health maintenance organizations that cover significant percentages of regional populations with comprehensive healthcare services (e.g., Kaiser Permanente). Therefore, we believe that in order to conduct high-quality CER, investigators need to either recreate or align themselves with community-wide health information exchange activities.

Taken together, solutions to these six ‘grand challenges’ will lead to great advances in CER. Solutions will be difficult along both social and technical dimensions, and will require significant research efforts. Failure to address one or more of these grand challenges will lead to significant delays in the development of high-quality CER with resultant waste of money, time and effort on less-effective diagnostic and therapeutic procedures. State and federal governments are strongly encouraged to continue their funding of CER, so as to help citizens of the world receive the highest quality, yet affordable, healthcare.

Acknowledgements

The authors thank the Electronic Data Methods Forum for their encouragement and support in conducting this study.

Financial & competing interests disclosure

DF Sittig is supported in part by a SHARP contract from the Office of the National Coordinator for Health Information Technology (ONC #10510592). BL Hazlehurst’s work is supported in part by grants from the National Library of Medicine (R21LM009728) and the Agency for Healthcare Research and Quality (R01HS019828, R18HS18157). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed. No writing assistance was utilized in the production of this manuscript.

www.futuremedicine.com


References

1 VanLare JM, Conway PH, Rowe JW. Building academic health centers’ capacity to shape and respond to comparative effectiveness research policy. Acad. Med. 86(6), 689–694 (2011).

2 Sittig DF, Hazlehurst BL, Brown J et al. A survey of informatics platforms that enable distributed comparative effectiveness research using multi-institutional heterogenous clinical data. Med. Care 50(Suppl.), S49–S59 (2012).

3 Sittig DF, Singh H. A new sociotechnical model for studying health information technology in complex adaptive healthcare systems. Qual. Saf. Health Care 19(Suppl. 3), i68–i74 (2010).

4 Stolberg SG. Health identifier for all Americans runs into hurdles: privacy debate heats up. The New York Times, 20 July (1998).

5 Kahn MG, Batson D, Schilling LM. Data model considerations for clinical effectiveness researchers. Med. Care 50, S60–S67 (2012).

6 Rector AL. Clinical terminology: why is it so hard? Methods Inf. Med. 38(4–5), 239–252 (1999).

7 Hazlehurst B, Sittig DF, Stevens VJ et al. Natural language processing in the electronic medical record: assessing clinician adherence to tobacco treatment guidelines. Am. J. Prev. Med. 29(5), 434–439 (2005).

8 Hazlehurst B, Frost HR, Sittig DF, Stevens VJ. MediClass: a system for detecting and classifying encounter-based clinical events in any electronic medical record. J. Am. Med. Inform. Assoc. 12(5), 517–529 (2005).

9 Feblowitz JC, Wright A, Singh H, Samal L, Sittig DF. Summarization of clinical information: a conceptual model. J. Biomed. Inform. 44(4), 688–699 (2011).

10 Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci. Transl. Med. 2(57), 57cm29 (2010).

Website

101 Health Insurance Portability and Accountability Act of 1996. Public law 104-191; 104th Congress. 21 August 1996. http://aspe.hhs.gov/admnsimp/pl104191.htm

future science group
