LWW/JACM

JACM-D-14-00013

May 19, 2014

16:49

J Ambulatory Care Manage Vol. 37, No. 3, pp. 206–210 Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins

Big Data and the Electronic Health Record

Steve G. Peters, MD; James D. Buntrock, MS

Abstract: The electronic medical record has evolved from a digital representation of individual patient results and documents to information of large scale and complexity. Big Data refers to new technologies providing management and processing capabilities, targeting massive and disparate data sets. For an individual patient, techniques such as Natural Language Processing allow the integration and analysis of textual reports with structured results. For groups of patients, Big Data offers the promise of large-scale analysis of outcomes, patterns, temporal trends, and correlations. The evolution of Big Data analytics moves us from description and reporting to forecasting, predictive modeling, and decision optimization.

Key words: Big Data, computerized records, electronic health record, electronic medical record

The Mayo Clinic, established in Rochester, Minnesota, now also comprises clinics in Phoenix/Scottsdale, Arizona, and in Jacksonville, Florida, as well as the Mayo Clinic Health System in Minnesota, Iowa, and Wisconsin, and an expanding affiliated care network of sites that have access to Mayo Clinic knowledge assets and expertise. Mayo Clinic personnel include more than 4000 physicians and scientists, 3400 residents, fellows, and students, and 54 000 allied health staff. Approximately 1.2 million patients are seen each year, with more than 130 000 hospital admissions. In this environment, there are fundamental needs for information sharing, analysis, and reporting.

Author Affiliations: Division of Pulmonary and Critical Care (Dr Peters) and Information Management and Analytics, Department of Information Technology (Dr Buntrock), Mayo Clinic, Rochester, Minnesota. The authors have declared no financial conflict of interest. Correspondence: Steve G. Peters, MD, Mayo Clinic, 200 First St SW, Rochester, MN 55905 (Peters [email protected]). DOI: 10.1097/JAC.0000000000000037

The electronic health record (EHR) environment of Mayo Clinic evolved over many years and currently consists of 2 vendors and 3 versions of a core electronic medical record (EMR), with a number of integrated departmental systems, and overlying viewing and workflow applications. In this context, the term “EHR” refers to the traditional medical record plus capabilities for health information exchange, patient access and engagement, and Meaningful Use certification. This “best of breed” approach requires additional effort for clinical practice integration and retrieval of data for analytics and research. As examples, we have developed desktop and mobile applications to synthesize clinical notes, reports, and laboratory results along with data from emergency departments, intensive care units, operating rooms, and other procedural locations. The use of modular systems increases complexity and the need to maintain multiple interfaces. An EMR that provides a more open, “service-oriented architecture” offers the potential for other programs to call data as needed. The advantages of multiple interoperable modules include the ability to use custom tools for specific departments or specialties, the creation of optimized viewing

206 Copyright © 2014 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.


tools, and flexibility of workflow (eg, viewing clinical notes and radiographic images of one patient while answering a question about laboratory values of another patient, without losing context).

Extensive clinical decision support (CDS) is built into the core EMRs, with supplemental alerting systems drawing information from the EMR or running on a separate clinical data repository. An intensive care unit-based organ system view allows for rapid assessment of complex patients and is built on an infrastructure that allows for a variety of alerts (Herasevich et al., 2010; Pickering et al., 2013). We provide links to Mayo-vetted knowledge at the point of care through an application named AskMayoExpert. This repository includes general information, frequently asked questions, algorithmic care process models, and online access to experts in each area. Another locally developed application, Generic Disease Management System, provides contextual clinical information and CDS and has been documented to improve preventive services and chronic disease management (Chaudhry et al., 2009, 2012, 2013; DeJesus et al., 2012).

To help meet data needs of a large institution for clinical practice, education, research, and administration, the Mayo Clinic Enterprise Data Trust was developed to support the retrieval and analysis of structured information (Chute et al., 2010). New databases are being developed to store and organize massive amounts of genetic and genomic information alongside other clinical results, to use such data for biomedical research, and to provide CDS. Many individual specialty divisions and departments have developed their own “datamarts” and registries for practice, research, education, quality improvement, and reporting. Increasingly, such efforts are hindered by data that may be contained in a loosely structured report, a textual document, a scanned image, or another medium such as a video file.
Traditional clinical data repositories are failing to meet these needs. Big Data refers to a set of technologies, evolved from the Internet search era, that


allow large-scale data management, processing, and indexing. These new technologies allow the processing of large and complex data sets that defy usual search and retrieval. Much has been made of the revolutionary potential of Big Data in business intelligence, analytics, and management (McAfee & Brynjolfsson, 2012; Moore et al., 2013). The data are massive in volume and variety, and often semi- or unstructured. Early accounts of Big Data described “three Vs,” referring to Volume, Velocity, and Variety of information flow. In health care, volume can be massive both in the number of data artifacts (eg, clinical documents) and in terms of the size of a particular artifact (eg, the whole genome of a patient). Velocity refers to the rate of change of data, often connected with analysis of streaming information, typically in real time (eg, bedside monitoring). Variety suggests the complexity of diverse data sources (eg, video, wireless remote monitor, digital genomic information). Others have added a fourth “V” for Veracity, to describe a need for verification of data extracted from disparate and potentially unreliable or incomplete sources. Finally, a fifth “V” has been offered for Value, a timely and essential consideration in health care. We have used a descriptive equation for medical care in which Value = (Outcomes + Safety + Service)/Costs. Big Data techniques provide opportunities to analyze and process complex medical information in ways traditional relational techniques could not perform. As such, the costs of Big Data infrastructure and analytics begin to be justified both by benefits to patient care and by the generation of new insights for the organization.

We consider Big Data solutions when the problem or question calls for analysis of raw data of various types. Often, the analysis requires review of all available data rather than a structured subset. Big Data techniques are considered when data do not fit well into traditional relational databases.
Furthermore, we seek such analytics either for the discovery of new patterns and correlations or for exploratory analysis. Big Data allows the intake and storage of data independent of their final use. By using “late binding” techniques,



JOURNAL OF AMBULATORY CARE MANAGEMENT/JULY–SEPTEMBER 2014

different perspectives of data can be processed without extensive modeling, standardization, and integration upfront. The initial applications of Big Data have focused on the processing, management, and indexing of clinical documents (clinical notes, pathology, radiology, and operative reports) combined with Natural Language Processing methods to identify clinical findings, complications, and other clinical statements of significance. We have begun to explore clinical and research utility of Big Data in several ways. As an example, a patient referred to Mayo Clinic typically might bring outside records that contain a large amount of textual documentation and digital images. Summarizing such information currently is a time-consuming and laborious task. This represents a burden on the provider and a risk to the patient in that an incomplete or erroneous transfer of information could occur. Tools that could digest and summarize scanned text, digital reports, and images would be invaluable in the delivery of effective and efficient care. As another example, identifying and caring for a cohort of patients with heart failure might require analysis of problem lists, clinical notes, echocardiographic findings, laboratory studies, and medication lists. The clinician might wish to know whether the treatment plan follows recommended guidelines. The administrator asks how many such patients have we admitted, at what length of stay, and at what cost. The researcher seeks other variables in physiology and outcome. Big Data analysis offers the promise of fulfilling such needs. We have used Natural Language Processing along with traditional data retrieval in several specific examples of augmented CDS. The accuracy of a seemingly simple finding such as patient cigarette smoking status is enhanced by analyzing the text of clinical documents (Savova et al., 2008). 
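The smoking status example can be illustrated with a minimal sketch, assuming a simple keyword approach rather than the system described by Savova et al. (2008). The patterns, function names, and the rule of preferring the most recent note are all hypothetical and serve only to show how text evidence might supplement a structured field.

```python
import re
from typing import List, Optional

# Hypothetical keyword patterns for illustration only. A real NLP system
# would also handle negation scope, temporality, and document sections.
NEVER = re.compile(r"\b(never smoked|non-?smoker|denies (smoking|tobacco))\b", re.I)
FORMER = re.compile(r"\b(former smoker|ex-smoker|quit smoking|stopped smoking)\b", re.I)
CURRENT = re.compile(r"\b(current(ly)? (smoker|smokes)|smokes \d+)\b", re.I)


def status_from_text(note: str) -> Optional[str]:
    """Return the smoking status suggested by one note, or None if nothing matches."""
    # Test the more specific phrases first so that "former smoker" is not
    # mistaken for evidence of current smoking.
    if NEVER.search(note):
        return "never"
    if FORMER.search(note):
        return "former"
    if CURRENT.search(note):
        return "current"
    return None


def resolve_status(structured: Optional[str], notes: List[str]) -> str:
    """Combine a structured field with evidence found in clinical notes."""
    hits = [s for s in (status_from_text(n) for n in notes) if s is not None]
    if hits:
        # Assumption: notes are ordered oldest to newest, and the most
        # recent textual mention overrides the structured field.
        return hits[-1]
    if structured in {"never", "former", "current"}:
        return structured
    return "unknown"
```

In this sketch, a structured value of "never" is overridden when a later note reads "Currently smokes 1 pack per day", which is the kind of discrepancy that text analysis is meant to surface.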
An advisory regarding the need for colonoscopy analyzes previous endoscopy reports, textual findings, clinical problems, pathology reports, and published guidelines to suggest the timing of subsequent study. Similarly, recommendation for cervical cancer screening is based upon age, clinical problems and previous findings, presence of abnormal cytology, viral studies, and prior surgical procedures (Wagholikar et al., 2012, 2013). Complex decision trees requiring disparate types of data are greatly facilitated by these techniques.

A rapidly emerging application of Big Data techniques is the analysis of the relationships among genetic and clinical findings. As complex genomic data are collected, the correlation with phenotype requires accurate retrieval of clinical conditions. Beyond findings such as problem lists, medications, or billing diagnoses, the use of Natural Language Processing in electronic records can improve sensitivity and specificity of case identification (Kho et al., 2011). One application of such techniques is the identification of patients eligible for clinical trials (Conway et al., 2011). The Electronic Medical Records and Genomics Network (eMERGE) consortium was developed to test and improve the ability to identify phenotypic cohorts from multiple EMRs and to merge these findings with disparate genotypic data (Chute et al., 2013; Kho et al., 2011; McCarty et al., 2011). Genome-wide association studies are underway for analysis of genetic determinants of multiple conditions including cardiovascular disease, diabetes, and cancer (Bielinski et al., 2011). These studies pose typical Big Data challenges in the amount, complexity, and variety of source information.

Big Data, across millions of patients, offers the opportunity for large-scale population analytics and research. In January 2013, Mayo Clinic and Optum Labs (Cambridge, Massachusetts) announced a collaborative research and development effort to leverage Mayo Clinic patient-centered research expertise with more than 100 million lives of administrative (claims) data and more than 30 million lives of clinical data.
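The case-identification point made earlier in this section, that text-derived findings can change the sensitivity and specificity achieved with billing diagnoses alone, can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Patient` fields, the three candidate rules, and the ten-patient toy cohort are invented and do not come from eMERGE or any Mayo Clinic data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Patient:
    has_billing_code: bool  # structured evidence, eg, a diagnosis code
    has_text_mention: bool  # evidence extracted from notes by an NLP step
    is_true_case: bool      # reference standard, eg, manual chart review


def sensitivity_specificity(
    patients: List[Patient], rule: Callable[[Patient], bool]
) -> Tuple[float, float]:
    """Score a case-identification rule against the reference standard."""
    tp = fp = tn = fn = 0
    for p in patients:
        predicted = rule(p)
        if predicted and p.is_true_case:
            tp += 1
        elif predicted and not p.is_true_case:
            fp += 1
        elif not predicted and p.is_true_case:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Invented toy cohort: 4 true cases followed by 6 non-cases.
COHORT = [
    Patient(True, True, True),
    Patient(True, False, True),
    Patient(False, True, True),
    Patient(False, True, True),
    Patient(True, False, False),   # eg, a "rule out" diagnosis that was coded
    Patient(False, True, False),   # eg, a family-history mention in a note
    Patient(False, False, False),
    Patient(False, False, False),
    Patient(False, False, False),
    Patient(False, False, False),
]

RULES = {
    "codes only": lambda p: p.has_billing_code,
    "codes or text": lambda p: p.has_billing_code or p.has_text_mention,
    "codes and text": lambda p: p.has_billing_code and p.has_text_mention,
}

if __name__ == "__main__":
    for name, rule in RULES.items():
        sens, spec = sensitivity_specificity(COHORT, rule)
        print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy cohort, adding text evidence with "or" raises sensitivity at some cost in specificity, while requiring both sources does the reverse. Published phenotyping algorithms typically combine many more data types and are validated against chart review.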
A risk inherent in the statistical analysis of a massive data set is that correlations will be identified, which, while unlikely to be due to chance alone, do not imply causation. Rather, correlations can be used either to generate hypotheses or to help answer predefined questions. At Mayo Clinic, the studies are overseen by the Robert



D. and Patricia E. Kern Center for the Science of Healthcare Delivery. How is the obesity epidemic affecting the need for joint replacement surgery at a young age, and subsequent health care costs? Are newer cardiac medications, antidiabetic agents, or anticoagulants cost-effective compared with previous therapy? In chronic renal failure, are outcomes improved if kidney transplantation is performed before hemodialysis becomes necessary? It is very difficult to address these questions from traditional structured databases. The Optum Labs infrastructure uses both traditional database and Big Data technologies for supporting these large-scale analytics.

In an advanced EHR, Big Data could be brought to bear at the point of care of an individual patient. Faced with a complex history and clinical features, can the conditions be accurately and completely characterized? Have we seen similar patients? If so, which treatments were associated with better outcomes? These questions require the ability to consume and understand information from multiple sources (eg, problem list, patient history, family history, laboratory results, clinical notes, pathology reports, surgical reports, genetic studies) and to compare and analyze the


data across the set of all patients and similar cases. Finally, decision support tools can deliver recommended actions and possible outcomes.

In traditional data analytics, structured and modeled data are loaded (often as a “batch”), organized, and processed to generate a standard output such as a report or dashboard. In Big Data analytics, there is rapid ingestion of large amounts of structured and unstructured information, which is transformed and modeled (in near real time) for each intended use. We have extensive experience with standard reports and ad hoc queries. Big Data moves us along the continuum from descriptive (summary reports, dashboards, scorecards) to predictive (modeling, probabilities, forecasting) and ultimately prescriptive (recommended actions, likelihood of outcomes, decision optimization) analytics.

Big Data has the capability to transform patient care, population management, and health care analytics, extending our capabilities to use cases that have posed significant challenges in traditional environments. It complements and enhances existing and future systems and offers to help fulfill the unmet promises of the EHR.
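The contrast drawn above between batch loading of modeled data and ingestion followed by modeling "for each intended use" (the “late binding” idea mentioned earlier) can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any Mayo Clinic or Optum Labs system: the record layout, field names, and toy records are invented, and an in-memory list stands in for a large-scale store.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Raw records are kept exactly as ingested (one JSON document per line),
# with no shared schema imposed up front. All fields are hypothetical.
RAW_STORE = [
    '{"patient": "p1", "type": "lab", "test": "creatinine", "value": 1.4}',
    '{"patient": "p1", "type": "note", "text": "Ejection fraction 35%."}',
    '{"patient": "p2", "type": "lab", "test": "creatinine", "value": 0.9}',
    '{"patient": "p2", "type": "note", "text": "Routine follow-up."}',
]


def read_raw(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse stored documents lazily; structure is interpreted only when read."""
    for line in lines:
        yield json.loads(line)


def lab_view(records: Iterable[Dict[str, Any]], test: str) -> List[Tuple[str, float]]:
    """One late-bound view: (patient, value) pairs for a named laboratory test."""
    return [
        (r["patient"], r["value"])
        for r in records
        if r.get("type") == "lab" and r.get("test") == test
    ]


def note_search_view(records: Iterable[Dict[str, Any]], term: str) -> List[str]:
    """A second view over the same raw data: patients whose notes mention a term."""
    return [
        r["patient"]
        for r in records
        if r.get("type") == "note" and term.lower() in r.get("text", "").lower()
    ]
```

Both views read the same stored documents, and a new question can be served by adding another view function rather than by remodeling and reloading the data.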

REFERENCES

Bielinski, S. J., Chai, H. S., Pathak, J., Talwalkar, J. A., Limburg, P. J., Gullerud, R. E., . . . de Andrade, M. C. (2011). Mayo Genome Consortia: A genotype-phenotype resource for genome-wide association studies with an application to the analysis of circulating bilirubin levels. Mayo Clinic Proceedings, 86(7), 606–614.

Chaudhry, R., Schietel, S. M., North, F., Dejesus, R., Kesman, R. L., & Stroebel, R. J. (2013). Improving rates of herpes zoster vaccination with a clinical decision support system in a primary care practice. Journal of Evaluation in Clinical Practice, 19(2), 263–266.

Chaudhry, R., Tulledge-Scheitel, S. M., Parks, D. A., Angstman, K. B., Decker, L. K., & Stroebel, R. J. (2012). Use of a web-based clinical decision support system to improve abdominal aortic aneurysm screening in a primary care practice. Journal of Evaluation in Clinical Practice, 18(3), 666–670.

Chaudhry, R., Tulledge-Scheitel, S. M., Thomas, M. R., Hunt, V. L., Liesinger, J. T., Rahman, A. S., . . . Stroebel, R. J. (2009). Clinical informatics to improve quality of care: A population-based system for patients with diabetes mellitus. Informatics in Primary Care, 17(2), 95–102.

Chute, C. G., Beck, S. A., Fisk, T. B., & Mohr, D. N. (2010). The Enterprise Data Trust at Mayo Clinic: A semantically integrated warehouse of biomedical data. Journal of the American Medical Informatics Association, 17(2), 131–135.

Chute, C. G., Ullman-Cullere, M., Wood, G. M., Lin, S. M., He, M., & Pathak, J. (2013). Some experiences and opportunities for Big Data in translational research. Genetics in Medicine, 15(10), 802–809.

Conway, M., Berg, R. L., Carrell, D., Denny, J. C., Kho, A. N., Kullo, I. J., . . . Pathak, J. (2011). Analyzing the heterogeneity and complexity of electronic health record oriented phenotyping algorithms. AMIA Annual Symposium Proceedings/AMIA Symposium, 2011, 274–283.

DeJesus, R. S., Angstman, K. B., Kesman, R., Stroebel, R. J., Bernard, M. E., Scheitel, S. M., . . . Chaudhry, R. (2012). Use of a clinical decision support system to increase osteoporosis screening. Journal of Evaluation in Clinical Practice, 18(1), 89–92.

Herasevich, V., Pickering, B. W., Dong, Y., Peters, S. G., & Gajic, O. (2010). Informatics infrastructure for syndrome surveillance, decision support, reporting, and modeling of critical illness. Mayo Clinic Proceedings, 85(3), 247–254.

Kho, A. N., Pacheco, J. A., Peissig, P. L., Rasmussen, L., Newton, K. M., Weston, N., . . . Denny, J. C. (2011). Electronic medical records for genetic research: Results of the eMERGE consortium. Science Translational Medicine, 3(79), 79re71.

McAfee, A., & Brynjolfsson, E. (2012). Big Data: The management revolution. Harvard Business Review, 90(10), 60–66, 68, 128.

McCarty, C. A., Chisholm, R. L., Chute, C. G., Kullo, I. J., Jarvik, G. P., Larson, E. B., . . . eMERGE Team. (2011). The eMERGE Network: A consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Medical Genomics [Electronic Resource], 4, 13.

Moore, K. D., Eyestone, K., & Coddington, D. C. (2013). The big deal about Big Data. Healthcare Financial Management, 67(8), 60–66, 68.

Pickering, B. W., Gajic, O., Ahmed, A., Herasevich, V., & Keegan, M. T. (2013). Data utilization for medical decision making at the time of patient admission to ICU. Critical Care Medicine, 41(6), 1502–1510.

Savova, G. K., Ogren, P. V., Duffy, P. H., Buntrock, J. D., & Chute, C. G. (2008). Mayo Clinic NLP system for patient smoking status identification. Journal of the American Medical Informatics Association, 15(1), 25–28.

Wagholikar, K. B., MacLaughlin, K. L., Henry, M. R., Greenes, R. A., Hankey, R. A., Liu, H., . . . Chaudhry, R. (2012). Clinical decision support with automated text processing for cervical cancer screening. Journal of the American Medical Informatics Association, 19(5), 833–839.

Wagholikar, K. B., MacLaughlin, K. L., Kastner, T. M., Casey, P. M., Henry, M., Greenes, R. A., . . . Chaudhry, R. (2013). Formative evaluation of the accuracy of a clinical decision support system for cervical cancer screening. Journal of the American Medical Informatics Association, 20(4), 749–757.

