Editor’s Perspective

Of the Importance of Motherhood and Apple Pie: What Big Data Can Learn From Small Data

Véronique L. Roger, MD, MPH

Data integrity and validity are challenging topics to get published in medical journals. The clinical implications of these matters are not always intuitive to the reader, and data quality seldom makes headlines in the medical literature unless something goes wrong, in which case related conversations can make their way to the courtrooms and the lay press.1 Yet the cornerstone of our ability to make robust inferences and sound clinical decisions is the assumption that medical research data are valid, accurate, and representative. Over the past few years, several articles in Circulation: Cardiovascular Quality and Outcomes have reminded us of the importance of formally examining such assumptions.

Registries occupy a central role in the clinical research landscape by allowing researchers to pool observational clinical data into larger data sets, enhancing statistical power and supporting more robust inference. Registries afford expansive population coverage and contain abundant clinical data that enable valuable risk adjustment. Most importantly, registries are intended to reflect real-world practice. However, because participation is voluntary, the representativeness of patients in registries cannot be assumed: institutions that participate may differ from those that elect not to.2 Furthermore, the completeness of the data must be verified, as illustrated by a study of the Can Rapid Risk Stratification of Unstable Angina Patients Suppress Adverse Outcomes with Early Implementation of the ACC/AHA Guidelines (CRUSADE) National Quality Improvement Initiative pertaining to patients with non–ST-segment–elevation acute coronary syndromes, which found that records often lacked key clinical elements of the history and physical examination.3 Hence, quality control of registries is critical to their value for outcomes research and clinical care, and regular audits to assess their quality are essential to their use.
The opinions expressed in this article are not necessarily those of the American Heart Association. From the Division of Cardiovascular Diseases, Department of Health Sciences Research, Mayo Clinic, Rochester, MN. Correspondence to Véronique L. Roger, MD, MPH, Mayo Clinic, 200 First St SW, Rochester, MN 55905. E-mail [email protected] (Circ Cardiovasc Qual Outcomes. 2015;8:329-331. DOI: 10.1161/CIRCOUTCOMES.115.002115.) © 2015 American Heart Association, Inc. Circ Cardiovasc Qual Outcomes is available at http://circoutcomes.ahajournals.org

In Circulation: Cardiovascular Quality and Outcomes, Ferreira et al4 reported on an audit of a national acute coronary syndrome registry. Patients who were included in the registry were less likely to die in hospital and more likely to receive recommended treatments than those who were not included. Furthermore, among 129 registries on acute coronary syndromes, audits were seldom mentioned for recruitment/inclusion (29%) or data abstraction (37%).

More recently, also in Circulation: Cardiovascular Quality and Outcomes, Reeves et al5 evaluated the accuracy of the Michigan Stroke Registry by linking it to a statewide hospital discharge database, considered the gold standard. The Michigan Stroke Registry is part of the Paul Coverdell Acute Stroke Registry, a multistate hospital-based registry and quality improvement program overseen by the Centers for Disease Control and Prevention.6 Among the 26 hospitals in the Michigan Stroke Registry, the sensitivity of case ascertainment was 69% and the positive predictive value was 89%. Sensitivity was lower in teaching hospitals and primary stroke centers but higher in sites that used prospective ascertainment. Through this process, some hospitals that had changed their ascertainment method were identified, illustrating that, much like laboratory assays, registries can drift: several hospitals had changed their procedures in ways that substantially affected the integrity of the data without the knowledge of the registry staff.5

Overall, an evaluation of 153 national registries in the United States delineated substantial opportunities to enhance the robustness of registries and the quality of data collection and analysis to optimize their clinical use.7 Strikingly, only 18% of registries indicated that they audit their data. These articles provide a valuable illustration of the importance of methodological discipline applied to registries: it is our responsibility as researchers and as clinicians to make sure that the data we collect are accurate and complete and that they remain so over time. Registries must implement and report regular quality control measures to optimize the quality of their data, which must be evaluated by conducting and periodically repeating audits.
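These audit metrics follow directly from the case-ascertainment counts. As an illustration only (the counts below are hypothetical, chosen to be consistent with the reported sensitivity of about 69% and positive predictive value of about 89%; they are not the actual Michigan Stroke Registry data):

```python
# Hypothetical counts from a registry audit against a gold standard.
true_positives = 690   # cases in both the registry and the gold standard
false_negatives = 310  # gold-standard cases missed by the registry
false_positives = 85   # registry entries not confirmed by the gold standard

# Sensitivity: what fraction of true cases did the registry capture?
sensitivity = true_positives / (true_positives + false_negatives)

# Positive predictive value: what fraction of registry entries are true cases?
ppv = true_positives / (true_positives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}")  # 69%
print(f"PPV: {ppv:.0%}")                  # 89%
```

Note that the two metrics answer different questions: a registry can have a high positive predictive value (most of what it contains is real) while still missing a large share of eligible cases, which is exactly the pattern an audit is designed to expose.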
Administrative data are another cornerstone of the outcomes research toolbox, which increasingly relies on International Classification of Diseases, Ninth Revision (ICD-9) codes for case finding and, often, outcome ascertainment. In 2014, in Circulation: Cardiovascular Quality and Outcomes, Kumamaru et al8 reported on the validity of diagnostic coding algorithms for identifying stroke in the Medicare population by linking data from the Reasons for Geographic And Racial Differences in Stroke (REGARDS) study to Medicare claims. Expert-adjudicated strokes from medical records served as the gold standard. Claims-based algorithms had high positive predictive value and specificity for identifying stroke, supporting their use to ascertain outcomes. However, these algorithms were unsuitable for estimating stroke incidence because of low sensitivity. More recently, on the same topic and also in Circulation: Cardiovascular Quality and Outcomes, Thigpen et al9 reported on the accuracy of ICD-9 codes for identifying stroke and atrial fibrillation compared with medical record review. Reliance on ICD-9 codes to identify patients with stroke and atrial fibrillation resulted in notable inaccuracies of similar magnitude across 3 different institutions: Boston Medical Center, Geisinger Health System, and the University of Alabama. Taken collectively, these studies underscore that administrative data without event validation should be interpreted with caution and that the appropriateness of their use depends on the goal of the study.

The critical principles of data accuracy and integrity are conceptually indisputable and universally agreed on, much like the proverbial motherhood and apple pie. Why is it important to reflect on these principles at this time? First, because like any accepted rule, they can be taken for granted and forgotten. The aforementioned examples clearly illustrate that the value of registries to advance science and improve care is at stake if rigorous quality control is not implemented. As the appetite for registries grows, both for discovery research and for quality improvement, standard operating procedures must ensure that these critical quality control steps are regularly applied. The Centers for Medicare and Medicaid Services created a program entitled Qualified Clinical Data Registries, which compiles registries in which eligible practitioners can participate.10 This is a laudable effort to optimize the quality of registries that should pave the way for widespread efforts, such as the steps deployed in the National Cardiovascular Data Registry.11 Second, and more importantly, these timeless principles must be explicitly brought to the forefront of our collective thinking as clinical and population sciences research enters the brave new world of Big Data.
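To make the case-finding problem concrete, here is a minimal sketch of a claims-based algorithm of the general kind evaluated in these studies. The code group and the decision rule below are illustrative assumptions for this editorial, not the validated algorithms of Kumamaru et al or Thigpen et al:

```python
# Hypothetical claims-based case finding: flag a hospitalization as an
# ischemic stroke case if its principal discharge diagnosis falls in an
# assumed ICD-9-CM code group. The set below is an illustrative assumption;
# validated algorithms use carefully evaluated code lists and positions.
ISCHEMIC_STROKE_CODES = {"433.01", "433.11", "434.01", "434.11", "436"}

def is_stroke_case(principal_dx: str) -> bool:
    """Return True if the principal diagnosis is in the assumed stroke code group."""
    return principal_dx in ISCHEMIC_STROKE_CODES

# Toy claims records (hypothetical identifiers and codes).
claims = [
    {"id": 1, "principal_dx": "434.91"},  # stroke coded outside the assumed set: missed
    {"id": 2, "principal_dx": "433.11"},  # captured
    {"id": 3, "principal_dx": "410.71"},  # acute myocardial infarction: correctly excluded
]
cases = [c["id"] for c in claims if is_stroke_case(c["principal_dx"])]
print(cases)  # [2]
```

The first record illustrates how low sensitivity arises: a true event coded with a code outside the algorithm's list is silently missed, which is why such algorithms can support outcome ascertainment (high positive predictive value) yet badly underestimate incidence.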
Regardless of the often disparate and vague definitions that surround the concept of Big Data, the potential of new tools to collect and analyze larger data sets is intuitively recognized by most. Large-scale repositories, disease-agnostic or disease-specific,12 are proliferating, and the appeal of electronic data is amplified by current fiscal constraints. On the healthcare delivery side, providers have in recent years been incentivized to shift from paper to electronic medical records (EMRs), thereby greatly increasing the volume and accessibility of health data available to clinicians, researchers, and patients. Furthermore, other data sources, including mobile health devices, claims, and administrative databases, expand the scope of the data that can be collected and linked to EMRs. These are unprecedented opportunities not only to facilitate the capture of exposures and outcomes in a conventional hypothesis-driven approach but also to mine these vast data repositories, agnostic to any hypothesis, following an in silico discovery science approach applied to large volumes of clinical and healthcare data. These are incredibly thrilling promises, for some akin to moonshots. To realize this exciting potential, the optimal methods to acquire, validate, standardize, and analyze high volumes of data generated by different sources must be delineated, codified, and broadly shared with the scientific community. Recommendations to move in this direction have been put forth by several organizations, including the National Cancer Institute13 and the National Heart, Lung, and Blood Institute.14 These recommendations vigorously underscored that determining the validity, accuracy, and integrity of data is a critical step in progressing from collecting data to generating information and deriving knowledge. Much remains to be done to execute on this directive.

As far as EMRs are concerned, cautionary reports on their use for research and performance/quality assessment are surfacing.15–17 Although more work is needed to improve our operational understanding of the research applications of EMRs, we already understand conceptually the limitations of using medical records for research, a topic familiar to most providers and clinical researchers. Indeed, the migration to electronic platforms cannot be expected to alleviate issues related to missing data, uneven reliability, incomplete documentation, and inconsistencies across providers and over time.17 Analogous limitations pertaining to the completeness of records have been reported, to some extent, for claims data, which would benefit from more attention to fully delineate the best methods to apply and the most appropriate research questions to address.18 The complexity of the questions increases significantly when one considers emerging data sources. What do data integrity, validity, and representativeness mean for person-generated health data, including data from mobile devices, social media platforms, and other patient-generated sources? In this regard, the early experience of the Beacon Community Cooperative Agreement Program created by the Office of the National Coordinator for Health Information Technology is particularly informative. Several Beacon Communities launched community-based mobile health programs and shared useful lessons learned.19 These early considerations, including the recognition of key barriers to mobile health use, some related to socioeconomic status, underscore the pressing need to address, among other topics, the implications of the digital divide for health disparities. Although many questions remain unanswered and much work remains to be carried out, it is imperative to do so.
Bad data are also costly to business, as Experian underscored, suggesting that 75% of businesses are wasting 14% of their revenue because of poor data quality.20,21 If this is the case for business, one can only imagine the magnitude of the implications for clinical research and health care. Privacy issues add to the complexity of the discussion. Indeed, responsible data sharing is essential to optimize the quality of the information on the effectiveness and safety of medical care. To delineate good common practices for responsible data sharing, the perspectives of key stakeholders must be considered. For research participants, informed consent, privacy protection, and the knowledge to be gained from data sharing are essential. Researchers seek appropriate time to report their findings and thereby receive academic credit for their work. Considerations related to intellectual property and commercialization also come into play. Forging agreement on measures to address these concerns from these diverse perspectives will be difficult but is vital.

In conclusion, it is obvious from this brief perspective that many thoughtful conversations and formal research studies are urgently needed on these topics. These endeavors must unfold in settings that foster broad engagement of key stakeholders, including clinicians, clinical researchers, methodologists, patients, and their families. Finally, robust action plans must follow and be deployed with alacrity. At Circulation: Cardiovascular Quality and Outcomes, we look forward to taking an active part in this exciting journey.



Disclosures
None.

References
1. Hoffman S, Podgurski A. The use and misuse of biomedical data: is bigger really better? Am J Law Med. 2013;39:497–538.
2. Xian Y, Hammill BG, Curtis LH. Data sources for heart failure comparative effectiveness research. Heart Fail Clin. 2013;9:1–13. doi: 10.1016/j.hfc.2012.09.001.
3. Dunlay SM, Alexander KP, Melloni C, Kraschnewski JL, Liang L, Gibler WB, Roe MT, Ohman EM, Peterson ED. Medical records and quality of care in acute coronary syndromes: results from CRUSADE. Arch Intern Med. 2008;168:1692–1698. doi: 10.1001/archinte.168.15.1692.
4. Ferreira-González I, Marsal JR, Mitjavila F, Parada A, Ribera A, Cascant P, Soriano N, Sánchez PL, Arós F, Heras M, Bueno H, Marrugat J, Cuñat J, Civeira E, Permanyer-Miralda G. Patient registries of acute coronary syndrome: assessing or biasing the clinical real world data? Circ Cardiovasc Qual Outcomes. 2009;2:540–547. doi: 10.1161/CIRCOUTCOMES.108.844399.
5. Reeves MJ, Nickles AV, Roberts S, Hurst R, Lyon-Callo S. Assessment of the completeness and accuracy of case ascertainment in the Michigan Stroke Registry. Circ Cardiovasc Qual Outcomes. 2014;7:757–763. doi: 10.1161/CIRCOUTCOMES.113.000706.
6. Centers for Disease Control and Prevention. The Paul Coverdell National Acute Stroke Registry. http://www.cdc.gov/dhdsp/programs/stroke_registry.htm. Accessed June 19, 2015.
7. Lyu H, Cooper M, Patel K, Daniel M, Makary MA. Prevalence and data transparency of national clinical registries in the United States [published online ahead of print]. J Healthc Qual. 2015. doi: 10.1097/JHQ.0000000000000001.
8. Kumamaru H, Judd SE, Curtis JR, Ramachandran R, Hardy NC, Rhodes JD, Safford MM, Kissela BM, Howard G, Jalbert JJ, Brott TG, Setoguchi S. Validity of claims-based stroke algorithms in contemporary Medicare data: Reasons for Geographic and Racial Differences in Stroke (REGARDS) study linked with Medicare claims. Circ Cardiovasc Qual Outcomes. 2014;7:611–619. doi: 10.1161/CIRCOUTCOMES.113.000743.
9. Thigpen JL, Dillon C, Forster KB, Henault L, Quinn EK, Tripodis Y, Berger PB, Hylek EM, Limdi NA. Validity of international classification of disease codes to identify ischemic stroke and intracranial hemorrhage among individuals with associated diagnosis of atrial fibrillation. Circ Cardiovasc Qual Outcomes. 2015;8:8–14. doi: 10.1161/CIRCOUTCOMES.113.000371.
10. Qualified Clinical Data Registry Reporting. http://www.cms.gov/medicare/quality-initiatives-patient-assessment-instruments/pqrs/qualified-clinical-data-registry-reporting.html. Accessed June 19, 2015.
11. Messenger JC, Ho KK, Young CH, Slattery LE, Draoui JC, Curtis JP, Dehmer GJ, Grover FL, Mirro MJ, Reynolds MR, Rokos IC, Spertus JA, Wang TY, Winston SA, Rumsfeld JS, Masoudi FA; NCDR Science and Quality Oversight Committee Data Quality Workgroup. The National Cardiovascular Data Registry (NCDR) Data Quality Brief: the NCDR Data Quality Program in 2012. J Am Coll Cardiol. 2012;60:1484–1488. doi: 10.1016/j.jacc.2012.07.020.
12. Tu YK. Linear mixed model approach to network meta-analysis for continuous outcomes in periodontal research. J Clin Periodontol. 2015;42:204–212. doi: 10.1111/jcpe.12362.
13. Khoury MJ, Lam TK, Ioannidis JP, Hartge P, Spitz MR, Buring JE, Chanock SJ, Croyle RT, Goddard KA, Ginsburg GS, Herceg Z, Hiatt RA, Hoover RN, Hunter DJ, Kramer BS, Lauer MS, Meyerhardt JA, Olopade OI, Palmer JR, Sellers TA, Seminara D, Ransohoff DF, Rebbeck TR, Tourassi G, Winn DM, Zauber A, Schully SD. Transforming epidemiology for 21st century medicine and public health. Cancer Epidemiol Biomarkers Prev. 2013;22:508–516. doi: 10.1158/1055-9965.EPI-13-0146.
14. Roger VL, Boerwinkle E, Crapo JD, Douglas PS, Epstein JA, Granger CB, Greenland P, Kohane I, Psaty BM. Strategic transformation of population studies: recommendations of the working group on epidemiology and population sciences from the National Heart, Lung, and Blood Advisory Council and Board of External Experts. Am J Epidemiol. 2015;181:363–368. doi: 10.1093/aje/kwv011.
15. Chan KS, Fowles JB, Weiner JP. Review: electronic health records and the reliability and validity of quality measures: a review of the literature. Med Care Res Rev. 2010;67:503–527. doi: 10.1177/1077558709359007.
16. Parsons A, McCullough C, Wang J, Shih S. Validity of electronic health record-derived quality measurement for performance monitoring. J Am Med Inform Assoc. 2012;19:604–609. doi: 10.1136/amiajnl-2011-000557.
17. Bayley KB, Belnap T, Savitz L, Masica AL, Shah N, Fleming NS. Challenges in using electronic health record data for CER: experience of 4 learning organizations and solutions applied. Med Care. 2013;51(8 Suppl 3):S80–S86. doi: 10.1097/MLR.0b013e31829b1d48.
18. Tyree PT, Lind BK, Lafferty WE. Challenges of using medical insurance claims data for utilization analysis. Am J Med Qual. 2006;21:269–275. doi: 10.1177/1062860606288774.
19. Abebe NA, Capozza KL, Des Jardins TR, Kulick DA, Rein AL, Schachter AA, Turske SA. Considerations for community-based mHealth initiatives: insights from three Beacon Communities. J Med Internet Res. 2013;15:e221. doi: 10.2196/jmir.2803.
20. Adamson G. The well-oiled data machine: data is the oil that keeps the business cogs turning. https://www.edq.com/uk/blog/the-well-oiled-data-machine/. Accessed June 19, 2015.
21. Wheatley M. Big, bad data: what’s the big deal over data quality? http://siliconangle.com/blog/2014/07/25/big-bad-data-whats-the-big-deal-over-data-quality/. Accessed June 19, 2015.

Key Words: data science ◼ epidemiology ◼ population ◼ registries


Of the Importance of Motherhood and Apple Pie: What Big Data Can Learn From Small Data
Véronique L. Roger

Circ Cardiovasc Qual Outcomes. 2015;8:329-331; originally published online July 14, 2015; doi: 10.1161/CIRCOUTCOMES.115.002115

Circulation: Cardiovascular Quality and Outcomes is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231.
Copyright © 2015 American Heart Association, Inc. All rights reserved.
Print ISSN: 1941-7705. Online ISSN: 1941-7713

The online version of this article, along with updated information and services, is located on the World Wide Web at: http://circoutcomes.ahajournals.org/content/8/4/329


