 EDITORIAL

‘Big data’ reporting guidelines: how to answer big questions, yet avoid big problems

D. C. Perry, N. Parsons, M. L. Costa
From the University of Warwick, Coventry, United Kingdom

The extent and depth of routine health care data are growing at an ever-increasing rate, forming huge repositories of information. These repositories can answer a vast array of questions. However, an understanding of the purpose of the dataset used and the quality of the data collected is paramount in determining the reliability of the results obtained. This Editorial describes the importance of adherence to sound methodological principles in the reporting and publication of research using ‘big’ data, with a suggested reporting framework for future Bone & Joint Journal submissions.

Cite this article: Bone Joint J 2014;96-B:1575–7.

D. C. Perry, FRCS(Tr&Orth), PhD, NIHR Clinical Lecturer
M. L. Costa, FRCS(Tr&Orth), PhD, Professor of Trauma and Orthopaedic Surgery, Associate Editor of Research Methods
Warwick Clinical Trials Unit, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.
N. Parsons, MSc, PhD, Medical Statistician
Statistics and Epidemiology, Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK.

Correspondence should be sent to Mr D. C. Perry; e-mail: [email protected]

©2014 The British Editorial Society of Bone & Joint Surgery
doi:10.1302/0301-620X.96B12.35027
Bone Joint J 2014;96-B:1575–7.

In 2012 it was estimated that 90% of the data in the world had been created in the preceding two years.1 Such data are collected for a range of purposes, including hospital administration, patient care records, monitoring of vital signs, pharmaceutical sales, disease registries and implant registries. Databases can be used in isolation, or linked, to provide huge volumes of information that can inform a massive array of issues. Collectively, these are known as ‘big’ data.

Big data offers the potential to answer an unprecedented variety of questions, many of which it would have been impossible even to contemplate only a few years ago. It is ideal for tracking trends in practice over time and for planning service delivery across a healthcare system, and may be the only way to detect rare adverse events, such as complications after surgery. Examples of studies using big data published in The Bone & Joint Journal (BJJ) during the last year include the use of UK primary care and hospital inpatient data,2,3 a range of international disease and procedural registries,4-10 and several studies which linked datasets to expand the possibilities achievable with any single dataset.4,7,9

However, while big data creates a raft of opportunities, it also brings an array of difficulties. When big data are used for research, perhaps the key concern is that the datasets were often never designed to answer the question under investigation. This creates a series of methodological issues.

Misclassification bias
If the codes used for disease or procedure classification are incomplete, this may result in misclassification bias for the disease of interest. For example, if a primary care database were used to assess details of soft-tissue injuries presenting to the emergency department, more complicated cases requiring hospitalisation or follow-up are likely to generate more primary care correspondence. The coding of data from complex soft-tissue injuries is therefore likely to be more complete than that from minor cases, with the resulting analysis suggesting a falsely high proportion of complex soft-tissue injuries.
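This differential-completeness mechanism can be sketched with a small simulation. All figures here (injury counts, the true proportion of complex cases, and the coding-completeness rates) are hypothetical, chosen only to illustrate the direction of the bias:

```python
import random

random.seed(0)

# Hypothetical scenario: 10,000 soft-tissue injuries, of which 10% are truly
# complex. Complex cases generate more primary care correspondence, so they
# are coded more completely (95%) than minor cases (50%).
N = 10_000
TRUE_COMPLEX_RATE = 0.10
P_CODED_COMPLEX, P_CODED_MINOR = 0.95, 0.50

coded_complex = coded_minor = 0
for _ in range(N):
    is_complex = random.random() < TRUE_COMPLEX_RATE
    p_coded = P_CODED_COMPLEX if is_complex else P_CODED_MINOR
    if random.random() < p_coded:  # only coded cases enter the database
        if is_complex:
            coded_complex += 1
        else:
            coded_minor += 1

apparent_rate = coded_complex / (coded_complex + coded_minor)
print(f"True proportion of complex injuries:   {TRUE_COMPLEX_RATE:.1%}")
print(f"Apparent proportion in the coded data: {apparent_rate:.1%}")
```

Because complex cases are captured nearly twice as reliably as minor ones, the apparent proportion of complex injuries in the database is inflated well above the true 10%, despite no error in any individual record.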

Lumping
Sub-groups of patients are often lumped together for the purposes of coding or billing, which may lead to erroneous conclusions. An analysis of arthroplasty revisions using hospital episode statistics alone, for example, would fail to recognise the nuances that may emerge from the type of prosthesis used.

Confounders
When attempting to link risk factors to outcomes, routine datasets seldom contain all of the confounding factors relevant to the research; it is easy to conclude that a particular risk factor is responsible for the outcome when a secondary factor, not recorded in that dataset, may be responsible. A trauma database may identify that motor vehicle trauma is more common at night, suggesting that darkness is the major determinant. However, alcohol consumption is independently associated with both darkness and motor vehicle accidents, and therefore significantly confounds this association. Without adequate knowledge of the confounders, we are unable to quantify the true effect of darkness.
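The darkness–alcohol example can be made concrete with a simulation. All rates below are hypothetical: darkness is given no effect at all on crash risk, while alcohol raises it tenfold and is far more common at night. The crude night-versus-day comparison still shows a large difference, which disappears once the data are stratified by alcohol:

```python
import random

random.seed(1)

# Hypothetical scenario: 100,000 car journeys. Darkness itself has NO effect
# on crash risk; only alcohol changes it, and alcohol clusters at night.
N = 100_000
crashes = {("night", True): 0, ("night", False): 0,
           ("day", True): 0, ("day", False): 0}
journeys = dict(crashes)

for _ in range(N):
    night = random.random() < 0.5
    p_alcohol = 0.20 if night else 0.02   # alcohol is more common at night
    alcohol = random.random() < p_alcohol
    p_crash = 0.010 if alcohol else 0.001  # only alcohol changes crash risk
    key = ("night" if night else "day", alcohol)
    journeys[key] += 1
    if random.random() < p_crash:
        crashes[key] += 1

def rate(time):
    """Crude crash rate for 'night' or 'day', ignoring alcohol."""
    c = crashes[(time, True)] + crashes[(time, False)]
    j = journeys[(time, True)] + journeys[(time, False)]
    return c / j

print(f"Crude crash rate, night: {rate('night'):.4f}")
print(f"Crude crash rate, day:   {rate('day'):.4f}")
# Stratifying by alcohol reveals that darkness itself has no effect:
for alcohol in (False, True):
    for time in ("day", "night"):
        r = crashes[(time, alcohol)] / journeys[(time, alcohol)]
        print(f"  alcohol={alcohol}, {time}: crash rate {r:.4f}")
```

A dataset that does not record alcohol consumption would only ever permit the crude comparison, and so would wrongly attribute the excess crashes to darkness.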


Table I. BJJ Big Data Interim Reporting Guidelines (to be used in addition to the STROBE statement): a checklist of items that should be included in reports of the analysis of routine administrative/healthcare datasets.

Methods

Item 1. Dataset: Describe the nature of the dataset(s) used. In particular:
- The purpose of the dataset, e.g. observational research registry, national audit programme, administrative dataset (linked to financial remuneration or service delivery). This should include details of the funding of the dataset, and the organisation(s) responsible for its administration and oversight.
- The mechanism of data collection, e.g. ‘data are entered by practitioners as part of routine care, and this information forms the routine patient care record’, or ‘data are entered by a research associate, for the sole purpose of the research registry’.
- Data quality/completeness. Provide a summary of the general quality of the dataset. This should include internal validity (data validation performed by the organisation responsible for the oversight of the dataset) and external validity (data validation performed by external bodies).
- Denominators. Does the dataset have an appropriate denominator inbuilt, or does it relate to a denominator population that can readily be determined (geographic or otherwise)?
- Linkage. The nature of any linkage of datasets; the variable(s) used to link datasets; the process of linkage (i.e. if two or more datasets were linked, which body performed the linkage?); how were unlinked data points managed?

Item 2. Variables:
- Define the database codes for outcomes, exposures, predictors, potential confounders and effect modifiers. Give diagnostic criteria/algorithms, if applicable. Describe the process used to generate this code list. The full list of diagnostic/procedural codes used should be made available as an online appendix.
- Validation of the variables used. Define any process of validation (internal or external) specific to the codes used in the analysis, i.e. if the code for hip replacement is pivotal to the analysis, how is it known that hip replacement is well recorded?

Results

Item 3. Significance: If an outcome score is used, comment on whether the results have both statistical and clinical significance, e.g. is the magnitude of the outcome difference both bigger than the minimal clinically important difference and statistically significant?

To be used as an adjunct to the STROBE statement. Information on the STROBE Initiative is available at www.strobe-statement.org
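Item 3 of the checklist asks authors to distinguish statistical from clinical significance. A minimal simulation, using entirely hypothetical figures (a true 5 ml difference in blood loss, SD 100 ml, 50,000 patients per group), shows how a clinically trivial difference becomes overwhelmingly statistically significant at big-data sample sizes:

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical scenario: blood loss after arthroplasty in two groups of
# 50,000 patients each. The true between-group difference is only 5 ml
# (means 300 ml vs 305 ml, SD 100 ml), far below any plausible minimal
# clinically important difference.
N = 50_000
group_a = [random.gauss(300, 100) for _ in range(N)]
group_b = [random.gauss(305, 100) for _ in range(N)]

diff = statistics.fmean(group_b) - statistics.fmean(group_a)
se = math.sqrt(statistics.variance(group_a) / N
               + statistics.variance(group_b) / N)
z = diff / se
# Two-sided p-value from the normal approximation (valid at this sample size).
p = math.erfc(abs(z) / math.sqrt(2))

print(f"Observed difference: {diff:.1f} ml")
print(f"z = {z:.1f}, p = {p:.1e}")
```

The p-value is vanishingly small even though the 5 ml difference would matter to almost no patient, which is why item 3 asks for the difference to be judged against the minimal clinically important difference as well as against a significance threshold.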

Proxy outcomes
The outcomes collected in routine administrative data may not be the outcomes of interest to clinicians or patients. A proxy or surrogate measure may therefore be used because it is routinely recorded within the dataset, e.g. an admission to hospital within 28 days of surgery may be used as a proxy measure of a serious complication, irrespective of the cause of the readmission.

Power
Somewhat counter-intuitively, the statistical power of the data may be unduly great, such that even a minute clinical difference may become statistically significant. In reality this may offer no clinical benefit to patients, though the demonstration of statistical significance may lend unwarranted importance to the result, e.g. a 5 ml difference in blood loss after arthroplasty may be demonstrable, but there are few instances in which this would be of clinical relevance.

Future directions
Unless we are guided by the well-established principles of scientific rigour that we expect in more conventional studies, analyses of big data have the potential to propagate spurious conclusions. A recent editorial highlighted the benefits and flaws of joint registry data,11 particularly emphasising their role as hypothesis-generating tools rather than as a replacement for well-designed prospective randomised clinical trials.

In order to appraise any scientific research adequately, an understanding of the methodological process is paramount. This is no different for big data. Methodological clarity is essential to empower readers to reach conclusions based on the evidence presented. For big data, the reader needs an appreciation of what data are collected, the purpose of the collection, what is missing from the dataset and the methods of quality control and audit within the database. These factors may have a profound influence on the data: one may expect a disease register to have greater diagnostic specificity than a hospital discharge dataset, a database formed for the purpose of practice/surgeon remuneration to be more complete than one that is not, or a dataset whose results are validated against an external dataset to be more accurate than one that is not. Transparent, systematic reporting of the methodology is therefore essential, in the same manner as for any piece of scientific work.

Good practice and transparency in the reporting of research methodology are encouraged through the use of reporting guidelines. Such guidelines are well accepted in study designs such as clinical trials and meta-analyses, using the well-established Consolidated Standards of Reporting Trials (CONSORT)12 and Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)13 statements, respectively. These guidelines have been shown to improve the quality of scientific reporting,14 and are widely endorsed by editorial boards, including that of the BJJ. The development of international reporting guidelines is overseen by the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network,15 an international body that seeks to ensure that scientific reporting is both transparent and accurate.


Reporting guidelines for big data are currently in development by the EQUATOR Network, using the Delphi method, a formalised assimilation of expert consensus.16,17 These guidelines, named the REporting of studies Conducted using Observational Routinely-collected Data (RECORD) statement, are based on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement18 and are believed to be approximately one year from publication. However, the number of journal submissions relating to big data is increasing rapidly, and a uniform reporting framework is urgently needed. We have therefore established interim reporting guidance, to accompany the STROBE statement, for submissions relating to big data (Table I).

New submissions relating to the analysis of routine administrative healthcare data (e.g. hospital episode statistics, the Clinical Practice Research Datalink),19 the analysis of data from national audits or registries (e.g. the National Hip Fracture Database20 and the National Joint Registry),21 or the analysis of large-scale observational studies (e.g. birth cohorts such as the Avon Longitudinal Study of Parents and Children (ALSPAC))22 should henceforth adhere to this guidance. It is anticipated that these guidelines will ensure methodological transparency, thereby enabling readers to interpret the information provided and authors to understand the requirements for publication. In the fullness of time the RECORD statement will supersede these guidelines and become the standard framework for submissions relating to big data.

No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.

References
1. Hilbert M. How much information is there in the “information society”? Significance 2012;9:8–12.
2. White JJ, Titchener AG, Fakis A, et al. An epidemiological study of rotator cuff pathology using The Health Improvement Network database. Bone Joint J 2014;96-B:350–353.
3. Judge A, Murphy RJ, Maxwell R, Arden NK, Carr AJ. Temporal trends and geographical variation in the use of subacromial decompression and rotator cuff repair of the shoulder in England. Bone Joint J 2014;96-B:70–74.
4. Baker PN, Rushton S, Jameson SS, et al. Patient satisfaction with total knee replacement cannot be predicted from pre-operative variables alone: a cohort study from the National Joint Registry for England and Wales. Bone Joint J 2013;95-B:1359–1365.


5. Jameson SS, Baker PN, Mason J, et al. Independent predictors of failure up to 7.5 years after 35 386 single-brand cementless total hip replacements: a retrospective cohort study using National Joint Registry data. Bone Joint J 2013;95-B:747–757.
6. Maletis GB, Inacio MC, Desmond JL, Funahashi TT. Reconstruction of the anterior cruciate ligament: association of graft choice with increased risk of early revision. Bone Joint J 2013;95-B:623–628.
7. Bernhoff K, Rudström H, Gedeborg R, Björck M. Popliteal artery injury during knee replacement: a population-based nationwide study. Bone Joint J 2013;95-B:1645–1649.
8. Bini SA, Chen Y, Khatod M, Paxton EW. Does pre-coating total knee tibial implants affect the risk of aseptic revision? Bone Joint J 2013;95-B:367–370.
9. Försth P, Michaëlsson K, Sandén B. Does fusion improve the outcome after decompressive surgery for lumbar spinal stenosis? A two-year follow-up study involving 5390 patients. Bone Joint J 2013;95-B:960–965.
10. Gøthesen O, Espehaug B, Havelin L, et al. Survival rates and causes of revision in cemented primary total knee replacement: a report from the Norwegian Arthroplasty Register 1994–2009. Bone Joint J 2013;95-B:636–642.
11. Konan S, Haddad FS. Joint registries: a Ptolemaic model of data interpretation? Bone Joint J 2013;95-B:1585–1586.
12. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c869.
13. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 2009;339:b2700.
14. Plint AC, Moher D, Morrison A, et al. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust 2006;185:263–267.
15. No authors listed. EQUATOR Network: Enhancing the QUAlity and Transparency Of health Research, 2014. http://www.equator-network.org (date last accessed 22 September 2014).
16. Langan SM, Benchimol EI, Guttmann A, et al. Setting the RECORD straight: developing a guideline for the REporting of studies Conducted using Observational Routinely collected Data. Clin Epidemiol 2013;5:29–31.
17. Benchimol EI, Langan S, Guttmann A; RECORD Steering Committee. Call to RECORD: the need for complete reporting of research using routinely collected health data. J Clin Epidemiol 2013;66:703–705.
18. von Elm E, Altman DG, Egger M, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ 2007;335:806–808.
19. Jick H, Jick SS, Derby LE. Validation of information recorded on general practitioner based computerised data resource in the United Kingdom. BMJ 1991;302:766–768.
20. No authors listed. National Hip Fracture Database: National Report, 2013. http://www.nhfd.co.uk/20/hipfractureR.nsf/4e9601565a8ebbaa802579ea0035b25d/566c6709d04a865780257bdb00591cda/$FILE/onlineNHFDreport.pdf (date last accessed 22 September 2014).
21. No authors listed. National Joint Registry for England and Wales: 9th Annual Report, 2012. http://www.njrcentre.org.uk/njrcentre/Portals/0/Documents/England/Reports/9th_annual_report/NJR%209th%20Annual%20Report%202012.pdf (date last accessed 22 September 2014).
22. Boyd A, Golding J, Macleod J, et al. Cohort profile: the ‘Children of the 90s’, the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol 2013;42:111–127.
