STATISTICS IN MEDICINE, VOL. 10, 559-564 (1991)

USING ROUTINELY COLLECTED DATA FOR CLINICAL RESEARCH

CHARLES SAFRAN

Charles A. Dana Research Institute and the Harvard Thorndike Laboratory of Beth Israel Hospital, the Department of Medicine, Beth Israel Hospital, and the Center for Clinical Computing, Harvard Medical School, 350 Longwood Avenue, Boston, MA 02115, U.S.A.

SUMMARY

Clinical research involving prospective data collection in randomized controlled trials is not always feasible. Increasingly, hospitals are developing large clinical databases that are waiting to be mined. We have developed a computer program, ClinQuery, that facilitates such exploration and analysis. We have also shown in a series of studies that the use of clinical data is a powerful tool in health services research. In some cases, we have shown that coded data are inaccurate and that alternative clinical data are preferable. In other cases, a combination of clinical data and coded discharge diagnoses is preferable.

0277-6715/91/040559-06$05.00
© 1991 by John Wiley & Sons, Ltd.

INTRODUCTION

Some hospital computing systems can capture a wealth of clinical data as a byproduct of the patient care process. My colleagues and I have developed a clinical computing system for Boston's Beth Israel Hospital that helps doctors, nurses, and other clinicians provide better care for their patients.1-4 Data are collected at the point of transaction, either directly on 1 of 800 terminals located throughout the hospital or from autoanalysers. During an average week in 1988, hospital employees entered or corrected information in the patients' computer record 137,526 times,3 and clinicians viewed patient data 40,958 times.4 The information entered and viewed includes demographic data, diagnosis and procedure codes, complete laboratory data, medications, results of diagnostic procedures, and scheduling of tests and office visits.

Although hospital computing systems such as ours make it easy to find individual data pertaining to a particular patient, the use of these data for research typically involves specialized programming by intermediaries who are not familiar with the research problem under consideration. In addition, the programs usually take hours or even days to run, and then frequently have to be rewritten as the researcher and the data intermediary try to understand each other. Thus, limitations relating to access have prevented most researchers from using routinely collected computer-based clinical data for clinical research.

RAPID ACCESS TO CLINICAL DATA

We have developed ClinQuery, an interactive computer program for online searching of clinical data, to address these problems of access.5-7 This program allows clinicians, researchers, and administrators to use any of the 800 terminals located throughout the hospital to query the hospital's database at any time of the day or night. The database consists of all clinical data from


more than 177,000 consecutive hospital admissions since 1984 and is updated automatically after patients are discharged from the hospital. The types of searchable data elements are shown in Table I.

ClinQuery is menu-driven and designed to be similar in operation to Paperchase, a self-service bibliographic retrieval system for the MEDLINE database.8,9 The user selects a description of a patient from the menus, and a list of admissions is formed for each description. These lists can then be logically combined to produce additional lists. Table II shows nine lists that were obtained during a search to explore the relation between hypoalbuminaemia and death in patients with AIDS in 1989. The numbers shown in the far right column indicate that during 19 (list E) of 126 hospitalizations (list E + list H), patients with AIDS and hypoalbuminaemia died, whereas during 5 (list F) of 122 hospitalizations (list F + list I), patients with AIDS and normal serum albumin died. Thus, in a few minutes, a clinician can find data suggesting that patients with hypoalbuminaemia who had a discharge diagnosis of AIDS in 1989 had roughly four times the risk of death during hospitalization of similar patients with normal serum albumin on admission (15 per cent versus 4 per cent of hospitalizations). Of course, further analysis is necessary to understand this association.
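The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the admission counts quoted in the text (lists E, F, H and I); the variable names are illustrative, and treating each hospitalization as an independent observation is a simplifying assumption, since one patient may contribute several admissions. On these counts the risk ratio comes out a little under four and the odds ratio a little over four.

```python
# Admission counts quoted in the text (Table II, lists E, F, H and I).
died_low, alive_low = 19, 107        # AIDS, low first albumin
died_normal, alive_normal = 5, 117   # AIDS, normal first albumin


def risk(events: int, total: int) -> float:
    """Proportion of hospitalizations that ended in death."""
    return events / total


risk_low = risk(died_low, died_low + alive_low)              # 19 / 126
risk_normal = risk(died_normal, died_normal + alive_normal)  # 5 / 122

# Ratio of the two proportions, and ratio of the two odds.
risk_ratio = risk_low / risk_normal
odds_ratio = (died_low / alive_low) / (died_normal / alive_normal)

print(f"risk of death, low albumin:    {risk_low:.3f}")
print(f"risk of death, normal albumin: {risk_normal:.3f}")
print(f"risk ratio: {risk_ratio:.2f}")
print(f"odds ratio: {odds_ratio:.2f}")
```

Neither ratio adjusts for severity of illness or for repeated admissions of the same patient, which is one reason further analysis is needed before interpreting the association.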

USING ROUTINELY COLLECTED DATA

Over 900 clinicians at the Beth Israel Hospital in Boston have used ClinQuery more than 10,000 times since it was made available as one of the decision support options in the hospital's clinical information system.7 Clinicians reported that 16 per cent of their searches were performed for patient care, 38 per cent for clinical research, and the remainder for teaching, exploration, and administration.

Because ClinQuery integrates diverse types of data from different computer systems within a hospital, researchers can explore relationships between clinical data, coded discharge data, and fiscal data. The following examples, derived from ClinQuery searches, demonstrate the value of clinical data, the inadequacy of coded discharge diagnoses, how a combination of clinical data and coded discharge data may be preferable to either alone, and how fiscal data can be combined with clinical data.

Patients with acute myocardial infarction (AMI) as a principal diagnosis are currently assigned to one of three diagnosis-related groups (DRGs): AMI - alive with complications (DRG 121); AMI - alive without complications (DRG 122); and AMI with death during hospitalization (DRG 123). Complications were defined by expert panels to be one of approximately 40 ICD-9-CM codes, such as congestive heart failure (428.0) or atrial fibrillation (427.31). When one of these 40 ICD-9-CM codes is listed on a discharge abstract of a patient discharged alive with a diagnosis of AMI, the hospital is entitled to an additional $1800. We examined whether this classification reflected resource consumption in our hospital.10 Of the 254 patients at our hospital who were discharged with a principal diagnosis of AMI during a nine-month period in 1985, 4 had been transferred to the hospital within 12 hours of the AMI and 31 had no electrocardiographic changes, cardiac isoenzyme elevations, or physician notes indicating AMI during hospitalization. These 35 patients were excluded.
Of the remaining 219 patients, 37 died, leaving 101 patients with AMI - alive with complications and 81 patients with AMI - alive without complications. There was no significant difference in length of stay or total hospital charges between these two groups of patients.10 Thus, in our hospital, ICD-9-CM codes for complications failed to separate high and low resource-consuming patients. Patients who underwent cardiac catheterization were excluded from further analysis because they had higher total charges and longer lengths of stay, a fact that has been reflected in changes to the DRGs since 1985. Subsequent stepwise multiple linear-regression analysis revealed that of more than 70 clinical factors considered, only peak levels of myocardial enzymes (creatine kinase and


Table I. Searchable data elements

Administrative data (43 categories): Age, sex, admitting & discharge date, admitting & discharge service, room & ward, length of stay, number of admissions, financial class, intervals between admissions, status (alive or dead), discharge destination, hospital charges, etc.
Laboratory data (5 fluids): Blood, urine, CSF, joint, stool
    Chemistry (80 results): Glucose, electrolytes, renal function, liver function, cardiac enzymes, hormone levels, drug levels, etc.
    Haematology (60 results): Complete blood count, differential cell count, morphology, haemostatic values, sedimentation rate, etc.
    Arterial blood gas (7 results): pH, PO2, PCO2, ventilator settings, etc.
Microbiology (58 organisms, 23 sensitivities): Sources: blood, urine, respiratory tract, CSF, joint, genitourinary tract, stool, and other.
Blood bank (25 products & tests): Red blood cells, whole blood, platelets, plasma components, and blood type.
Medications (623 medications)
Surgical pathology (70 anatomic locations): SNOMED codes subindexed by morphology and etiology codes.
Radiology (40 tests)
Cardiac catheterization lab
    Procedures (29 procedures): Arteriography: dominance, morphology and per cent stenosis by artery.
    Ventriculography (10 results)
    Final diagnosis (8 categories)
Outpatient visits (3 categories): Location, provider type, frequency
Diagnosis (ICD-9-CM codes): Admitting diagnosis, discharge diagnosis, diagnosis-related group (DRG), major diagnostic category

Table II. Lists of patients and admissions from a ClinQuery search exploring the risk of death among patients with AIDS and hypoalbuminaemia (low serum albumin) in 1989

List  Subject of search  Attribute                      Number of patients  Number of admissions
A     AIDS                                              174                 325
B     Died               All patients                   647                 647
C     First albumin      < 3.4 mg/dl                    1493                1780
D     First albumin      ≥ 3.4 mg/dl                    3986                4822
E     A, B, and C        (AIDS, died, low albumin)      19                  19
F     A, B, and D        (AIDS, died, normal albumin)   5                   5
G     A not B            (AIDS, alive)                  161                 298
H     G and C            (AIDS, alive, low albumin)     64                  107
I     G and D            (AIDS, alive, normal albumin)  89                  117
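The way lists are combined in Table II corresponds to ordinary set operations on admissions. The following is a minimal sketch, assuming each list can be represented as a set of admission identifiers; the identifiers and variable names are hypothetical and do not reflect how ClinQuery itself stores or combines its lists.

```python
# Hypothetical admission identifiers standing in for the base lists.
aids = {1, 2, 3, 4, 5, 6}          # list A: discharge diagnosis of AIDS
died = {2, 5, 9}                   # list B: died during hospitalization
low_albumin = {1, 2, 3, 9}         # list C: first albumin below threshold
normal_albumin = {4, 5, 6, 7}      # list D: first albumin at or above it

# Derived lists, built the same way as rows E to I of the table.
list_e = aids & died & low_albumin      # A, B, and C
list_f = aids & died & normal_albumin   # A, B, and D
list_g = aids - died                    # A not B
list_h = list_g & low_albumin           # G and C
list_i = list_g & normal_albumin        # G and D

for name, members in [("E", list_e), ("F", list_f), ("G", list_g),
                      ("H", list_h), ("I", list_i)]:
    print(name, sorted(members))
```

Intersection plays the role of "and", and set difference the role of "not", so every derived list remains a subset of the admissions already selected.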


lactate dehydrogenase) strongly predicted length of stay or institutional charges among patients surviving AMI.10 Thus, we were able to show that the coded discharge diagnoses did not accurately divide patients into high and low resource-consumption groups, but that one simple, objective laboratory value, which reflects our understanding of patient physiology, is the best predictor of these outcomes.

Conditions such as fluid and electrolyte disorders frequently entitle a hospital to increased reimbursement when they are coded as comorbid conditions on a patient's discharge abstract. (Comorbid conditions were defined as diagnoses 'that because of [their] presence with a specific principal diagnosis would cause an increase in length of stay by at least one day in at least 75 per cent of the patients'.11) To our surprise, over half the patients with fluid and electrolyte abnormalities during their hospitalizations have no code for this abnormality assigned to their discharge summary. Although a simple computer program now reminds hospital coders of these abnormalities, which increase reimbursement for the hospital, we were curious to see if serum electrolyte abnormalities truly did predict resource consumption. When we analysed laboratory data to ascertain whether serum electrolyte abnormalities did in fact correlate with increased length of stay, we found that the abnormalities met the above definition for only 31 per cent of all DRGs for which we had sufficient data.12 However, these were not primarily the DRGs for which Medicare is willing to pay more (that is, one of 104 DRGs with a comorbidity proviso). Since the DRG scheme is based on analysis of coded clinical data, one is forced to conclude that important information is lost in the coding process.
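One way to read the comorbidity definition quoted above is as a test applied within each DRG. The sketch below is an illustration only: it assumes that the "increase" is measured against the median length of stay of patients in the same DRG without the abnormality, which the text does not specify, and the function name, DRG labels and lengths of stay are all hypothetical.

```python
from statistics import median


def meets_comorbidity_definition(los_with, los_without,
                                 min_increase=1.0, min_fraction=0.75):
    """Return True if at least `min_fraction` of patients with the
    abnormality stayed at least `min_increase` days longer than the
    median stay of patients without it (baseline choice is an assumption)."""
    if not los_with or not los_without:
        raise ValueError("both groups need at least one length of stay")
    baseline = median(los_without)
    longer = sum(1 for los in los_with if los - baseline >= min_increase)
    return longer / len(los_with) >= min_fraction


# Hypothetical lengths of stay (days) for two made-up DRGs:
# (patients with the abnormality, patients without it).
drgs = {
    "DRG X": ([6, 7, 8, 5, 9], [4, 5, 5, 6, 7]),
    "DRG Y": ([5, 6, 4, 7], [4, 5, 5, 6, 7]),
}

meeting = [name for name, (with_abn, without_abn) in drgs.items()
           if meets_comorbidity_definition(with_abn, without_abn)]
share = len(meeting) / len(drgs)

print(meeting, f"{share:.0%} of DRGs meet the definition")
```

Applied to laboratory-defined abnormalities rather than coded ones, a test of this kind gives the share of DRGs in which the abnormality satisfies the definition, which is the quantity reported in the text.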
Although the previous two examples showed the power of clinical data to predict resource utilization in some situations and the failure of coded data to do the same, at least in one teaching hospital, sometimes the combination of the two types of data provides a better predictive model than either used alone. When we studied emergency readmissions to our medical service, we found that 19 per cent of patients were readmitted within 90 days of discharge.13 Using the technique of recursive partitioning,14 we stratified a patient's risk of emergency readmission on the basis of coded discharge diagnosis and clinical data. When we used only coded discharge data or only clinical data, we either could not validate a model predicting readmission or the model was simple and not clinically useful. However, when we combined these types of data, the technique produced stratifications that were not only clinically useful, but also statistically valid. For instance, we found that patients with a discharge diagnosis of cancer and a prior admission within 60 days had a 41 per cent readmission rate, patients with heart failure and abnormal anion gap had a 32 per cent readmission rate, and patients with diabetes, haematocrit less than 38 per cent, and creatinine greater than or equal to 1.3 mg/dl had a 32 per cent readmission rate, whereas patients with no high-risk diagnosis, no prior admission within 60 days, and a normal serum albumin had only a 10 per cent readmission rate. Clinical data used in conjunction with coded discharge diagnoses yielded better predictions than either diagnosis or laboratory data used alone.

REASONS TO USE CLINICAL DATA

The main advantage of using coded information is that it is widely available and simpler to analyse than the clinical data from which it was derived. All hospitals in the United States are required by law to produce coded discharge abstracts for Medicare reimbursement.
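The readmission strata reported in the preceding section combine a coded diagnosis with clinical findings, and can be written as a small set of rules. This is a sketch under stated assumptions, not the published model: the order in which the rules are checked, the membership of the high-risk diagnosis set, and the use of yes/no flags for "abnormal anion gap" and "normal albumin" are all assumptions, since the text does not give the underlying thresholds or the full tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discharge:
    diagnosis: str              # simplified label for the coded diagnosis
    prior_admission_60d: bool   # admitted within the previous 60 days
    anion_gap_abnormal: bool    # flag only; threshold not given in the text
    haematocrit_pct: float
    creatinine_mg_dl: float
    albumin_normal: bool        # flag only; threshold not given in the text


# Assumption: the three diagnoses named in the text are the high-risk ones.
HIGH_RISK = {"cancer", "heart failure", "diabetes"}


def reported_readmission_rate(d: Discharge) -> Optional[float]:
    """Return the readmission rate reported for the matching stratum,
    or None when the discharge falls outside the strata quoted."""
    if d.diagnosis == "cancer" and d.prior_admission_60d:
        return 0.41
    if d.diagnosis == "heart failure" and d.anion_gap_abnormal:
        return 0.32
    if (d.diagnosis == "diabetes" and d.haematocrit_pct < 38
            and d.creatinine_mg_dl >= 1.3):
        return 0.32
    if (d.diagnosis not in HIGH_RISK and not d.prior_admission_60d
            and d.albumin_normal):
        return 0.10
    return None


example = Discharge("diabetes", False, False, 36.0, 1.3, True)
print(reported_readmission_rate(example))
```

Each rule needs both kinds of data: drop the diagnosis or drop the clinical values and the strata above can no longer be expressed.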
The pitfalls of using coded data are generally acknowledged and then thoughtfully ignored. These problems include high error rates, coding bias for reimbursement, inadequacy of the coding rules, and subjectivity. We and others have detected coding error rates as high as 44 per cent.10,12,16-19 At our hospital, even after careful audit and the revision of more than 3000 discharge abstracts, the


discharge abstracts contained the ICD-9-CM code 276.8 (hypopotassaemia) in only 35 per cent of 3001 admissions of patients with a low serum potassium level (an improvement from the nearly 50 per cent coding omission rate we had previously documented). Furthermore, only 69 per cent of the 758 admissions coded for acute myocardial infarction had creatine kinase isoenzyme elevations suggestive of this diagnosis. Thus, well-intentioned coders (even with financial incentive) make errors of both omission and commission. In the case of acute myocardial infarction, the rules for assigning ICD-9-CM codes are likely to contribute to a misinterpretation of the data. A patient admitted prior to October 1989 for balloon angioplasty could be assigned a principal diagnosis code of acute myocardial infarction if he or she has had an acute myocardial infarction at any time within the prior two months! Although this anomaly has been corrected with the most recent revisions of the ICD-9-CM codes, all Medicare databases with data prior to October 1989 contain this built-in error. Finally, reasons for hospitalization are frequently multifactorial, and the ordering of diagnosis codes as well as the selection of particular codes can reflect the subjective judgement of the hospital’s coder. Clinical data, on the other hand, are less subjective; they come from primary information sources, they reflect the patient’s physiology, and they reflect clinical practice. Laboratory data, which make up a large component of the clinical data that are available on computers in hospitals, come directly from autoanalysers. These data are objective and not subject to manipulation. Other clinical data such as medications, diets, procedures performed, and X-ray and pathology results are also captured in some hospital computing systems. This information, when entered by clinicians who have responsibility for the care of the patients, is relatively free of error. 
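The errors of omission and commission described above for coded diagnoses can be made concrete by comparing two sets of admissions: those carrying a given code and those with the corresponding laboratory finding. The sketch below uses hypothetical admission identifiers and a hypothetical function name, and it treats the laboratory finding as the reference standard, which is itself an assumption.

```python
def coding_errors(coded: set, lab_positive: set) -> dict:
    """Compare coded admissions with laboratory-positive admissions."""
    omitted = lab_positive - coded       # finding present, code absent
    unconfirmed = coded - lab_positive   # code present, finding absent
    return {
        "omission_rate": len(omitted) / len(lab_positive),
        "commission_rate": len(unconfirmed) / len(coded),
    }


# Hypothetical admission identifiers.
coded = {101, 102, 103, 104}
lab_positive = {102, 103, 104, 105, 106, 107}

rates = coding_errors(coded, lab_positive)
print(rates)
```

The two rates have different denominators: omission is measured among admissions with the laboratory finding, commission among admissions carrying the code.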
Only a few hospitals capture large amounts of clinical data on their computer systems, however, and even if clinical data were more widely available, there would be too much information from each hospitalization for the statistical techniques used to analyse health-related data.

Of course, many pitfalls await the users of clinical data. ClinQuery permits researchers to explore data by making hundreds of comparisons. Although some have called this exploration ‘data dredging’ and others have elaborated on the hazards of using observational databases,20 ready access to the database can also stimulate hypothesis formation. Researchers also gain a better understanding of the information recorded: what data elements are contained in the database, how reliable the data are (laboratory data versus discharge diagnoses), what the alternative search strategies are, and what the limitations and biases of the data are. Ready access provides immediate feedback when data are requested. If a query is too broad or too specific, or the results are not consistent with known facts, the researcher can alter the search or re-evaluate a hypothesis. Finally, interaction with the data can alert the researcher to unexpected relationships that warrant further exploration and study. Thus, the ability to directly interact with the database allows the clinician to explore alternative explanations.

Perhaps the most important problems associated with using a clinical database relate to the facts that not all patients have the same data items collected and sicker patients have more tests performed on them. Thus, missing data and selection bias continue to be major concerns for those who use routinely collected data in their research.21

Despite these problems, clinical data that are collected as a byproduct of the patient care process represent a largely untapped resource for health care researchers.
We have shown that in some instances, these clinical data are preferable to the coded data widely used by governmental regulators and health services researchers, and in other instances, clinical data should be used in conjunction with coded discharge data. We believe that programs like ClinQuery that provide rapid and flexible access to integrated clinical data, fiscal data, and coded discharge data will provide health services researchers with powerful new tools to explore the process of care.

ACKNOWLEDGEMENTS

The author thanks Dr. Warner V. Slack for his many thoughtful comments and suggestions, and Emily Boro for editorial assistance.

REFERENCES

1. Bleich, H. L., Beckley, R. F. and Horowitz, G. L. ‘Clinical computing in a teaching hospital’, New England Journal of Medicine, 312, 756-764 (1985).
2. Slack, W. V. ‘The soul of a new system: a modern parable’, MD Computing, 6, 137-140 (1989).
3. Bleich, H. L., Safran, C. and Slack, W. V. ‘Departmental and laboratory computing in two hospitals’, MD Computing, 6, 149-155 (1989).
4. Safran, C., Slack, W. V. and Bleich, H. L. ‘Role of computing in patient care in two hospitals’, MD Computing, 6, 141-148 (1989).
5. Safran, C., Sobel, E., Lightfoot, J. and Porter, D. ‘A computer program for interactive searches of a medical database’, in Salamon, R., Blum, B. and Jorgenson, M. (eds.) MEDINFO 86: Proceedings of the Fifth World Conference on Medical Informatics, Elsevier Science Publishers, Amsterdam, 1986, pp. 545-549.
6. Safran, C. and Porter, D. ‘New uses of the large clinical database at the Beth Israel Hospital: On-line searching by clinicians’, in Othner, H. F. (ed.) Proceedings of the Tenth Annual Symposium on Computer Applications in Medical Care, IEEE Computer Society Press, Washington, D.C., 1986, pp. 114-119.
7. Safran, C., Porter, D., Lightfoot, J., Rury, C. D., Underhill, L. H., Bleich, H. L. and Slack, W. V. ‘ClinQuery: a system for online searching of data in a teaching hospital’, Annals of Internal Medicine, 111, 751-756 (1989).
8. Horowitz, G. L. and Bleich, H. L. ‘Paperchase: A computer program to search the medical literature’, New England Journal of Medicine, 305, 924-930 (1981).
9. Horowitz, G. L., Jackson, J. D. and Bleich, H. L. ‘Paperchase. Self-service bibliographic retrieval’, Journal of the American Medical Association, 250, 2494-2499 (1983).
10. Barbash, G. I., Safran, C., Ransil, B. J., Pollack, M. A. and Pasternak, R. C. ‘Need for better severity indexes of acute myocardial infarction under diagnosis-related groups’, American Journal of Cardiology, 59, 1052-1056 (1987).
11. Averill, R. F. ‘The revised ICD-9-CM diagnosis related groups. Technical definitions’, New Haven, CT: Health Systems International, 1983.
12. Safran, C., Porter, D., Slack, W. V. and Bleich, H. L. ‘Diagnosis-related groups. A critical assessment of the provision for comorbidity’, Medical Care, 25, 1011-1014 (1987).
13. Phillips, R. S., Safran, C., Cleary, P. D. and Delbanco, T. L. ‘Predicting emergency readmissions for patients discharged from the medical service of a teaching hospital’, Journal of General Internal Medicine, 2, 400-405 (1987).
14. Cook, E. F. and Goldman, L. ‘Empiric analysis of multivariate analytic techniques: advantages and disadvantages of recursive partitioning analysis’, Journal of Chronic Disease, 37, 721-731 (1984).
15. Safran, C. and Phillips, R. S. ‘Interventions to prevent readmission: The constraints of cost and efficacy’, Medical Care, 27, 204-211 (1989).
16. Demlo, L. K., Campbell, P. M. and Brown, S. S. ‘Reliability of information abstracted from patients’ records’, Medical Care, 16, 995-1005 (1978).
17. Still, S. ‘The reliability of medical records’, Journal of the American Medical Record Association, 51, 20-27 (1980).
18. Corn, R. F. ‘Quality control of hospital discharge data’, Medical Care, 18, 416-426 (1980).
19. Barnard, C. and Esmond, T. ‘DRG-based reimbursement: The use of concurrent and retrospective clinical data’, Medical Care, 19, 1071-1082 (1981).
20. Mantel, N. ‘Cautions on the use of medical databases’, Statistics in Medicine, 2, 355-362 (1983).
21. Rubin, D. B. and Schenker, N. ‘Multiple imputation in health-care data bases: An overview and some applications’, Statistics in Medicine, 10, 585-598 (1991).
