The design and automated testing of an expert system for the differential diagnosis of acute stroke.

Reese A. Wain M.D., Montefiore Medical Center and the Albert Einstein College of Medicine
Stanley Tuhrim M.D., Mount Sinai School of Medicine
Lynne D'Autrechy and James A. Reggia, University of Maryland

Stroke is the third leading cause of death in the United States and a major source of morbidity.[1] Recent studies have shown a potential use for thrombolytic agents in the treatment of ischemic stroke (IS), but these agents are contraindicated in intracerebral hemorrhage (ICH). A computed tomographic scan is used to distinguish between these two stroke types prior to the use of thrombolytic agents, but may not be readily obtainable. Decision-making aids such as algorithms developed at Guy's Hospital and Strong Memorial Hospital have been designed in an attempt to make this distinction on clinical grounds. We have constructed computerized medical decision-making (CMD) systems based upon these algorithms and compared their performance to that of a system we developed using National Stroke Data Bank data. Relevant medical data for each of 337 patient cases in the Mount Sinai Hospital Stroke Data Bank were presented to each of the CMD systems. In consideration of the clinical task of using thrombolytic agents, we attempted to maximize the positive predictive value (PPV) for ischemic stroke. The CMD systems based upon the Guy's Hospital and Mount Sinai algorithms produced PPVs of 95% and 94% with sensitivities of 77% and 78% respectively, compared to a PPV of 93% and a sensitivity of 56% with the Strong Memorial CMD system. The Mount Sinai CMD system was judged more efficacious than the Guy's Hospital system in that it arrived at similar results while requiring less clinical information, and information that could be more easily obtained.

0195-4210/91/$5.00 © 1992 AMIA, Inc.

I. Introduction

Distinguishing between intracerebral hemorrhages and ischemic strokes on clinical grounds is often difficult, but is of increasing importance in light of recent advances in the treatment of acute stroke. Recent clinical trials have suggested that thrombolytic agents may be of value in the treatment of ischemic stroke. Since these agents promote clot lysis and potentiate bleeding, a distinction between ICH and IS must be made if they are to be used in acute stroke. This distinction must be made quickly because, to be effective, these agents must be administered very soon after stroke onset, but they could prove fatal if given to patients with ICH. The most definitive way to make this distinction is by performing a computed tomographic (CT) scan. Unfortunately, a critical amount of time can be spent in transport to a hospital for this scan, and many hospitals throughout the world do not have CT equipment or cannot provide for its immediate use. If ischemic strokes could be reliably distinguished from hemorrhagic strokes on the basis of the patient's history and clinical findings, the need for an emergent CT scan could be eliminated.

Distinguishing ischemic strokes from their hemorrhagic counterparts on clinical grounds has not proven reliable. In attempting to improve upon these results, algorithms have been developed which rely on a more ordered approach to this differential diagnosis. Among the existing decision-making aids that attempt to predict the likelihood of infarction and hemorrhage in acute stroke patients are algorithms developed at Guy's Hospital[2] and Strong Memorial Hospital.[3] These algorithms were derived from analyses of data on series of patients admitted to the respective institutions who fulfilled criteria for the diagnosis of stroke. In both studies, the admission histories, physical and neurological examination results and diagnostic test findings were recorded in a computerized data base for all stroke patients who had undergone CT scanning. Attributes were then sought which would distinguish between stroke types. Each individual clinical and historical variable was analyzed with respect to its occurrence in one type of stroke versus the other. The Guy's Hospital algorithm assessed the presence or absence of 23 distinct variables, while the Strong Memorial Hospital algorithm required the physician to note the presence of only 6 variables.

We undertook the construction of computerized medical decision-making (CMD) systems based upon the aforementioned published algorithms and then endeavored to create a CMD system based upon an algorithm we developed that also distinguishes between an acute ischemic stroke and an intracerebral hemorrhage, using more easily obtained historical and clinical findings. These systems were then tested using a database of patient cases seen at the Mount Sinai Hospital. In the following sections we describe how these systems were created, the evaluation process and its results, and then discuss the potential usefulness of these systems. We also discuss the development of a domain-independent tool for automating the extraction of information from databases and its presentation to CMD systems, a useful byproduct of this work.

II. Methods

A. CMD System Development

We have previously developed an expert system generator which allows direct development of CMD systems by domain experts without a computer scientist acting as an intermediary.[4] This generator, or shell, provides all of the necessary software in prepackaged form (inference mechanisms and user interface), so that a developer need only specify the type of inference method to be used and the necessary knowledge in an appropriate syntax. The choice of inference methods includes linear discriminant functions, an approach based on Bayes' Theorem, production rules and an abductive, "hypothesize-and-test" approach. The use of this expert system generator has greatly streamlined the process of developing CMD systems and has been successfully utilized in a variety of projects, most notably the development of a rule-based system for diagnosing and managing patients with possible transient ischemic attacks (TIAs).[5]

Using this expert system generator, CMD systems were constructed based upon the Guy's Hospital and Strong Memorial algorithms. A production rule system that incorporates a linear discriminant function was used for each of these systems. Because neither of these proved sufficiently accurate (as discussed below), we attempted to develop an entirely new system based on data from the National Institute of Neurological Disorders and Stroke (NINDS) Stroke Data Bank.
To begin our search for the variables most likely to distinguish between stroke types, we compiled a preliminary list of a large number of clinical and historical factors which had previously been described as discriminating ICH from IS.[2,3,6-8] The discriminative capacity of these variables was assessed by quantifying their association with a particular stroke type in the group of stroke patients described in the NINDS Stroke Data Bank, which contains historical and clinical information for over 1800 stroke victims.[9] Using the SAS package, we created prior probability tables relating the presence or absence of each of these nearly 100 variables to the occurrence of one or the other type of stroke. Chi-square analyses were also performed for each of the attributes to screen out those whose presence or absence did not have a significant bearing on stroke type. The variables identified in the Chi-square analyses were then analyzed using multivariate techniques to identify those with independent predictive value for distinguishing between stroke etiologies.

The variables we chose to include in the Mount Sinai CMD system were the presence of headache, vomiting and decreased consciousness at stroke onset and the presence of a diastolic blood pressure greater than 110 mmHg. This CMD system made use of a linear discriminant function as its inference mechanism. In contrast to the Guy's Hospital and Strong Memorial systems, a diagnosis could be made even if the data regarding a patient were incomplete. This was accomplished by calculating prior probabilities for the existence of any variable based upon NINDS Stroke Data Bank data and including these in the calculation of the discriminant function value for all variables whose presence was unknown. It should be emphasized that the Mount Sinai system, like the Guy's and Strong Memorial Hospital systems, was derived from data regarding patients distinct from those used for the evaluation of the systems.

B. CMD System Evaluation

The information necessary to test the CMD systems was obtained from a data base collected prospectively and consisting of all patients admitted to the Acute Stroke Unit of the Mount Sinai Hospital over an 18 month period. The results of a CT scan obtained on each patient within 24 hours of admission were included in this data base. All patients given a final diagnosis by CT scan of either an ischemic stroke or an intracerebral hemorrhage were included in this study (n=337).

Typically, the CMD systems developed with our expert system generator are intended to function in a dialog format in which the user is queried regarding the information pertinent to the decision-making task at hand. However, a dialog format would have been extremely cumbersome for the evaluation undertaken here. Given the large number of cases used to evaluate each of the three systems, the data entry process would have been quite slow and the potential for transcription errors great. Since the data already existed on computer, a more expedient approach was developed in which a computer program was written that provided an interface between the Mount Sinai Data Bank records and the CMD systems. This interface automated the process of describing each of the cases to each of the CMD systems being evaluated, obtaining each system's diagnosis and comparing that to the actual diagnosis obtained by CT scan.
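The handling of incomplete data described for the Mount Sinai system in Section II.A can be illustrated with a minimal sketch. It assumes that each unknown finding is replaced by its prior probability of being present before the discriminant value is summed. The weights, bias, priors and sign convention below are hypothetical placeholders chosen only for illustration; the paper does not report the fitted coefficients or the NINDS-derived priors.

```python
from typing import Dict, Optional

# Hypothetical values for illustration only: the paper reports neither the
# fitted discriminant coefficients nor the NINDS-derived prior probabilities.
WEIGHTS = {
    "headache": -1.0,
    "vomiting": -1.5,
    "decreased_consciousness": -2.0,
    "diastolic_bp_over_110": -1.0,
}
BIAS = 2.0  # assumed convention: higher scores favor ischemic stroke
PRIORS = {
    "headache": 0.25,
    "vomiting": 0.10,
    "decreased_consciousness": 0.20,
    "diastolic_bp_over_110": 0.15,
}


def discriminant_score(findings: Dict[str, Optional[bool]]) -> float:
    """Linear discriminant value; unknown findings fall back to their prior."""
    score = BIAS
    for name, weight in WEIGHTS.items():
        value = findings.get(name)  # None (or a missing key) means "unknown"
        if value is None:
            x = PRIORS[name]  # substitute the prior probability of presence
        else:
            x = 1.0 if value else 0.0
        score += weight * x
    return score


# A case with only one finding recorded still yields a score.
partial_case = {"headache": True}
print(discriminant_score(partial_case))
```

Under this assumption a case with missing entries is never rejected; each missing variable simply contributes its population-average effect to the score.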

Following construction and debugging of the CMD systems based upon the Guy's Hospital, Strong Memorial and Mount Sinai algorithms, each system was used to diagnose either ischemic stroke or intracerebral hemorrhage in each of the 337 Mount Sinai Stroke Data Bank patients. The probability levels chosen for diagnosing a patient as having had an IS were intended to maximize the positive predictive value (PPV) for IS. In the CMD system based on the Guy's Hospital algorithm, a patient was diagnosed with an IS if the likelihood of having had an ICH was less than 5%, compared to less than 2% in the system based on the Strong Memorial Hospital algorithm. These cutoff points were those used in the original studies.[2,3] In the Mount Sinai CMD system, a greater than 90% probability of IS was deemed necessary for the diagnosis of ischemic stroke to be made.

III. Results

The results obtained when each CMD system diagnosed the type of stroke that had occurred in the test patients are shown in Tables 1-3 along with the actual CT diagnoses.
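The summary statistics reported with each table (PPV, sensitivity, NPV and specificity) follow directly from the four cell counts of a 2x2 table. The sketch below treats ischemic stroke as the "positive" class and uses the counts reported in Table 1 for the Guy's Hospital system; the function and argument names are illustrative and not part of the original work.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Summary statistics for a 2x2 table, with IS as the positive class.

    tp: system says IS,  CT says IS
    fp: system says IS,  CT says ICH
    fn: system says ICH, CT says IS
    tn: system says ICH, CT says ICH
    """
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "npv": tn / (tn + fn),
        "specificity": tn / (tn + fp),
    }


# Cell counts as reported in Table 1 (Guy's Hospital algorithm).
guys = diagnostic_metrics(tp=158, fp=8, fn=47, tn=16)
for name, value in guys.items():
    print(f"{name}: {value:.1%}")
```

Because the stated clinical goal is to avoid giving thrombolytic agents to ICH patients, PPV for IS is the statistic the thresholds were tuned to maximize, at the cost of sensitivity.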

                     CT Diagnosis
CMDS Diagnosis       IS      ICH
IS                  158        8
ICH                  47       16

PPV 95%   Sensitivity 77%   NPV 25%   Specificity 67%

Table 1. Guy's Hospital Algorithm Results

Of the 337 patients included in the study, 298 (88.4%) had ischemic strokes while the remaining 39 patients (11.6%) had hemorrhages. Relevant variables were not always available for either the system based on the Strong Memorial algorithm or that based upon the Guy's Hospital algorithm. Twenty-eight patients were excluded for lack of sufficient data from evaluation of the former (25 IS and 3 ICH) and 108 from the latter (93 IS and 15 ICH). However, the proportions of IS among the excluded patients (89.3% and 86.1% respectively) were close to that of the entire test population (88.4%). All patients were available for the evaluation of the Mount Sinai CMD system.

                     CT Diagnosis
CMDS Diagnosis       IS      ICH
IS                  152       11
ICH                 121       25

PPV 93%   Sensitivity 56%   NPV 17%   Specificity 69%

Table 2. Strong Memorial Hospital Algorithm

                     CT Diagnosis
CMDS Diagnosis       IS      ICH
IS                  233       14
ICH                  65       25

PPV 94%   Sensitivity 78%   NPV 28%   Specificity 64%

Table 3. Mount Sinai Hospital Algorithm Results

Of note, 95% and 94% of the patients diagnosed as having an IS by the Guy's Hospital and Mount Sinai Hospital CMD systems, respectively, actually did, while only 5% and 6%, respectively, of those diagnosed with IS in fact had an ICH.

IV. Discussion

These results indicate that the CMD systems based upon the Mount Sinai and Guy's Hospital algorithms are better able to distinguish between IS and ICH in the acute stroke patient than that based upon the Strong Memorial algorithm. These results were obtained using probability parameters chosen to minimize the number of ICH patients misdiagnosed as IS, as would be done in a trial to assess the benefit of thrombolytic agents. The choice of these parameters resulted in 65 of 298 patients (22%) with ischemic strokes being misdiagnosed as having incurred an ICH with the Mount Sinai CMD system, and 47 of 205 patients (23%) and 121 of 273 patients (44%) with the Guy's Hospital and Strong Memorial Hospital systems respectively.

A limitation of this study was the absence of sufficient data to make any diagnosis in 108 patients (32%) with the Guy's Hospital system and in 28 patients (8%) with the Strong Memorial system. This occurred despite the prospective nature of the data collection. In other words, despite knowing a priori what information was required, it was not possible to obtain this data in all cases. For example, the Guy's Hospital system was heavily dependent on historical information such as the occurrence of a myocardial infarction in the six months prior to the index event. This sort of information is often not readily available when a patient with severe brain dysfunction is initially evaluated. Therefore, while missing data must be considered a limitation of the study, it also indicates a limitation of the CMD system itself. The inclusion of these patients would probably not have changed the results of this study significantly. Their absence does, however, shed light on the actual difficulty of compiling an adequate patient history.

While the numerical results of the Guy's Hospital and Mount Sinai Hospital CMD systems are quite comparable, differences inherent in their construction make the Mount Sinai system the more useful of the two. The Mount Sinai system is able to distinguish between stroke types when information regarding some of the clinical variables is missing. Furthermore, this system provides its results based upon the presence or absence of four easily obtained variables rather than six multi-part variables. The variables in the Mount Sinai CMD system do not depend on a detailed knowledge of the patient's past medical history or on extensive physical examination or laboratory tests. The necessary information can be readily obtained by a health care worker with limited training. Though the Strong Memorial CMD system was not as adept at distinguishing between stroke types, it also requires both less initial data and data which is easier to obtain. This may be important if non-physicians are required to play a role in the early assessment of stroke patients (e.g. ambulance personnel or triage nurses).

While all three of these systems could prove useful as adjuncts in the clinical differentiation of stroke type, none is sufficiently accurate to serve as the basis for the administration of thrombolytic agents. However, these CMD systems could prove more valuable in situations where a greater degree of uncertainty could be permitted. The information they provide could be used, for example, to determine which patients are very unlikely to have suffered an intracerebral hemorrhage so that the CT scan may be deferred until the optimal time to visualize an acute ischemic stroke, i.e. three to seven days after its onset. Also, decisions regarding agents less likely to cause harm, such as calcium channel blocking agents, could be made on the basis of these systems prior to CT scanning.

The accuracy and therefore the usefulness of such systems might be increased by identifying the characteristics of the patients who are most likely to be diagnosed incorrectly. For example, lacunes are small deep infarcts that typically produce easily recognized clinical syndromes. These syndromes, while usually due to lacunes, are occasionally caused by small hemorrhages.[10] The misdiagnosis of this type of hemorrhage may contribute substantially to the overall error rate, while the exclusion of these patients would be clinically unimportant in many situations (lacunes would probably never be treated with thrombolytic agents because they are small and usually cause only mild

deficits).

In the nearly three decades that computer-assisted medical decision-making systems have existed, their impact on clinical medicine has been very limited. The reasons for this are complex and have been discussed in detail elsewhere.[11] One limiting factor has been the difficulty of direct interaction between the domain specialist (physician) and the evolving CMD system. Approaches to this problem have included intelligent editors such as TEIRESIAS,[12] expert systems for knowledge acquisition such as AGE[13] and the creation of shells to allow direct CMD development by domain experts. These approaches have provided a partial solution to this limitation, but have placed greater emphasis on another source of inefficiency in CMD system development: the evaluation process.

The evaluation of a newly developed CMD system has two main requirements. First, criteria must be developed to determine the adequacy of a system's performance. This represents a formidable task in that it is difficult to decide what represents optimal or acceptable performance when disagreement among human experts with similar training can exist. Often there is no single correct answer. Several evaluation methods have been suggested[14,15] and guidelines for system evaluation have been discussed elsewhere.[16] Second, a source of actual patient data is needed for the testing process. These cases may be collected prospectively once the information necessary for a given system is known, but this is a laborious, time-consuming process. If a source of relevant information already exists, it provides a far more efficient means of evaluation. Many studies have tested expert systems using information from a database of cases where the "correct answer" is known.[5,17-20] Such testing allows one to uncover "bugs" in the system's knowledge base, as well as to reevaluate the knowledge base after fixing errors to make sure that new errors were not generated by the revisions.

Using information from a database in this fashion to expedite expert system evaluation usually involves either 1) implementing a database specifically for evaluation of the expert system, 2) manually deriving data from an existing database and entering it into the expert system, or 3) manually writing a program to take information derived from an existing database and put it into files of a format that can be directly processed by the expert system. The last approach is generally the most efficient when a large number of test cases are involved, but is still a non-trivial task that can involve a major time commitment. This task could be facilitated by automating the generation of the program that converts information from any database into a format suitable for processing by an expert system. What is needed is a way of efficiently extracting the relevant information and providing it to the system under evaluation. Since many existing database facilities and statistical packages can produce output in a standard format for processing by another system, a general purpose facility for utilizing that extracted information could be used in conjunction with a variety of existing programs.

In the process of performing the evaluations described above, we developed this sort of tool. This automatic interface generator (AIG) created a program that ran the large number of patient cases needed to evaluate these CMD systems. It is important to note that the AIG is a general purpose tool which can be used to automate the running of large numbers of test cases through any expert system. The AIG takes as input any patient file in columnar format where each line represents one patient and each column contains a particular type of data (e.g. sex, age) about the case. (For this project we used data obtained from the Mount Sinai Stroke Data Bank, which was created using Clinical Data Manager; however, the AIG can be used with any data which can be formatted appropriately.) Creating a file which the AIG can use is a relatively straightforward procedure with most popular data base facilities. Once this file exists, the AIG directs the automated transfer of the patient information from the data file to the expert systems. The AIG can be used for any system created with our expert system generator. Potentially, programs using the AIG can be written by others to generate programs which can automate the testing of different expert systems. The concept of using an AIG to automate the testing and validation of expert systems can be applied to virtually any expert system for which the appropriate database exists.
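The idea of running a columnar patient file through a decision system and tallying its answers against known diagnoses can be sketched as follows. This is not the authors' AIG, which generated an interface program for systems built with their expert system generator; it is a direct test harness written under stated assumptions. The whitespace-separated layout, the column names, the three sample rows and the `toy_rule` classifier are all hypothetical and carry no clinical meaning.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Sequence


def run_cases(
    lines: Iterable[str],
    columns: Sequence[str],
    truth_column: str,
    diagnose: Callable[[Dict[str, str]], str],
) -> Counter:
    """Feed each one-line patient record to `diagnose` and tally
    (system diagnosis, reference diagnosis) pairs."""
    tally: Counter = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        case = dict(zip(columns, fields))
        truth = case.pop(truth_column)  # hide the reference answer from the system
        tally[(diagnose(case), truth)] += 1
    return tally


# Hypothetical column layout and made-up rows, for illustration only.
COLUMNS = ["age", "sex", "headache", "vomiting", "ct_diagnosis"]
SAMPLE = """\
71 F 0 0 IS
64 M 1 1 ICH
58 M 1 0 IS
"""


def toy_rule(case: Dict[str, str]) -> str:
    # Stand-in for a real CMD system; not a clinical rule.
    both = case["headache"] == "1" and case["vomiting"] == "1"
    return "ICH" if both else "IS"


tally = run_cases(SAMPLE.splitlines(), COLUMNS, "ct_diagnosis", toy_rule)
print(tally)
```

Keeping the column list and the classifier as parameters mirrors the domain independence claimed for the AIG: the same loop can be pointed at a different file layout or a different system without modification.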

VI. Conclusion

In order for CMD systems to be widely accepted into the armamentarium of the clinician, guidelines need to be established by which they can be adequately validated. The minimum criteria for validation should probably include testing on actual patient cases. Additionally, any existing algorithms, models or programs should be compared to a newly developed system. Such a comparison should be made using the same patient cases and is needed to demonstrate the relative value of each system. Such testing might be followed by prospective in vivo studies using actual patients, comparing physician performance with and without the decision-making aid. Then the validation process could be said to closely parallel the mechanisms used in evaluating a new drug or procedure.

In the course of performing this study we developed an interface mechanism which greatly expedited the validation process. This was accomplished by creating an "automated interface generator" which allows already existing information in a computerized data base to be presented for testing to any expert system created with our expert system generator. Moreover, this testing could occur quickly in an automated fashion. The process is largely independent of the nature of the data or the system and can be overseen by an individual with little computer experience. This approach could easily be applied to the evaluation of CMD systems in general.

We have considered the task of distinguishing ICH from IS without a CT scan, a job of potentially crucial importance to the use of thrombolytic agents in the treatment of an acute stroke. Such a problem has proven difficult for clinicians and has led to the development of algorithms which attempt to improve on the acumen of the individual diagnostician. We have constructed CMD systems based upon two of the previously published algorithms and compared their performance with that of a CMD system we developed de novo from analysis of data from the NINDS Stroke Data Bank. This latter system proved more effective than those developed using existing algorithms, but was of insufficient accuracy for application to the task of selecting patients to receive thrombolytic agents without prior CT scan. Nevertheless, the system developed is of potential benefit in situations where the consequences of error are less devastating.

References

1. Caplan, Louis R. Stroke. In Bean, Kristine J. (Ed.) Clinical Symposia. Vol. 40:4, pp. 1-32. 1988.
2. Allen, CMC: Clinical Diagnosis of the Acute Stroke Syndrome. Quarterly Journal of Medicine. 208:515-523. 1983.
3. Panzer RJ, Fiebel JH, Barber WH, Gruner PF: Predicting the Likelihood of Hemorrhage in Patients with Stroke. Arch. Int. Med. 145:1800-1803. 1983.
4. Reggia, JA., Perricone, BT: Knowledge Management System Manual. Dept. of Computer Science, University of [...]
5. Reggia, J., Tabb, D., Price, T., et. al. Computer-Aided Assessment of Transient Ischemic Attacks: A Clinical Evaluation. Archives of Neurology, 41, 1248-1254. 1984.
6. Aring, C.D., Merritt, H.H. Differential Diagnosis Between Cerebral Hemorrhage and Cerebral Thrombosis. Archives of Internal Medicine. 56:435-456. 1935.
7. Aring, C.D. Differential Diagnosis of Cerebrovascular Stroke. Archives of Internal Medicine. 113:195-199. 1964.
8. Harrison, M.J.G. Clinical Distinction of Cerebral Haemorrhage and Cerebral Infarction. Postgraduate Med. J. 56:629-632. 1980.
9. Foulkes, M.A., Wolf, P.A., Price, T.R., Mohr, J.P. and Hier, D.B. The Stroke Data Bank: Design, Methods and Baseline Characteristics. Stroke. 20:864-870. 1989.
10. Tuhrim, S., Yang, W.C., Rubinowitz, H., and Weinberger, J. Primary Pontine Hemorrhage and the Dysarthria-Clumsy Hand Syndrome. Neurology. 32, 1027-1028. 1982.
11. Reggia, J., Tuhrim, S. Computer-Assisted Medical Decision Making, 1, 1-45. New York: Springer-Verlag. 1985.
12. Davis, R. Knowledge Acquisition in Rule-Based Systems: Knowledge About Representations as a Basis for System Construction and Maintenance. In D. Waterman & F. Hayes-Roth (Eds.), Pattern Directed Inference Systems (pp. 99-134). New York: Academic Press.
13. Nii, H., Aiello, N. AGE (Attempt to Generalize): A Knowledge-Based Program for Building Knowledge-Based Programs. Proceedings of the International Joint Conference on Artificial Intelligence, 645-655. 1979.
14. Miller, R., Pople, H., & Myers, J. Internist-1: An Evolving Computer-Based Diagnostic Consultant for General Internal Medicine. New England Journal of Medicine, 307, 468-476. 1982.
15. Hundsgaarde, H. Evaluating Medical Expert Systems. Soc. Sci. Med. 24, 805-819. 1987.
16. Miller, R., Schaffner, K., & Meisel, A. Ethical and Legal Issues Related to the Use of Computer Programs in Clinical Medicine. Annals of Internal Medicine, 102, 529-538. 1985.
17. Kulikowski, C., Ostroff, J. Constructing an EXPERT Knowledge Base for Thyroid Consultation Using Generalized Artificial Intelligence Techniques. Proceedings of the Fourth Annual Symposium on Computer Applications in Medical Care. 175-180. 1980.
18. Neapolitan, R., Georgakis, C., Evens, M., et. al. Using Set Covering and Uncertain Reasoning to Determine Treatment. In W. Stead (Ed.), Proceedings of the Eleventh Annual Symposium on Computer Applications in Medical Care. 213-219. 1987.
19. Politakis, P., Weiss, S. Using Empirical Analysis to Refine Expert System Knowledge Bases. Artificial Intelligence. 22:23-48. 1984.
20. Speedie, S., Palumbo, R., et. al. Rule-Based Drug Prescribing Review: An Operational System. Proceedings of the Fifth Annual Symposium on Computer Applications in Medical Care. 598-602. 1981.

Acknowledgement: We wish to thank Dr. Mayer Fishman and Michael Ostrow for their help in programming and Dr. Deborah R. Horowitz, who examined the patients in this study. This work was supported by grants NS01257, NS27924 and NS29414 from the National Institutes of Health and a gift from the Hess Foundation.

