Computer Methods and Programs in Biomedicine, 34 (1991) 229-238 © 1991 Elsevier Science Publishers B.V. 0169-2607/91/$03.50


COMMET 01146

Evaluation of decision support systems in medicine

P. Nykänen 1, S. Chowdhury 2 and O. Wigertz 2

1 Technical Research Centre of Finland, Medical Engineering Laboratory, 33101 Tampere, Finland, and 2 Linköping University, Department of Medical Informatics, 581 83 Linköping, Sweden

Evaluation deals with the measurement or judgement of system characteristics and with comparison of these with the frame of reference. Evaluation of medical decision support systems is important because these systems are planned to support human decision making in tasks where information from different sources is combined to support clinicians' decisions concerning diagnosis, therapy planning and monitoring of the disease and treatment processes. As the field of decision support systems is still relatively unexplored, standards or generally accepted methodologies are not yet available for evaluation. Evaluation of medical decision support systems should be approached from the perspectives of knowledge acquisition, system development life-cycle and user-system integrated environment.

Evaluation; Validation; Knowledge acquisition; Functionality; Validity; Performance; Fitness; Assistance; Impact; Medical decision support system

1. Introduction

The term 'decision support system' (DSS) has, despite its frequent use, no broadly accepted definition. Earlier it was defined as an interactive computer-based system which could help decision makers to utilize databases and models to solve ill-structured problems. Definitions often emphasized the technical implementation and practical application of such systems rather than how well they succeeded in fulfilling their purpose. More recently, attention has shifted to the conceptual aspects of these systems, as reflected in definitions like 'the exploitation of intellectual and computer-related technologies to improve creativity in decisions that really matter' [29].

Correspondence: Prof. Ove Wigertz, Dept. of Medical Informatics, Linköping University, 581 83 Linköping, Sweden.

Medical decision support systems are designed to support, or even replace, human expertise, and they should function in situations which are extremely complex or specific to the individual user, i.e., clinical management, diagnosis, interpretation, therapy planning, monitoring and follow-up of treatments and diseases. As a rule, human decision making is not a well-understood process, and many users have negative feelings about computer-based decision support in these decision-making tasks. The only way to overcome these negative feelings and to achieve user acceptance for decision support systems is to perform evaluations and validations and thus convince users of the validity and correctness of these systems. Validation of a decision support system has also been stressed by Shwe et al. [28] as an essential step in demonstrating the correctness of the system. A decision support system must be shown to be valid before it can be accepted for use. Definitions for the terms verification, validation and evaluation are presented in the following.


Verification means demonstration of the consistency, completeness and correctness of the software. Verification aims at eliminating errors in the software and certifies that it has been built according to the specifications [21]. Adrion et al. [1] define validation as the determination of the correctness of the final program with respect to the user needs and requirements. O'Keefe et al. [20] note that validation means building the right system. Here the right system refers to the extent to which an expert's activity and knowledge have been captured, i.e. the extent to which the system is a good model. The validity of a decision support system is often the key to its worth. Evaluation means that the worth of the system is judged: is it beneficial, does it do what was required? [21]. Evaluation is the comparison of quality measures with a frame of reference in order to assess the system's quality [32], and the determination of the relevance, progress, efficiency, effectiveness and impacts of the software product [15].

What makes evaluation of decision support systems difficult is that they are both a piece of software and a model [21]. Viewing a decision support system as a model of human knowledge and reasoning leads to many problems which affect evaluation and validation: understanding of the domain is critical in the development; the domain defines the critical aspects of the system; and the human expertise which is to be modeled is expensive, not easily communicated, or simply not available. Evaluation is also difficult due to the lack of standards or generally accepted methodologies. The development of methodologies originates from research activities, performed and reported studies, and collaboration between scientists.

The aspects of evaluation and validation of decision support systems in medicine which we want to emphasize in this paper are:
- Evaluation is iterative in nature; it should be performed at different stages of system development, including knowledge acquisition.
- The system should be evaluated as part of its environment, not as a stand-alone device. Decision support systems are mainly intended as supporting tools for end-users; evaluation should therefore measure the quality characteristics of the user-system integrated behaviour, i.e. the user's performance with the system.

These themes are discussed in this paper with a focus on the criteria which should be used in evaluation studies, when evaluations should be performed, and who should be involved in evaluation activities. The methods and metrics for evaluation are not discussed in this paper.

2. When to evaluate and which criteria to use?

A basic theme is that if a system cannot be evaluated, it should not be taken into use. Any evaluation becomes absurd when performed without a formulated goal, or when performed on a system which has no stated functional objective [3,19]. Evaluation has mostly been considered only for the finished system, at the point when it is about to be introduced into clinical practice. Evaluation should, however, cover all phases of the development process, from planning and design to the final system in routine use. And after the system has been accepted into use, the user should prepare for a continuous follow-up of the system: he has to analyse whether the basis for qualification still holds. This requalification helps the user to judge the expected lifetime of the system, provides the basis for maintenance and updating, and gives ideas for the specification of future plans [19]. Evaluation activities aim at giving feedback to the developers and experts during the development process, at validating whether the system compares with expert clinicians, and at helping the clinicians in the statistical analysis of large volumes of data [15]. Users should be involved in evaluation activities during the system's development, because they are able to assess the competence of the system in its tasks and in the user-system interaction. Involvement of users also generates interest in the system, and they may become advocates of the system in their own environments [8].


The results reported from various studies undertaken in health care have been quite subjective in the absence of objectives and standards. The lack of any agreed set of criteria has led to assessments of the quality of these systems being based on decision-making accuracy only, e.g. MYCIN [33] and INTERNIST-1 [16]. Efforts have been made to establish a conformance testing service aiming at a standardized qualification of medical decision support devices or systems [7], but conformance testing can have no role without underlying standards that serve as a frame of reference. Criteria for evaluation of medical decision support systems are discussed in the following from the perspectives of knowledge acquisition, the life-cycle model of development, and the user-system integrated behaviour. It is our argument that all these perspectives are needed in evaluation studies to ascertain the validity of the development process, of the resulting system, and of its use by the end-users in the user environment.

2.1. Knowledge acquisition

Knowledge acquisition is a process in which application task-related knowledge and problem-solving methods are extracted from various knowledge sources, analysed, modelled, represented and transferred to a knowledge base. In knowledge acquisition we have to consider the following questions:
- what knowledge should be elicited,
- how different types of knowledge should be elicited,
- how the knowledge should be arranged and modelled,
- how the knowledge should be validated.

In the knowledge elicitation phase the domain expert tries to externalise his or her knowledge and the knowledge engineer tries to elicit the relevant knowledge. The results from the knowledge elicitation phase are collections of transcript data, which are further analysed and interpreted to yield knowledge [18]. In knowledge acquisition experts have most often been seen as passive sources of knowledge, but experts are not only sources of knowledge: they have an active role in knowledge acquisition. Knowledge is the result of knowledge acquisition activities; it results from different communication, modelling, interpretation and representation operations in which different frameworks are used.

Pragmatically, knowledge acquisition starts with isolating relevant knowledge and discovering key elements of the domain problem, i.e., identifying the characteristics of the task situation and the method used by the expert. The resulting general content model of a domain is a conceptual model of the domain problem. The quality of this modelling operation is assessed through conceptual validation [2]. Conceptual validation considers both the correctness of the basis of the model and whether the model is reasonable for the intended application. Models of conceptual models are obtained through knowledge analysis and communication; they consist of series of models which represent the interrelationships of task characteristics as modelled by the problem solver. Elicitation validation supports the assessment of the validity of these knowledge models and modelling operations [2]. A key issue in this phase of modelling is to find a representation that adequately captures the contents and is also understandable for the parties examining it. Elicitation validation refers to validation of knowledge communication and analysis. A key question is the quality control of knowledge models, where accordance with standardized terminologies, taxonomies and gold standards should also be considered.

Conceptual validation and elicitation validation together constitute the validation of the knowledge acquisition process [2]. These validation phases result in a model of the domain in a representational formalism which can be further manipulated for computational purposes. It is important that this model captures the content of the domain and represents it in such a way that the knowledge is understandable and the representation is complete, formal and can be communicated. Only then do domain experts have the possibility to judge whether the model should be modified, based on domain knowledge evolution or changes in cognition. Domain knowledge evolution means that the needs of the application domain and the domain experts evolve over time during the development and maintenance of a knowledge-based system.


ment and maintenance of a knowledge-based system. Origins of knowledge and the ways of discovering, identifying, generalizing, validating, and distributing the knowledge are issues under consideration. The cognitive processes of an expert, such as accumulation of practical experiences, utilization of various knowledge sources and communication of knowledge during knowledge acquisition change the conceptual basis of knowledge. The experts have their individual cognitive processes active and they are involved in social processes which emphasize the intersubjective and public aspects of knowledge. Beside acquisition of knowledge from relevant experts, knowledge could also be extracted from databases through statistical techniques like discriminant analysis and Bayes' theorem as well as artificial intelligence approaches, for example inductive learning [11-13]. In such cases the decision support system should be tied to the database system. The information content of the database is also likely to evolve over time. An ideal approach would be to have a mechanism available automatically to review the database contents and revise the decision making logic whenever necessary. Several statistical methods exist for extracting knowledge from a database [6]. The knowledge in a knowledge base extracted from a database through some statistical methods could be validated with other relevant statistical methods applicable for knowledge extraction from the database [5]. The knowledge acquisition process, where knowledge is acquired either from experts or from databases, is linked with an on-going evaluation by domain experts. Knowledge acquisition helps the experts to structure and understand better the domain and their own expertise.

2.2. Life-cycle approach

We feel it is important to consider the time aspects involved in any evaluation activities, as also noted by Gaschnig et al. [8]. Time is also related to the question of methodological scaling, i.e., the selection and application of appropriate evaluation strategies and methods in relation to the developmental level or to the scale of the system.

The life-cycle approach to evaluation of decision support systems divides evaluation into four phases: (1) preliminary exploration of the system specifications; (2) validity of the system in application; (3) functionality and usability of the system in the user's environment; and (4) the impacts of the system on users, organization, health care delivery, quality of patient care, etc. [22].

In the preliminary exploration phase of evaluation and validation, a global outline of the system is established based on the needs and requirements of the users. The goal of evaluation in this phase is to look at the objective specification of the system, i.e., the user requirements, and the system's specification. The system specification documents are evaluated using a check-list which includes at least the following items: functionality; evaluatability or testability; completeness; maintainability; consistency; and feasibility.

Validity of the system in application means judging the output of the system directly. The validity should be judged both from the viewpoint of the final user and from that of the expert who will sanction its use [32]. The domain experts, or a group of experts, usually represent the frame of reference against which the performance is compared. They state the acceptable performance range, or the level of expertise, which is to be achieved by the system. The domain experts in the end-user organizations are those who accept the system for use, and they are responsible for monitoring the quality of the system. Questions related to requalification, maintenance and updating of the system are also their responsibility [22].

In evaluation of the functionality and usability of the system the main focus is on the user interface, the man-machine interaction and a broad evaluation of the system. Evaluation of the user interface involves the study of at least the following issues: are the desired options available in the user interface, is the screen layout effective, and is the dialogue with the system acceptable? When considering the man-machine interaction and the broad evaluation of the system, we have to find answers to the following questions:
- What is the quality of advice from the system, i.e., its self-descriptiveness, conciseness and legibility?
- What is the reasoning efficiency of the system?
- Is the system acceptable for the users: is it easy to use, what are the response times, does it fit into the working practice?
- What is the brittleness of the system, i.e., how does the system behave when it is applied at or beyond the limits of its knowledge?
- Is the system transferable, i.e., does the system retain its reliability and confidence when applied in another location, and can the system be modified by the end-users?

The impacts of the system are the effects which the system has on the users, their behaviour, the overall health-care process, etc. The key question is: does the use of the system lead to changes and improvements in the users' work and in the whole organization? Long-term effects of the system on the quality of health care, the cost-benefit balance and the professional competence of the users, among other things, are difficult to state, but should be considered and evaluated during the routine use of the system. The terms which can be used to evaluate the impact of the system in health care may be listed as follows: provision of clinical care to a single patient; routine management of the health service; education and training of health operators; advancement of medical research, etc. [25]. An important issue related to the impact of the system is whether the system has effects on important components of the health care process, i.e., on consultation times, length of stay, the number and types of examinations and tests ordered, the types of treatments used, and the quality of the users' decisions before and after use [32].

Calibration and requalification of the system are also needed during long-term use. The capability of the system to update its knowledge base recursively, taking new facts into consideration (for example, information from a new incoming patient), should also be considered and evaluated. This is particularly necessary when the knowledge in the knowledge base is extracted from databases employing statistical methods or artificial intelligence approaches, and it is an important issue in maintaining a knowledge-based system. Legal and ethical aspects should be studied both from the user's and from the patient's point of view [19].
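As a purely illustrative aid, the following sketch records the four life-cycle phases and the specification check-list above as a small data structure, so that findings from a review can be logged per phase. The phase and check-list names follow the text; the types, the pass/fail scoring and the example entries are our assumptions, not part of the methodology cited.

```python
# Sketch only: the four life-cycle evaluation phases and the specification
# check-list from the text, held as data so that review findings can be
# recorded per phase. The scoring scheme (pass/fail/notes) is an assumption.
from dataclasses import dataclass, field
from typing import List, Optional

PHASES = [
    "preliminary exploration of the system specifications",
    "validity of the system in application",
    "functionality and usability in the user's environment",
    "impacts on users, organization and health care delivery",
]

SPEC_CHECKLIST = [
    "functionality", "evaluatability or testability", "completeness",
    "maintainability", "consistency", "feasibility",
]

@dataclass
class ReviewItem:
    criterion: str
    passed: Optional[bool] = None   # None = not yet assessed
    notes: str = ""

@dataclass
class PhaseReview:
    phase: str
    items: List[ReviewItem] = field(default_factory=list)

    def open_issues(self) -> List[str]:
        """Criteria that have not yet been shown to hold."""
        return [i.criterion for i in self.items if i.passed is not True]

if __name__ == "__main__":
    review = PhaseReview(PHASES[0], [ReviewItem(c) for c in SPEC_CHECKLIST])
    review.items[0] = ReviewItem("functionality", True, "matches user needs")
    print(review.open_issues())
```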

2.3. The system as an integrated part of the environment

Rossi-Mori and Ricci [25] emphasize the need to evaluate the system as part of the integrated knowledge environment of the user. They state that before an evaluation, considerations are needed along two convergent lines:
- The a priori approach: an understanding of the framework into which the system is to be integrated, of what the system is, and of what features it shows when compared with other decision supporting aids and with the users. The user-system integrated behaviour is what we are modelling, analysing and evaluating when looking at decision support systems.
- The pragmatic approach: the utilization of existing experience in the field and the development of realistic methodologies for the design, development and evaluation of these systems. It is also important to critique these methodologies and assess their value.

Any stand-alone decision support system is of little use in medical decision making; it has to be integrated with the user's environment. Evaluation of the fitness of the system to the user's environment raises questions such as: does the system help the user to improve the outcome of his/her work in the clinical setting; is the system easy to use; is it cost-effective? In a clinical setting it is also important to know how well the knowledge base relates to the local patient population and the specific disease panorama of the local area or country. Adaptation of the system to local conditions should also be considered and evaluated.

It is well known that medical data are subject to variability and inaccuracy. Prior probabilities, which play a significant role in diagnostic systems using Bayes' theorem, can be greatly influenced by local conditions. As Gill et al. [9] suggested, this may well be caused by observer variability, differences in referral procedures, etc. This kind of variation can be reduced through careful collection and recording of data.
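To make the influence of local conditions concrete, the following small sketch (with invented sensitivity, specificity and prevalence values) applies Bayes' theorem to the same positive finding under two different local prior probabilities; the posterior changes substantially even though the test characteristics are identical.

```python
# Sketch with invented numbers: how the posterior probability of a disease
# changes when the same diagnostic finding is interpreted under different
# local prior probabilities (prevalences), via Bayes' theorem.
def posterior(prior, sensitivity, specificity):
    """P(disease | positive finding) for a given local prior."""
    p_pos_given_d = sensitivity
    p_pos_given_not_d = 1.0 - specificity
    p_pos = prior * p_pos_given_d + (1.0 - prior) * p_pos_given_not_d
    return prior * p_pos_given_d / p_pos

if __name__ == "__main__":
    # Same finding (sensitivity 0.90, specificity 0.95), two local priors.
    for prior in (0.01, 0.20):
        print(f"prior {prior:.2f} -> posterior {posterior(prior, 0.90, 0.95):.2f}")
    # prior 0.01 -> posterior 0.15; prior 0.20 -> posterior 0.82
```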

In evaluating the usefulness of a system's assistance we ought to know whether the desired goal has been fulfilled after integration of the system into the user's environment. In this connection we refer to the pharmacy module of the HELP system as a good example of a system's assistance in prescribing drugs, and quote the authors: 'the pharmacy module has resulted in an overwhelming positive response from the medical staff who make few medication-prescribing errors. The computer allows them to conduct their practice with the assurance that the problems will not be a major part of their risk in caring for the patient' [24].

2.4. Discussion of the criteria and perspectives

The need to demonstrate the practical use of decision support systems, and to evaluate programs such as MYCIN and INTERNIST-1 in a clinical environment, was called for by Shortliffe and Clancey in 1984 [26]. The absence of any agreed set of criteria had led the assessments of the MYCIN and INTERNIST-1 systems to be based on decision-making accuracy only; the other components had been ignored.

The evaluation of MYCIN, a system in the domain of antibiotic selection, was carried out for the problem area of meningitis. The study compared the results of experts with those of the system and was conducted as a double-blind trial in which a set of outside experts evaluated the opinions of the experts and of the MYCIN system. The evaluators approved 70% of the choices made by MYCIN, and none of the Stanford faculty achieved better approval than MYCIN [33]. No attempt was made, however, to find out what the performance of the physicians would have been if they had had access to MYCIN's replies. It also turned out that a large part of the system's knowledge was in compiled form and not explicitly represented, which makes validation of the knowledge difficult.

A similar evaluation of the INTERNIST-1 system, however, showed that the human experts and the discussants of the clinico-pathologic conferences in the New England Journal of Medicine performed better: of the possible 43 major diagnoses, INTERNIST-1 made 17 definitively, while the clinicians made 23 and the discussants 29 [16]. The evaluation also focused on finding reasons for the system's failures, especially those different from humans'. The reasons for the system's failures turned out to be limitations in its knowledge base, which in turn has eventually led to the recognition of the importance of causal, temporal and anatomic knowledge in medical expert systems.

Evaluation results for decision support systems within larger hospital information systems are available from the report on the HELP hospital information system [24], in which pharmacy alerts considered life-threatening were triggered for 1.8% of the patients in a period of one year. In 94% of the cases physicians complied and altered their prescriptions, thus preventing life-threatening medication and possibly resultant hospitalization for about 50 patients a month, which can be valued as a cost saving. In another study of ten different features of the HELP system, laboratory results, vital signs and pharmacy alerts were ranked first, second and third, respectively, with regard to their contribution to clinical practice. This ranking was based on 246 responses from 360 physicians (a 65% response rate) [23].
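Figures such as the 70% approval rate or the 94% compliance rate above are simple proportions. When reproducing such a blinded comparison it is useful to attach an uncertainty interval to the observed proportion; the sketch below uses a normal-approximation confidence interval and invented counts, purely as an illustration and not as a reconstruction of the cited studies.

```python
# Sketch: agreement proportion and an approximate 95% confidence interval
# (normal approximation) for a blinded evaluation in which outside experts
# approve or reject each recommendation. The counts below are invented.
from math import sqrt

def agreement(approved, total, z=1.96):
    p = approved / total
    half = z * sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)

if __name__ == "__main__":
    # e.g. 70 of 100 recommendations approved (hypothetical counts)
    p, lo, hi = agreement(70, 100)
    print(f"approval {p:.2f}, approx. 95% CI [{lo:.2f}, {hi:.2f}]")
```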

In a recent paper, Shortliffe [27] has written of the need to develop decision support systems as part of larger hospital information systems, rather than as stand-alone expert systems. An assessment based on economic factors has been written by Gottinger [10], including the sharing of costs through dispersion of the technology, but a clear, objective methodology for evaluation remains undefined in that paper.

The experiences described above emphasize the need to link the evaluation of decision support systems with their development and to evaluate the systems also in their user environment. If we evaluate only final systems, we are often confronted with problems which cannot be solved without starting the development again from the beginning, as seen with the MYCIN and INTERNIST-1 evaluations.

The development of a knowledge-based system aims at capturing a model of human knowledge and reasoning. Knowledge acquisition consists of modelling operations in which a model of human expertise is developed. Knowledge acquisition also means the definition and documentation of user requirements and needs for the intended system. Conceptual and elicitation validation are means to divide the model development into consecutive phases and thereby to study the validity at and between these phases.

The life-cycle approach to validation of knowledge-based systems has an operational nature: it relates validity to the phases of the development process. If validity within and between these phases can be studied and documented, we are closer to finding errors as early as possible and also have the means for correcting them. However, because the validity of the system concerns both the validity of the software and of the model, validation of knowledge acquisition should be combined with this life-cycle approach. Knowledge acquisition is a task in the development process, and a critical one in the sense that the quality of the system depends heavily on the quality of the knowledge captured.

Evaluation of the system as an integrated part of the environment is not easily done, because we should somehow be able to measure both the user with the system and the user without the system in the same circumstances. The life-cycle approach emphasizes studying the impacts of the system on the users, their behaviour, their environment, the resulting quality of health care delivery, etc. The impacts can only be studied when the system is in routine use in the user's environment and there is a possibility for a long-term follow-up of the system and its impacts. For validation of the user-system integrated behaviour, an understanding of the framework in which the system is to work is needed. This understanding should be gained in the early development phase through knowledge acquisition, where the user-system behaviour is modelled and analysed.

3. Who performs evaluation?

Evaluation concerns developers, project managers, users, domain experts and third parties. The roles and tasks of these groups in evaluation studies are discussed in the following.

The people involved in the development should take care of basic testing and performance evaluation with selected test cases. The developers should also provide the system with clear documentation containing all relevant information, in order to give the user the possibility to inspect both the knowledge base and the functioning of the system. The developers should also clarify what kind of performance the system offers and what level of skill is needed from the user of that specific system. The usual problems with evaluation by the developers are lack of time and resources and a certain blindness to failures in the system's functioning [21].

Project managers should evaluate the development process from the management and resource consumption perspectives.

The user should always evaluate the system before accepting it into use. It is his or her responsibility to introduce the system into clinical practice, and he or she must therefore form a qualified opinion on whether the particular system complies with his or her needs [19]. Users must state their needs before evaluation:
(1) What do we want to achieve with the system? Important issues are quality of care and cost-effectiveness.
(2) How does a given system comply with our requirements? This question relates to the methodological foundation and the operational characteristics of the system.
(3) What will be the consequences of taking a specific system into use? What are the intended effects and the potential side-effects the system has on the environment? Legal and ethical issues related to the use of the system should also be considered.

The big problem with user evaluation is the lack of guidelines, standards of what to evaluate and methodologies on how to perform evaluation studies; for these we refer the reader to the recent literature [8,14,15,17,19,20,22,25,28,30-32]. A prerequisite for user validation is that the system itself provides features which facilitate evaluation, i.e., all information needed is available to the user. The user organization must be able to provide a set of test cases, a test bed and users that are capable of manipulating the system. Also, expertise is needed to understand the functioning of the system and to assess the results of the evaluation studies.

Domain experts have two main tasks in evaluation and validation: they have a role in judging the quality of the system in its application, and they sanction the use of the system in practice.

Third parties are needed in evaluation and validation in order to ascertain the objectivity of the persons involved. The persons representing the third party should be involved neither in the development process nor in the organization where the system is planned to be used. Buchanan and Shortliffe [4] have reported that even these third parties may be biased in their validation if they know that the results have been generated by a computer. The way to overcome this is to organize blinded studies, like Turing tests. Third parties may also be biased, especially in the medical field, if the results from the system they are evaluating represent a school of thought different from their own. This latter issue is closely related to the difficulty of defining a gold standard in medicine, i.e., a generally accepted correct answer, because there usually exist numerous different, or even competing, schools of thought. Consensus conferences may help to get a gold standard defined and accepted [21].
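The blinding step mentioned above can be organized very simply: recommendations from the system and from human experts are pooled, given neutral codes and shuffled before they are handed to the third-party evaluators, with the decoding key kept aside until scoring is complete. The sketch below is a hypothetical illustration; all names and cases are invented.

```python
# Sketch: blinding the origin of recommendations (system vs. human expert)
# before third-party review, in the spirit of a Turing-test-like evaluation.
# The key is kept separately so the origin can be decoded after scoring.
import random

def blind(cases, seed=0):
    """cases: list of (origin, recommendation). Returns coded items and key."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    coded, key = [], {}
    for i, (origin, rec) in enumerate(shuffled, start=1):
        code = f"case-{i:03d}"
        coded.append((code, rec))
        key[code] = origin
    return coded, key

if __name__ == "__main__":
    cases = [("system", "start ampicillin"),
             ("expert", "start ampicillin"),
             ("system", "order lumbar puncture")]
    coded, key = blind(cases)
    for code, rec in coded:
        print(code, rec)      # evaluators see only the coded items
    # key is revealed only after the evaluators have scored each case
```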

4. Conclusion

Evaluation and validation are necessary tasks to perform before taking any system into routine use. Yet no single, generally accepted methodology exists which covers the particular problems related to decision support systems in medicine. However, the user should not despair at the lack of a single methodology: evaluation and validation can be done using a combination of different methods and techniques, as described earlier. It is difficult to set a gold standard which could be used to evaluate decision support systems in various application areas in such a way that both the general and the particular requirements for a system operating in a real-world situation are considered.

However, consensus conferences may be the way to approach the definition of gold standards. Evaluation of a knowledge-based system should start from the validation of knowledge acquisition. Knowledge acquisition means the development of the domain model and the description of the decision strategies in the domain, i.e., a model of expertise. During knowledge acquisition the user-system behaviour is modelled and the requirements and tasks of the system in the user environment are specified. These specifications and models of the domain and decision strategies form the basis for the development of a system. The evolution of domain knowledge, the cognitive processes of humans and changes in the user environment mean that the system should remain open for continuous updating.

The life-cycle model of validation sees the development as consisting of iterative phases. A life-cycle view of development helps the developers, users and experts to monitor the system and to certify validity at each phase before proceeding to the next.

Most decision support systems are planned to support humans in decision-making tasks; they are auxiliary devices rather than replacements, and they are part of the working environment of the user. The validity of the system is thus concerned with the integrated human-system behaviour. Supporting devices cannot be evaluated in isolation from their environment; the validity of the system in use has to be studied in a situation where the user is working with the system.

The criteria presented in this article for evaluation of decision support systems approach the validity of the system from the perspectives of knowledge acquisition, the development life-cycle and the user environment. It is our opinion that all three perspectives are needed to ascertain the validity of a system. These perspectives help to manage the development of such complex systems by dividing the development into consecutive phases and by studying the validity at and between these phases from different viewpoints. They also cover the two essential characteristics of these systems: that they are both models and pieces of software.

Acknowledgements

The authors thank the international referees whose comments led to substantial improvements in the presentation of this work.

References

[1] W. Adrion, M. Branstad and J. Cherniavsky, Validation, verification and testing of computer software, ACM Comput. Surv. 14(2) (1982) 159-192.
[2] I. Benbasat and J.S. Dhaliwal, A framework for the validation of knowledge acquisition, Knowl. Acquis. 1 (1989) 215-233.
[3] J. Brender and P. McNair, Watch the system. An opinion on user validation of computer-based decision support systems in medicine, in: Proceedings of the Sixth International Conference on Medical Informatics, eds. B. Barber, D. Cao, D. Qin and G. Wagner, pp. 275-279 (Elsevier/North-Holland, Amsterdam - New York, 1989).
[4] B. Buchanan and E. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project (Addison-Wesley, Reading, MA, 1985).
[5] S. Chowdhury, R. Linnarsson, A. Wallgren, B. Wallgren and O. Wigertz, Extracting knowledge from a large primary health care database using a knowledge-based statistical approach, J. Med. Syst., 1990 (in press).
[6] S. Chowdhury, G. Bodemar, P. Haug, A. Babic and O. Wigertz, Methods for knowledge extraction from a clinical database for liver diseases, J. Comput. Biomed. Res., 1990 (submitted).
[7] FDA Draft Policy Guidance for the Regulation of Computer Products: Availability (Docket No. 86D-0380, Department of Health and Human Services, Food and Drug Administration, Rockville, MD, 1987).
[8] J. Gaschnig, P. Klahr, H. Pople, E. Shortliffe and A. Terry, Evaluation of expert systems: issues and case studies, in: Building Expert Systems, eds. F. Hayes-Roth, D.A. Waterman and D.B. Lenat, pp. 241-280 (Addison-Wesley, New York, 1983).
[9] P.W. Gill, D.J. Leaper, P.J. Guillou, J.R. Staniland, J.C. Horrocks and F.T. de Dombal, Observer variation in clinical diagnosis: a computer-aided assessment of its magnitude and importance in 552 patients with abdominal pain, Methods Inform. Med. 12 (1973) 108-113.
[10] H.W. Gottinger, Technology assessment and forecasting of medical expert systems, Methods Inform. Med. 27 (1988) 58-66.
[11] P. Haug, P.D. Clayton, P. Shelton, T. Rich, I. Tocino, P.R. Frederick, R.O. Crapo, W.J. Morrison and H. Warner, Revision of diagnostic logic using a clinical database, Med. Decis. Mak. 9(2) (1989) 84-90.
[12] I. Kononenko, B. Cestnik and I. Bratko, Assistant Professional: System User Manual (Ljubljana, 1988).
[13] N. Lavrac and I. Mozetic, Methods for knowledge acquisition and refinement in second generation expert systems, SIGART Newsl. (Knowledge Acquisition Special Issue) 108 (1989) 63-69.
[14] J. Liebowitz, Useful approach for evaluating expert systems, Expert Syst. 3(2) (1986) 86-96.
[15] H.P. Lundsgaarde, Evaluating medical expert systems, Soc. Sci. Med. 24(10) (1987) 805-819.
[16] R.A. Miller, H.E. Pople and J.D. Myers, INTERNIST-1, an experimental computer-based consultant for general internal medicine, New Engl. J. Med. 307 (1982) 468-476.
[17] P.L. Miller, The evaluation of artificial intelligence systems in medicine, Comput. Methods Programs Biomed. 22 (1986) 5-11.
[18] P. Nykänen, J. Rantanen and K. Saarinen, Knowledge Acquisition from Domain Experts, KAVAS AIM Project, Technical Report 11 (January 1990).
[19] P. Nykänen (ed.), Issues in Evaluation of Computer-based Support to Clinical Decision Making, Report of the SYDPOL-5 Working Group, Research Report 127 (Oslo University, Oslo, 1990).
[20] R. O'Keefe, O. Balci and E. Smith, Validating expert system performance, IEEE Expert 2(4) (1987) 81-89.
[21] D. O'Leary and R. O'Keefe, Verifying and Validating Expert Systems, IJCAI '89, Tutorial MP4 (August 1989).
[22] R.R. O'Moore, K. Clarke, J. Brender et al., Methodology for Evaluation of Knowledge Based Systems, AIM KAVAS Project, Technical Report 32 (1990).
[23] Personal communication, LDS Hospital Physician Questionnaire; results presented to medical staff meeting, September 29, 1989.
[24] T.A. Pryor, R.M. Gardner, D. Clayton and H.R. Warner, The HELP system, J. Med. Syst. 7 (1983) 87-102.
[25] A. Rossi-Mori and F.L. Ricci, Comprehensive criteria for the evaluation and design of knowledge-based systems in medicine, in: Systems Engineering in Medicine, eds. J. Talmon and J. Fox, pp. 1-13 (Springer-Verlag, New York, 1990).
[26] E.H. Shortliffe and W.J. Clancey, Anticipating the second decade, in: Readings in Medical Artificial Intelligence, eds. E.H. Shortliffe and W.J. Clancey, pp. 463-472 (Addison-Wesley, Reading, MA, 1984).
[27] E.H. Shortliffe, Testing reality: the introduction of decision support technologies for physicians, Methods Inform. Med. 28 (1989) 1-5.
[28] M.A. Shwe, S.W. Tu and L.M. Fagan, Validating the knowledge base of a therapy planning system, Methods Inform. Med. 28 (1989) 36-50.
[29] H.G. Sol, Paradoxes around DSS, in: Decision Support Systems: Theory and Application, NATO ASI Series F31, eds. C.W. Holsapple and A.B. Whinston, pp. 3-18 (Springer-Verlag, Berlin, 1987).
[30] E. Soloway and D. Littman, Evaluation of Expert Systems: Examples and Principles, IJCAI '87, Tutorial No. 9 (Milan, 1987).
[31] P. Sorgaard, Evaluating expert system prototypes, paper presented at the 9th Scandinavian Seminar on Use and Development of Information Systems, Båstad, August 19-22, 1986.
[32] J. Wyatt and D. Spiegelhalter, Evaluating medical decision aids: what to test and how, in: Systems Engineering in Medicine, eds. J. Talmon and J. Fox, pp. 1-13 (Springer-Verlag, New York, 1990).
[33] V.L. Yu, L.M. Fagan, S.M. Wraith et al., Antimicrobial selection by a computer: a blinded evaluation by infectious disease experts, J. Am. Med. Assoc. 242 (1979) 1279-1282.
