J Med Syst (2014) 38:79 DOI 10.1007/s10916-014-0079-0

SYSTEMS-LEVEL QUALITY IMPROVEMENT

Developing a Semantic Web Model for Medical Differential Diagnosis Recommendation

Osama Mohammed & Rachid Benlamri

Received: 4 October 2013 / Accepted: 4 June 2014
© Springer Science+Business Media New York 2014

Abstract In this paper we describe a novel model for differential diagnosis designed to make recommendations by utilizing semantic web technologies. The model is a response to a number of requirements, ranging from incorporating essential clinical diagnostic semantics to the integration of data mining for the process of identifying candidate diseases that best explain a set of clinical features. We introduce two major components, which we find essential to the construction of an integral differential diagnosis recommendation model: the evidence-based recommender component and the proximity-based recommender component. Both approaches are driven by disease diagnosis ontologies designed specifically to enable the process of generating diagnostic recommendations. These ontologies are the disease symptom ontology and the patient ontology. The evidence-based diagnosis process develops dynamic rules based on standardized clinical pathways. The proximity-based component employs data mining to provide clinicians with diagnosis predictions and to generate new diagnosis rules from provided training datasets. This article describes the integration between these two components along with the developed diagnosis ontologies to form a novel medical differential diagnosis recommendation model. This article also provides test cases from the implementation of the overall model, which show promising diagnostic recommendation results.

Keywords Semantic Web · Differential Diagnosis · Evidence-based Recommendation · Proximity-based Recommendation · Ontology Driven Data Mining · Clinical Pathways

This article is part of the Topical Collection on Systems-Level Quality Improvement

O. Mohammed (*) · R. Benlamri
Department of Software Engineering, Lakehead University, 955 Oliver Road, Thunder Bay P7B 5E1, ON, Canada
e-mail: [email protected]

Introduction

A health provider's ability to make correct decisions regarding patient care is predicated on the correct identification of a patient's disease. However, the process of developing diagnostic certainty remains a challenging task despite an increasingly sophisticated array of available diagnostic modalities and techniques. Differential diagnosis (DDx) is increasingly considered a systematic clinical method for diagnosing a specific disease in a patient. It is an iterative process that often involves first making a list of possible diagnoses, then attempting to eliminate diagnoses from the list by conducting more tests and analysis until one diagnosis remains [1]. This process can be complicated by the fact that clinical features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. DDx is highly dependent on the experience of the physician and the availability of laboratory tests [2]. Also, certain diseases tend to be over-diagnosed or under-diagnosed. Thus, diagnosing diseases is a multi-step process that is complex, challenging and imprecise, as experience, symptoms and signs vary widely. Therefore, clinicians need support to integrate a broad range of findings along with a patient's symptoms and signs [3].

This paper introduces two inter-related approaches to developing tools for DDx recommendation. The first approach uses semantic web technologies based on transforming flexible clinical pathways into rules for medical diagnosis. It systematically guides clinicians according to a disease-symptom ontology (DSO). It also identifies signs and symptoms and prompts clinicians, according to clinical pathway rules, to provide lab test results in order to reach a diagnostic decision. This approach is called evidence-based recommendation. The second approach utilizes semantic web technologies for navigating, with the aid of a patient ontology (PO), clinical


documents to extract important clinical and diagnostic variables. To find data for these variables, it uses diagnostic data produced by the first approach, as well as data from sound diagnostic datasets. Data mining techniques are then applied to the data in order to reach a diagnostic decision. The second approach is called proximity-based recommendation.

Background and related work

Traditional medical decision support systems are implemented using static rules to model knowledge that can help in the diagnosis process [4]. Mostly, they utilize very restrictive types of diagnostic technologies such as direct deductive systems [5], decision trees [6], or neural networks [7]. Moreover, most of these systems are stand-alone, disease-specific, non-adaptive and isolated from clinical applications [4]. These shortcomings have been addressed by introducing Clinical Pathways (CP), which promote integrating decision support systems in the clinical workflow [8]. CPs are standardized protocols comprising detailed medical plans that correspond to specific actions for the diagnosis, treatment and follow-up of the patient. These protocols also encapsulate domain-specific knowledge in order to tackle any exceptional events (adverse events) that might occur during their execution and entail quick response and appropriate modifications to the proposed treatment scheme [8]. One approach to coping with the dynamic nature of CPs is to employ an adaptive rule-based engine that can handle the implantation of evolving clinical pathway knowledge into its rule base. By implantation we mean directly transferring CP knowledge into rules compatible with a particular adaptive rule-based system. Another approach to integrating CPs into medical decision support systems is to guide their dynamic variation through ontologies.
Instead of directly translating CPs into rules for a specific rule engine, ontologies can be constructed to represent CP knowledge in a standard and relational way [9]. Then, the resulting clinical pathway ontology (CPO) is translated into rules that are compatible with a specific adaptive rule-based system. It should be noted that it is easy to convert ontology-based rules to the knowledge representation specific to any rule engine. However, it is much more difficult to convert rules from one format into another. Finally, a third approach was proposed that combines a CPO with SWRL¹ (Semantic Web Rule Language) rules to handle exceptional scenarios in CP execution [10, 11]. However, the idea of creating CPOs is a new research area with many challenges. Engineering ontologies for CPs is a great challenge [12, 13] and the most notable ontology known to date is the KEGG Pathways² database representing molecular dataset pathway maps, used

1 http://www.w3.org/Submission/SWRL/
2 http://www.genome.jp/kegg/pathway.html#global


for biological interpretation of higher-level systemic functions that describe genetic links for sets of diseases. There are no other clinical pathway ontologies in existence, but many researchers proposed ontological approaches for clinical pathways representation for use with medical recommendation systems [14–21]. In the absence of CPOs, the first approach represents the only viable approach for implementing CPs. This paper investigates a semantic web based approach to integrating CPs into medical decision support systems. However, operationalization of CPs brings to the forefront several challenges [22]: (i) abstracting practice-oriented knowledge from paper-based CPs; (ii) representing CP knowledge, including disease, symptom and drug in a semantically-rich formalism; (iii) managing clinical evidence to maintain CP integrity; (iv) integrating CPs with patient-specific data and clinical applications; and (v) tolerance to noise in patient data. Much research work has been done to deal with the above-mentioned challenges and both knowledge and non-knowledge based approaches have been considered. Challenges (i), (ii), (iii) and (iv) were mainly addressed using knowledge based approaches, adopting semantic web technologies. Ontological knowledge representations were used to develop a set of interoperable knowledge modules, which together provide coverage of the human disease diagnosis and treatment domain. Non-knowledge based approaches however, were used to address challenge (v), adopting mainly machine learning techniques [23]. Ontological representation of medical knowledge requires the use of standardized vocabularies to ensure both shared understanding between people and interoperability between systems [24, 25]. 
Internationally, there are numerous existing biomedical vocabularies such as SNOMED CT,³ LOINC,⁴ ICD-9-CM⁵ and UMLS.⁶ Unfortunately, many of these existing biomedical vocabulary standards rest on incomplete, inconsistent, or confused accounts of basic terms pertaining to diseases, diagnoses, and clinical phenotypes [25]. There have been several attempts to harmonize such terminologies, but these efforts are still in their infancy [26]. For this reason, there is a need for an effective way to store and retrieve knowledge related to human diseases using standard comprehensive medical ontologies, instead of just relying on biomedical vocabularies. The most prominent disease ontology developed to date is the Human Disease Ontology (DOID).⁷ Started in 2003 as part of the NUgene project at Northwestern University, it contains to date over 8600 known human diseases and 14600 terms. DOID is currently a

3 http://www.openclinical.org/medTermSnomedCT.html
4 http://loinc.org/
5 http://icd9cm.chrisendres.com/
6 http://www.nlm.nih.gov/research/umls/
7 http://do-wiki.nubic.northwestern.edu/index.php/Main_Page


standard ontology adopted by the OBO Foundry.⁸ At the same time, the Institute for Genome Sciences (IGS) at the University of Maryland has developed the Symptom Ontology (SYMP),⁹ which hierarchically describes more than 900 symptoms. SYMP's hierarchy categorizes symptoms under certain headings, for example categorizing all types of pain (arm, leg, headache, back pain, chest pain, etc.) under physical pain. However, disease diagnosis requires both ontologies to be integrated, annotated and aligned so that every disease can identify relevant symptoms and vice versa. Medical decision support systems do not rely on well-structured medical knowledge alone, but also require a large number of discriminating diagnosis factors, including the risk factors, similarities, anomalies and other relationships between individual symptoms, signs and lab tests. However, real-world medical decisions are often based on partial information due to the challenges posed by the many dimensions of distributed medical information. The manual construction and maintenance of such evolving diagnosis knowledge is both costly and difficult [27]. Here, knowledge discovery through data mining over networks and graphs of symptoms, signs and lab tests is much more effective. Data mining knowledge discovery involves approximate predictions with variable degrees of certainty depending on the availability and quality of training data. In this paper, we propose a hybrid system that combines semantic web technologies and machine learning to leverage the relative benefits of both knowledge-based and non-knowledge-based approaches. The system uses ontological knowledge representations, logic-based inference, and probabilistic reasoning. Clinical information is represented using concepts defined in a disease-symptom ontology.
Clinical pathways' protocols and disease-diagnosis expertise are encoded into inference rules and applied to the knowledge using a semantic reasoner for decision making. Moreover, the semantic engine is augmented with machine learning based inference models that are resilient to noise in the patient data, thus making the system capable of handling a wide range of medical queries whether the patient data is complete or incomplete.

Developed methodology and architectural design

Currently, there is no ontology that establishes relations between disease class hierarchies and symptom class hierarchies. Moreover, there is no ontology to help analyze the various components of a variety of medical documents and notes (e.g. Chief Complaint Form, Triage Nurse Note, Medical Note and Electronic Health Record). The existence of such ontologies would be very useful for any diagnosis medical

8 http://www.obofoundry.org/
9 http://symptomontologywiki.igs.umaryland.edu/wiki/index.php/Main_Page


recommendation system. In our model we present two main knowledge representation schemes to develop such essential ontologies. The first ontology is the Diseases Symptoms Ontology (DSO) [28] and the second is the Patient Ontology (PO) [29]. The DSO and PO ontologies are vital sources of knowledge for the developed DDx recommender model. They can also be used with other medical decision support systems to select relevant patient attributes, symptoms, test data, and certain possible diseases under consideration for diagnosis.

Ontological design

DSO is an upper ontology produced through the alignment of the SYMP and DOID ontologies. Ontology alignment is the idea of combining two (or more) ontologies into one and defining relationships between their concepts, forming a new ontology in the process. Alignment between ontologies is a critical challenge for semantic interoperability [30] as well as for producing hybrid (upper) ontologies. As the medical domain is represented by multiple ontologies, there is a need for creating mappings among these ontological elements in order to facilitate the integration of data and reasoning across these ontologies [31]. There are two main approaches to alignment: ontology matching and ontology linking. Ontology matching techniques are used for relating ontologies on the same domain or on partially overlapping domains [32], while ontology linking allows elements from distinct ontologies to be coupled with links [33]. A strict requirement of ontology linking is that the domains of the ontologies being linked are disjoint. For this reason, ontology linking is appropriate for aligning disease and symptom ontologies, as diseases and symptoms are separate concepts. In the case of disease diagnosis, linking the SYMP and DOID ontologies is necessary in order to link symptoms to diseases.
Establishing relations between symptom classes from SYMP and disease classes from DOID is done by defining a new has_symptom object property. The developed DSO ontology is based on the OWL (Web Ontology Language) standard and its details are made available in [34]. Access to the DSO ontology is provided by an ontology crawler, specifically designed to retrieve from the ontology the answers to clinicians' DDx queries that are required for determining the most likely diagnosis from a list of given diseases (Fig. 1). All the DDx questions require navigating the DSO ontology in order to identify relations between diseases and symptoms in a variety of ways. Examples of such questions are:

• What are the symptoms of a given disease?
• If a patient displays a number of symptoms, then what are the possible diseases he/she may have?
• If a patient displays a number of symptoms, and a certain disease is suspected, then what symptoms of the disease are not displayed by the patient (the so-called missing symptoms)?
• What diseases are related to each symptom displayed by the patient?

Fig. 1 Illustration of the components of the DSO ontology Crawler
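The four crawler questions listed above can be sketched over a toy in-memory stand-in for the DSO. This is only an illustration of the kind of navigation involved, not the crawler described in the paper: the dictionary `HAS_SYMPTOM`, its three diseases and their symptoms, and all function names are assumptions made for the example, whereas the real DSO is an OWL ontology queried through the has_symptom object property.

```python
# Toy stand-in for the DSO: each disease maps to the symptoms linked to it
# through the has_symptom object property. The entries are illustrative only.
HAS_SYMPTOM = {
    "diabetes mellitus type 2": {"polyuria", "polydipsia", "fatigue"},
    "anemia": {"fatigue", "pallor"},
    "hypertension": {"headache"},
}


def symptoms_of(disease):
    """Question 1: what are the symptoms of a given disease?"""
    return set(HAS_SYMPTOM.get(disease, set()))


def candidate_diseases(observed):
    """Question 2: which diseases share at least one symptom with the patient?

    Diseases are returned most-overlapping first; ties are broken by name.
    """
    overlap = {d: len(s & observed) for d, s in HAS_SYMPTOM.items()}
    hits = [d for d, n in overlap.items() if n > 0]
    return sorted(hits, key=lambda d: (-overlap[d], d))


def missing_symptoms(disease, observed):
    """Question 3: which symptoms of a suspected disease are not displayed?"""
    return symptoms_of(disease) - observed


def diseases_per_symptom(observed):
    """Question 4: which diseases are related to each displayed symptom?"""
    return {
        symptom: {d for d, s in HAS_SYMPTOM.items() if symptom in s}
        for symptom in observed
    }


observed = {"fatigue", "polyuria"}
print(candidate_diseases(observed))
print(missing_symptoms("diabetes mellitus type 2", observed))
```

Ranking candidates by the number of shared symptoms is a choice made for this sketch; the paper does not state how the crawler orders its answers.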

In addition to the DSO ontology, there is a need for another ontology to enable processes, such as data mining, to navigate a variety of medical documents for the purpose of finding certain patient attributes. For this purpose, we developed a Patient Ontology (PO) that can help the DDx recommender extract relevant datasets from any available silo of patient records. The PO is developed using OWL and consists of 241 major classes. Details of the PO ontology are made available in [35].

Evidence-based DDx Recommender

The specification of rule-based knowledge is a flexible way of designing scalable systems. Rules are an intuitive form of knowledge representation for building intelligent systems. In this section, we introduce a flexible rule-based DDx recommendation approach that can cope with the dynamism of clinical disease diagnosis knowledge. We call this approach evidence-based DDx recommendation. Clinical pathways are the main knowledge source from which evidence-based DDx recommendations are derived. Unfortunately, in the absence of CPOs, the best approach is to build up adaptive higher order knowledge by employing an incremental rule base for disease CPs, starting from common diseases like diabetes [36], hypertension [37], anemia [38], and calcemia. We call such incremental rules clinical pathway confirmation rules.

Since most of the available CPs are in the form of care maps,¹⁰ a graphical form suitable for clinicians' use, the process of transferring CPs into CP confirmation rules must be made fairly simple to enable clinicians as well as system engineers to create and maintain the new form of CPs expressed as rules. To show the simplicity of this process, we will use the diabetes CP [34] as an example. From [34], Fig. 2 is redrawn to illustrate part of the diabetes CP.

10 http://en.wikipedia.org/wiki/Care_map

Fig. 2 Part of a Type 2 Diabetes Clinical Pathway [36]

Transferring the CP in Fig. 2 into CP rules requires the creation of a few interrelated rules. We will show one of the rules that implements one of the pathways in Fig. 2, specifically the pathway highlighted in red. The diagnosis of diabetes begins with performing an FPG (fasting plasma glucose) test, which measures glucose (sugar) levels in a patient's blood stream. A rule usually has two parts: a condition and an action. The first condition of the pathway is that an FPG has to be performed. To define this condition in our Drools rule format (.drl), we define the FPG test as a test object with a test name equal to "Fasting plasma glucose" as follows:

where t is a Test object (an instance of class Test) with the name "Fasting plasma glucose". A second definition to describe the condition "A FPG test must be performed to diagnose diabetes" must indicate that the FPG test is for diagnosing the disease diabetes. So we define the disease diabetes as a disease object with the disease name equal to "diabetes mellitus type 2" as follows:

where d is a disease object (an instance of class Disease) with the disease name "diabetes mellitus type 2". Now that we have described the starting point (see the red diamond in Fig. 2) of our example path within the diabetes pathway, let's take another step down the pathway by considering the possibility that, after performing an FPG test on a patient, the FPG test indicates glucose levels of 7 mmol/L (126 mg/dL) or higher (1 mmol/L = 18.0182 mg/dL). In order to define this possibility in our rule, we need to add another description to our rule. In the description, we need to indicate that this rule describes the possibility that the result of the FPG test is 126 mg/dL or higher. So we define a test result object as follows:

where tr is a TestResult object and (indicated by && in the rule) the unit for the test result is mg/dL (unit == "mg/dL") and the result is 126 or higher (amount >= 126). So far, we have three rule descriptions in the condition section of the rule. The next step in our example pathway is to ask whether symptoms of diabetes are present in the patient. The DSO crawler checks for symptoms, so we do not include this in our rule. Let's assume that the DSO crawler confirmed the existence of the symptoms, which means that we jump to the last step of the example pathway, concluding a positive diagnosis of type 2 diabetes. This conclusion needs to be described in the conclusion section of our rule. The conclusion of a positive diagnosis of type 2 diabetes is directly related to the test result of the FPG test. Therefore, in the conclusion section of our rule, we add three rule descriptions to say that if the conditions of the rule are met, then the result is not normal and it indicates a diagnosis of diabetes. The following denote these descriptions:

where tr is the same test result object defined in the conditions section of our rule. The process of rule creation continues, where a rule must be created to describe each distinct pathway in the set of CPs for diabetes. The conditions and conclusions in our rule format are described using objects. Simple object oriented classes are written for each object the rules need. In our example rule,


simple classes are written for the objects needed by our rule: Test, Disease, and TestResult. We have been able to easily reuse these classes in writing other rules for the remaining diabetes pathways, as well as for the CPs for the other diseases we implemented. The ability to reuse these classes across rules greatly simplifies the rule writing process. In each class, the attributes needed by the rules are defined. For example, our example rule needed a name attribute to identify a test by its name. Likewise, the methods needed by the rules' conclusion sections are defined, for example methods that indicate whether a certain test result yields a positive or negative diagnosis of a certain disease.
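The condition and conclusion sections described above can be mirrored in a short Python sketch. It is not the authors' Drools rule and does not use Drools syntax. The class `LabTest` stands in for the paper's Test class, the fields `normal` and `positive_for` and the helper `to_mg_dl` are hypothetical names introduced for illustration, and the symptom check is left out because the text assigns it to the DSO crawler.

```python
from dataclasses import dataclass
from typing import Optional

MG_DL_PER_MMOL_L = 18.0182  # conversion factor quoted in the text


@dataclass
class LabTest:  # stands in for the paper's Test class
    name: str


@dataclass
class Disease:
    name: str


@dataclass
class TestResult:
    test: LabTest
    amount: float
    unit: str
    normal: Optional[bool] = None            # hypothetical field
    positive_for: Optional[Disease] = None   # hypothetical field


def to_mg_dl(amount, unit):
    """Convert a glucose reading to mg/dL so it can be compared with 126."""
    if unit == "mg/dL":
        return amount
    if unit == "mmol/L":
        return amount * MG_DL_PER_MMOL_L
    raise ValueError("unsupported unit: " + unit)


def fpg_confirmation_rule(t, d, tr):
    """Apply the example pathway rule; return True if the rule fired."""
    # Condition section: the three descriptions given in the text.
    condition = (
        t.name == "Fasting plasma glucose"
        and d.name == "diabetes mellitus type 2"
        and tr.test is t
        and tr.unit == "mg/dL"
        and tr.amount >= 126
    )
    # Conclusion section: mark the result abnormal and positive for d.
    if condition:
        tr.normal = False
        tr.positive_for = d
    return condition


fpg = LabTest("Fasting plasma glucose")
t2d = Disease("diabetes mellitus type 2")
result = TestResult(test=fpg, amount=to_mg_dl(7.5, "mmol/L"), unit="mg/dL")
print(fpg_confirmation_rule(fpg, t2d, result), result.normal)
```

Keeping the condition and the conclusion in separate steps follows the two-part rule structure described in the text; a rule engine would additionally decide when to evaluate the rule, which this sketch leaves to the caller.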

Listing 4.1: Evidence-based Rule for Diabetes

In the proposed model, the confirmation rules take their data from a relational patient database that stores and manages the dynamically changing clinical tests and data. The DSO ontology crawler assists in selecting the right group of confirmation rules. It does so by selecting the clinical pathway rules for diseases that show certain observed patient symptoms. Thus the confirmation rules are agile in two senses: rule selection is directed by the DSO ontology, and the rules use data from a frequently changing clinical database. The DSO and PO ontologies contribute significantly to context-aware reasoning. As new clinical data arrives, the two ontologies select only data relevant to the diagnosis of a specific case. They then feed the selected data to the patient database, which in turn passes the data to the rule engine. Upon receiving the selected data, the rule engine compares it to the CP confirmation rules. This process considerably enhances the reasoning and data selection mechanisms. It optimizes the rule firing process by restricting the data input to data relevant to a specific diagnostic case.

The intelligence of our clinical pathways and ontology driven evidence-based recommender is complemented by interactivity with clinicians, which is essential for any medical recommendation system. During the start of the differential diagnosis process, the ontology driven recommender guides the clinician to prepare a list of possible diagnoses. As pointed out in section 3.1, the

DSO crawler provides clinicians with relational queries that guide clinicians in preparing a list of possible diagnoses, and in choosing the right CPs. When the ontology driven recommender concludes a list of possible diagnoses, the clinician should be able to order the system to rule out one or some of the possible diseases, or include diseases not given by the recommender, or even select one of the diseases as the diagnosis based on knowledge not available to the system. This iterative interaction, in the process of differential diagnosis, between the clinician and the recommender greatly improves the chances of accurate diagnosis. As the system guides the clinician along the CPs of a disease, it asks the clinician for information such as test results and based on that information decides to move closer towards a diagnosis by taking one of the pathways. The element of interactivity here is that the clinician is given the power to use his/her cognition to approve or disapprove each rule-based decision, taken by the recommender, to select one pathway or another in the CPs. If the clinician approves a decision, the recommender moves forward along the pathway. If the clinician disapproves a decision, the physician selects an alternative path/decision, and the recommender moves forward with the diagnosis process based on this decision. The clinician should be able to update, override, or define exceptions to the rules used by the recommender based on medical experience or interacting with the recommender. A key interaction is that each diagnosis made by the evidence-based recommender must be confirmed by the clinician. Actually, our evidence-based


Fig. 3 Illustration of the Evidence-based DDx Recommendation Model

component of the DDx recommender will not be able to update the data files of the proximity-based recommender (described below) before the authentication and confirmation of the clinician. The defining criterion for interactivity is to engage the physician to make use of his/her cognitive knowledge in the diagnosis process and, better yet, to learn from his/her heuristic knowledge, to learn from clinical data, and to verify medical diagnosis decisions more accurately. This interactive, iterative process of differential diagnosis allows the recommender to become dynamic, context-sensitive, and clinician-centric, as well as to yield adaptive and evolving CPs. This value-adding process sets the DDx recommender apart from traditional static expert systems or clinical support systems. Adding the interactivity feature to the CPs is very important, as many disease pathways are increasingly complex, and clinicians are overwhelmed with information and have no time to spend researching them. Without the element of interactivity, clinicians may send a patient down the wrong path, which can result in less than optimum clinical outcomes. In the proposed model, we call a rule engine that is based on dynamic, updateable clinical pathway rules and supported by diagnosis ontologies for context-aware reasoning an evidence-based system. The word "evidence" refers to the clinical pathways. Figure 3 shows how the evidence-based system works. The dynamic nature of the type of rule-based system required to represent clinical pathways needs a rule engine that can support dynamism. For this purpose, our model uses the Drools¹¹ rule engine where rules can be represented using a

11 http://www.jboss.org/drools

declarative form, and where changes in these rules and their firing sequences are maintained by a forward chaining mechanism. The mechanism relies on dynamically changing clinical findings and is guided by the DSO and PO ontologies. It fires after new clinical data is selected by the two ontologies and written to the patient database. Figure 4 illustrates the CP rules firing mechanism for the DDx Evidence-based recommender.

Proximity-based DDx Recommender

Using data mining, one can build dedicated predictive models for a specified diagnosis process. The use of classical data mining techniques requires proper understanding of the data in advance [38], as well as of the characteristics of the problem domain. Indeed, the data mining process will not be effective without the availability of a training dataset, as well as clear knowledge of the data hierarchies. In this direction, there are a few recent research attempts that use ontologies to clearly define the data hierarchies in order to direct the prediction of the data mining techniques, as well as to filter training datasets from large-scale datasets [39]. In the proposed system, we used the DSO and PO ontologies to assist clinicians in the selection of several clinical attributes (e.g. patient attributes, lab results, and suspected diseases) that are relevant to a certain diagnostic case. We call this model the Ontology-Driven Proximity-based DDx Recommender. Clinicians need to extract, from patient data records, the relevant symptoms and test data for a certain diagnostic case. Using the proximity-based DDx recommendation approach, the PO and DSO ontologies will select relevant patient attributes, lab test attributes, and certain possible diseases under consideration for diagnosis. This process resembles what we used in the evidence-based approach where only clinical data


Fig. 4 The Dynamics of Rule Firing of the Evidence-based DDx Recommender

relevant to a certain diagnostic case are selected by the DSO and PO ontologies and written to the patient database. However, the evidence-based approach uses the selected data as part of a rule firing mechanism to reach diagnostic conclusions. The proximity-based DDx recommendation approach, on the other hand, uses the selected data as training data, which is fed to data mining algorithms that produce diagnostic recommendations based on trends in the training data. Training data for the selected clinical variables will be used by data mining classification algorithms to learn diagnosis trends. After the training process, these classification algorithms use the learned diagnosis trends to predict the diagnosis for provided test data (i.e. new clinical data cases that require diagnosis). Note that learned diagnosis trends are considered valid predictors of diagnosis for test data only when the clinical attributes of the test data largely match the clinical attributes of the training data. The association algorithms use the training data to learn rules relating diagnosis variables (clinical attributes) to other diagnosis variables, as well as rules relating diagnosis variables to possible diagnoses. These newly generated rules can be used to make diagnosis predictions/recommendations. They can also be added to, or can override or update, clinical pathway rules from the evidence-based approach. Using the proximity-based DDx recommendation model, clinicians can ask the following queries in order to conduct both association and classification data mining processes.

1) Find whether a patient has a certain disease or not, based on the patient's demographic information and lab test results. (Classification using Test Data)
2) Find associations/rules describing the relation between a patient's demographic information and lab test results on one hand, and a certain possible diagnosis on the other hand. (Association Rules)

For the purpose of implementing the ontology-driven proximity model component, we selected the Weka¹² API to provide a variety of useful data mining services including classification and association algorithms. The training data used by the Weka APIs can be either the data generated from the confirmed cases produced by the Evidence-based DDx component, or a hybrid of this data along with a large standardized patient dataset. In our case, we used a hybrid dataset that includes the evidence-based local data as well as the HCUP NIS,¹³ a dataset of hospital inpatient stays containing data from about a thousand US hospitals. There is one NIS dataset for each year from 1988 to 2009, each containing from five to eight million patient stays with 126 clinical and non-clinical data elements for each visit. Non-clinical elements include patient demographics, hospital identification, admission date, zip code, calendar year, total charges and length of stay. Clinical elements include procedures, procedure categories, diagnosis codes and diagnosis categories. Every record contains a vector of 15 diagnosis codes (1 primary diagnosis and 14 secondary diagnoses). The diagnosis codes are represented using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM).¹⁴ Several pre-processing steps such as cleaning and filtering operations need to be carried out to extract relevant HCUP NIS attributes into the required ARFF file format so that it can be processed by the Weka API. Figure 5 illustrates our vision of proximity (data mining) ontology driven DDx recommendation.
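The cleaning and ARFF export step mentioned above might look roughly like the following sketch. It is not the authors' pre-processing code and it does not read real HCUP NIS files: the attribute names `age`, `sex` and `diabetes`, the relation name, and the sample rows are placeholders invented for the example. Only the ARFF layout itself (an `@relation` line, `@attribute` declarations, an `@data` section, and `?` for missing values) follows the Weka file format.

```python
def drop_unlabelled(rows, label):
    """Cleaning step: discard records whose class label is missing."""
    return [row for row in rows if row.get(label) is not None]


def to_arff(relation, attributes, rows):
    """Serialise rows (dicts) to ARFF text.

    attributes is an ordered list of (name, kind) pairs, where kind is either
    the string "numeric" or a list of the allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        spec = "numeric" if kind == "numeric" else "{" + ",".join(kind) + "}"
        lines.append("@attribute " + name + " " + spec)
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for name, kind in attributes:
            value = row.get(name)
            if value is None:
                cells.append("?")  # ARFF marker for a missing value
            elif kind == "numeric":
                cells.append(str(value))
            elif value in kind:
                cells.append(value)
            else:
                raise ValueError("unexpected value for " + name + ": " + str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Placeholder attributes and records, not actual NIS data elements.
attributes = [("age", "numeric"), ("sex", ["F", "M"]), ("diabetes", ["yes", "no"])]
records = [
    {"age": 54, "sex": "F", "diabetes": "yes"},
    {"age": None, "sex": "M", "diabetes": "no"},
    {"age": 61, "sex": "M", "diabetes": None},  # dropped: no class label
]
print(to_arff("nis_subset", attributes, drop_unlabelled(records, "diabetes")))
```

Dropping unlabelled records before export is one possible cleaning policy; which filters the authors actually applied is not stated in the text.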

¹² http://www.cs.waikato.ac.nz/ml/weka/
¹³ http://www.hcup-us.ahrq.gov/db/nation/nis/NIS_Introduction_2006.jsp
¹⁴ http://www.cdc.gov/nchs/icd/icd9cm.htm

Fig. 5 Ontology Driven Proximity-based DDx Recommendation Model

Linking proximity-based and evidence-based DDx components

The evidence-based and proximity-based approaches for DDx recommendation can cooperate to form an overall DDx recommendation model. Results generated by the evidence-based DDx recommender can be used for prediction by the data mining algorithms of the proximity-based DDx recommender. On the other hand, data mining algorithms from the proximity-based approach generate rules that can be used to update the rule base of the evidence-based approach. Figure 6 illustrates the overall DDx recommendation model. Figure 7 illustrates the semantic web technologies involved in the overall DDx recommendation model (e.g. Ontologies, DSO Crawler, Diagnosis recommendation rules for both clinical pathways and data mining approaches). It shows how these technologies cooperate in order to provide meaningful DDx recommendation.

The overall DDx recommendation model allows clinicians to use both the evidence-based and proximity-based DDx recommenders for assistance in diagnosing a specific patient's case. In rare cases, the two recommenders may provide different diagnosis results for a specific patient's case. This is due to the fundamental difference between the two recommenders. As explained earlier, the evidence-based DDx recommender reaches a diagnosis decision based on applying CP rules to the data of the patient case under investigation. On the other hand, the proximity-based DDx recommender relies on data mining algorithms to extract rules from data of previous cases. These proximity rules are then applied to the data of the patient case under investigation in order to yield a diagnosis. In some cases, diagnosis results of the two recommenders may be different, especially if the datasets used by the data mining algorithms of the proximity-based DDx recommender are not large and representative enough. If the datasets are representative, we expect the diagnosis results of the two recommenders to correlate.

Fig. 6 The Integral Model of Evidence-based & Proximity-based DDx Recommendations

Fig. 7 Semantic Web Technologies of the Overall DDx Recommender Model

In case of differences in the predictions of the two recommenders, these differences are logged into the system for statistical analysis, and it is up to the clinician to examine both results and to make his/her own decision about the patient's case. The clinician may choose one of the recommenders' results, in which case the associated entry in the log file is corrected accordingly, or may decide on a third result. The result indicated by the clinician is saved to the patient database. Both recommenders benefit from this process. A new confirmed diagnosis is added to the proximity-based DDx recommender's training dataset, which will help improve its prediction accuracy. The new confirmed case would also update the set of proximity (association) rules of the specific disease that was under investigation.

To resolve potential differences between the two recommendation models, the log file is periodically checked for repeated disagreements. For such disagreements, if any, the updated proximity rules are compared to the clinical pathway rules. If there are any differences between the two rule sets, the clinician is allowed either to opt for one rule set (i.e. the proximity-based rules or the clinical pathway rules) or to write a new rule set to settle the difference. The overall DDx recommender then replaces the rules in both rule sets with the new rules suggested by the clinician. This is the spirit of the proposed DDx recommendation model: to give the clinician the power to make the final diagnosis decisions.
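The logging and periodic review described above can be sketched as follows. This is an illustrative data structure only: the class name `DisagreementLog`, its field names and the `min_count` threshold for what counts as a "repeated" disagreement are assumptions, since the source does not specify how the log file is organised.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DisagreementLog:
    """In-memory stand-in for the disagreement log file (hypothetical)."""

    entries: list = field(default_factory=list)

    def record(self, disease, evidence_result, proximity_result, clinician_result):
        """Log a case only when the two recommenders disagree."""
        if evidence_result != proximity_result:
            self.entries.append(
                {
                    "disease": disease,
                    "evidence": evidence_result,
                    "proximity": proximity_result,
                    # The clinician's decision is what gets confirmed.
                    "confirmed": clinician_result,
                }
            )

    def repeated(self, min_count=2):
        """Diseases with at least min_count logged disagreements."""
        counts = Counter(entry["disease"] for entry in self.entries)
        return sorted(d for d, n in counts.items() if n >= min_count)


log = DisagreementLog()
log.record("diabetes", "normal", "diabetic", "normal")
log.record("diabetes", "normal", "diabetic", "diabetic")
log.record("hypertension", "positive", "positive", "positive")
flagged = log.repeated()
```

Diseases returned by `repeated()` are the ones whose proximity rules would be compared with the clinical pathway rules and presented to the clinician for a final rule set.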

Experimental results

In this section, we first introduce a scenario for each of the evidence-based and proximity-based DDx recommenders, showing how both recommenders can be used to predict the diagnosis of real patient cases. Next, we conduct a quantitative analysis using real cases from the NIS database in order to compare the prediction accuracy of the proximity-based recommender with that of the evidence-based recommender. We then show the credibility of the proximity-based recommender by measuring its prediction accuracy against actual diagnoses made by health care professionals for real cases extracted from the NIS database. The evaluation study aims to show the usefulness of the proximity-based DDx recommender in assisting healthcare professionals to make more accurate diagnoses.

Test case for evidence-based DDx recommender

Let us consider a scenario for the use of the evidence-based DDx recommender where a clinician observes that a patient is experiencing fatigue and headache. The evidence-based DDx recommender takes the clinician's note into consideration by recording the two symptoms, which are then used by the DSO ontology crawler. The crawler matches the symptoms to the possible diseases that display both symptoms. Emulating the process of differential diagnosis, the recommender extracts the list of possible diseases through systematic elimination. For example, hypertension is one of the possible diseases where patients experience fatigue and headache among other symptoms. The recommender, using the PO ontology, will select indicators that are relevant to the diagnosis of hypertension. One important indicator is the pair of systolic and diastolic blood pressures. The recommender would then look into the clinician's notes to find the blood pressure of the patient. Once the blood pressure measurements are obtained, these are recorded to the patient database and then fed to the rule engine. The latter fires CP rules for hypertension to determine whether or not hypertension should be eliminated from the list of possible diseases. Figures 8, 9, and 10 illustrate this scenario in the evidence-based DDx recommender. Figure 8 shows a sample list of possible diseases for this scenario, including hypertension. Figure 10 shows the results of the rule engine firing clinical pathway rules for hypertension using the blood pressure data in Fig. 9. In Fig. 10, the evidence-based DDx recommender determines that hypertension can be eliminated from the list of possible diseases for this scenario.

Test case for proximity-based DDx recommender

Using the scenario in section 4.1, the proximity-based DDx recommender follows the same steps as the evidence-based DDx recommender when it comes to being directed by the DSO and PO ontologies to select the data relevant to the diagnostic case under consideration. It also uses the same process as the evidence-based recommender to obtain the systolic and diastolic blood pressure, which is then recorded into the patient database. However, it proceeds differently, extracting the same type of blood pressure data from other patient records and arranging it into a tabular format where each row represents a patient record and each column represents a clinical variable relevant to the diagnostic case in question. The constructed table is then fed to a classification data mining algorithm in order to produce a diagnostic prediction about whether or not the patient has hypertension. One example classification for this case is shown in Figs. 11 and 12. Figure 11 shows classification statistics of the J48 classification algorithm for the table of blood pressure results, while Fig. 12 shows the classification tree for the blood pressure data. Table 1 shows the accuracy of the classification when the J48 classification tree, shown in Fig. 12, is used to predict the diagnosis for a number of blood pressure measurements from different individuals. The tree classifies the measurements for each individual either as normal or as indicative of hypertension. The classification accuracy indicates the percentage of times the classification algorithm makes the correct classification of blood pressure measurements. The correct expected classifications of the blood pressure measurements are compared to the classifications made by the classification algorithm in order to calculate its accuracy. Table 1 also shows the classification accuracy when using other classification algorithms for the same blood pressure measurements.


Fig. 8 Hypertension in the List of Possible Diseases in a Scenario of the Evidence-based DDx Recommender

The blood pressure table would also be fed into the Apriori association data mining algorithm in order to produce rules relating blood pressure to the diagnosis of hypertension. One example rule is:

The above rule means that if the systolic blood pressure is in the specified range, then the diagnosis is positive for hypertension. The rule's confidence factor is 1 (i.e. 100%).
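The confidence factor of an association rule can be illustrated with a short calculation. The sketch below does not implement Apriori itself; it only computes the support and confidence of one given rule, where confidence is the fraction of records matching the rule's condition that also match its conclusion. The records and the systolic range of 150 to 170 mmHg are hypothetical and are not the values of the rule produced in the paper.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    antecedent and consequent are predicates that take one record.
    """
    matches_a = [r for r in records if antecedent(r)]
    matches_both = [r for r in matches_a if consequent(r)]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


# Hypothetical blood pressure table: (systolic mmHg, diagnosis).
records = [
    (112, "normal"), (118, "normal"), (124, "normal"),
    (151, "hypertension"), (158, "hypertension"), (163, "hypertension"),
]

# Hypothetical rule: systolic in [150, 170) -> hypertension.
support, confidence = rule_metrics(
    records,
    antecedent=lambda r: 150 <= r[0] < 170,
    consequent=lambda r: r[1] == "hypertension",
)
```

A confidence of 1 means that every record in the table whose systolic value falls in the rule's range carries a hypertension diagnosis; it says nothing about records outside that range.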

Fig. 9 Blood Pressure Data Input into the Evidence-based DDx Recommender


Fig. 10 Diagnostic Results Scenario of the Evidence-based DDx Recommender
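The elimination step of the evidence-based scenario shown in Figs. 8 to 10 can be sketched as follows. This is a simplified stand-in, not the DSO crawler or the rule engine used in the paper: the symptom sets in `DISEASE_SYMPTOMS` are hypothetical, the diastolic limit of 90 mmHg is the clinical pathway threshold quoted elsewhere in the article, and the systolic limit of 140 mmHg is an assumption made for illustration.

```python
# Hypothetical stand-in for the DSO ontology: disease -> symptoms it can display.
DISEASE_SYMPTOMS = {
    "hypertension": {"fatigue", "headache", "blurred vision"},
    "anemia": {"fatigue", "headache", "pale skin"},
    "diabetes": {"fatigue", "frequent urination"},
}

DIASTOLIC_LIMIT = 90   # mmHg, clinical pathway threshold quoted in the article
SYSTOLIC_LIMIT = 140   # mmHg, assumed for illustration; not stated in the source


def candidate_diseases(observed):
    """Return diseases whose symptom set contains every observed symptom."""
    return sorted(d for d, s in DISEASE_SYMPTOMS.items() if observed <= s)


def hypertension_rule(systolic, diastolic):
    """Return True if the reading supports a hypertension diagnosis."""
    return systolic > SYSTOLIC_LIMIT or diastolic > DIASTOLIC_LIMIT


def differential(observed, systolic, diastolic):
    """List candidate diseases, dropping hypertension if the rule rejects it."""
    candidates = candidate_diseases(observed)
    if "hypertension" in candidates and not hypertension_rule(systolic, diastolic):
        candidates.remove("hypertension")
    return candidates


remaining = differential({"fatigue", "headache"}, systolic=118, diastolic=76)
```

With a normal reading, hypertension is removed from the candidate list, which mirrors the outcome reported for the scenario; a reading above either limit would keep it on the list.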

Validating the effectiveness of the proximity-based DDx recommender

The proposed DDx recommendation model allows clinicians to use both the evidence-based and proximity-based DDx recommenders to assist them in diagnosing a specific patient. As explained in section 3.2, in rare cases, the two recommenders may provide different diagnosis results for the same patient's case. In such cases, it is up to the clinician to examine both results and make his/her own judgment by either opting for one of the recommenders' results or setting a new diagnosis. To illustrate the above-mentioned process, let us examine a number of rule pairings. Rule pairings represent equivalent evidence-based and proximity-based rules, in the sense that they both describe the same condition, which is defined by the same set of variable(s)/attribute(s). Let us consider the following two equivalent sets of rules used to diagnose diabetes. These are extracted from the developed knowledge base and are referred to in this example as rule listings.

Listing 4.2: Clinical Pathways Evidence-based Rule for Diabetes

Fig. 11 Classification Statistics for Proximity-based DDx Recommendation for a Possible Diagnosis of Hypertension

Fig. 12 Classification Tree for Proximity-based DDx Recommendation for a Possible Diagnosis of Hypertension

Table 1 Comparing the Accuracy of Different Classification Algorithms for Blood Pressure Data

Algorithm     Algorithm Type                   Prediction Accuracy (%)
ZeroR         Rule-based                       50.36
JRIP          Rule-based                       100
NNGE          Rule-based                       100
J48           Decision Tree-based              100
NBTree        Decision Tree-based              100
NaiveBayes    Bayes Probabilistic Classifier   100
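The accuracy figures in Table 1 follow the usual definition of classification accuracy, the percentage of cases whose predicted label equals the expected label. The sketch below illustrates that calculation and contrasts a ZeroR-style baseline, which always predicts the most frequent class, with a single-threshold classifier. The six cases and the 90 mmHg split are hypothetical, and the classifiers are simplified stand-ins rather than the Weka implementations.

```python
from collections import Counter


def accuracy(predicted, expected):
    """Percentage of predictions that equal the expected label."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


def zero_r(train_labels):
    """ZeroR-style baseline: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


def threshold_classifier(limit):
    """One-split classifier on diastolic pressure (limit is hypothetical)."""
    return lambda diastolic: "hypertension" if diastolic > limit else "normal"


# Hypothetical, cleanly separable cases: (diastolic mmHg, expected label).
cases = [(70, "normal"), (78, "normal"), (84, "normal"),
         (95, "hypertension"), (101, "hypertension"), (110, "hypertension")]
features = [f for f, _ in cases]
expected = [label for _, label in cases]

baseline = zero_r(expected)
stump = threshold_classifier(90)

baseline_acc = accuracy([baseline(f) for f in features], expected)
stump_acc = accuracy([stump(f) for f in features], expected)
```

On balanced data a majority-class baseline scores about 50%, while any learner that can find a clean threshold scores 100%. This is one plausible reading of why ZeroR stands apart from the other algorithms in Table 1, although the source does not discuss the cause.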


Listing 4.3: Proximity-based Rule for Diabetes

Listing 4.4: A Second Clinical Pathways Evidence-based Rule for Diabetes

Listing 4.5: A Second Proximity-based Rule for Diabetes

Both the evidence-based rule (listing 4.2) and the proximity-based rules (listing 4.3) refer to the same attribute (fasting plasma glucose, FPG) in their condition sections. Therefore, these two rules are said to be equivalent but not exactly equal. The evidence-based rule says that a fasting plasma glucose of less than 109.8 mg/dL should be considered normal. On the other hand, the proximity rules 7 & 17 (see listing 4.3) have the combined effect of indicating that a fasting plasma glucose of less than 76.6 mg/dL is considered a normal result. Here the evidence-based result is the more accurate, since it is based on prior medical knowledge (clinical pathways), while the proximity-based result is based on data extracted from 64 patient cases. It seems that significantly more patient cases are required to achieve a more accurate result. With only 64 cases, the proximity-based rules are found to exhibit a 30% error rate with respect to the maximum normal FPG.

If we look at the equivalent rules that check for the abnormal/diabetic FPG, we find that the proximity-based rule (see listing 4.5), based on the same 64 patient cases, is closer to the evidence-based rule (see listing 4.4). The evidence-based rule states that an FPG test result higher than 125 mg/dL indicates diabetes, while the proximity-based rule states that an FPG result higher than 139 mg/dL indicates diabetes. The proximity-based rule here has an error rate of 17%.


Listing 4.6: A Clinical Pathways Evidence-based Rule for Hypertension

Listing 4.7: A Proximity-based Rule for Hypertension

However, we have found more encouraging results in our investigation of proximity-based association rules. We used a larger table of hypertension/hypotension patient cases. The experimental dataset covers two groups of patients. The first data set is for 430 patients with normal blood pressure or hypotension (low blood pressure), while the second data set is for 860 patients with normal blood pressure or hypertension (high blood pressure). The proximity-based rules produced by the association algorithm from the hypertension patient cases are more in line with the evidence-based rules based on the clinical pathways of hypertension. For example, the evidence-based rule in listing 4.6 states that a diastolic blood pressure of more than 90 mmHg is considered hypertensive. The corresponding equivalent proximity rules in listing 4.7 state that a diastolic blood pressure of more than 87.4 mmHg is considered hypertensive. The difference in the maximum normal diastolic blood pressure is only 3%. In effect, the clinical pathways evidence-based rules validate the proximity-based rules, meaning that the proximity-based process is successful at producing accurate rules. Therefore, the proximity-based DDx recommender is a powerful tool for generating potentially new diagnosis rules and for predicting the diagnosis of patient cases based on data from similar patient records where the diagnosis is known. Figure 13 shows the increasing accuracy of the proximity-based rules, measured against the clinical pathways rules as a baseline, as more patient cases are used to generate such rules.

Fig. 13 Accuracy of Proximity-based Rules
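The percentage differences quoted for the rule pairings can be checked with a short calculation. The sketch below assumes that the deviation is the absolute gap between the two thresholds divided by the clinical pathway threshold; the source does not state its formula, so this definition and the function name `relative_deviation` are assumptions. Under this reading the 109.8 vs 76.6 mg/dL pair gives about 30% and the 90 vs 87.4 mmHg pair about 3%, in line with the reported figures, whereas the 125 vs 139 mg/dL pair gives about 11% rather than the reported 17%, which suggests a different basis may have been used for that figure.

```python
def relative_deviation(pathway_threshold, proximity_threshold):
    """Absolute difference as a percentage of the clinical pathway threshold."""
    gap = abs(pathway_threshold - proximity_threshold)
    return 100.0 * gap / pathway_threshold


# Thresholds quoted in the text for each rule pairing.
fpg_normal = relative_deviation(109.8, 76.6)   # maximum normal FPG, mg/dL
fpg_diabetic = relative_deviation(125, 139)    # diabetic FPG, mg/dL
diastolic = relative_deviation(90, 87.4)       # hypertensive diastolic, mmHg

summary = {
    "fpg_normal": round(fpg_normal, 1),
    "fpg_diabetic": round(fpg_diabetic, 1),
    "diastolic": round(diastolic, 1),
}
```

Using the clinical pathway value as the denominator treats the evidence-based rule as the reference, which matches the article's use of clinical pathways as the basis for judging the proximity-based rules.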

Conclusions

The central aim of this article is to develop an effective DDx recommender model by integrating methodologies that combine the best clinical disease diagnosis practices with methods from the paradigm of the semantic web. Central to this aim is the integration of “clinical pathways”, which constitute the foundation for evidence-based practice, with the power of proximity diagnosis and prediction as exemplified by the techniques and methods of “data mining”. The evidence-based diagnosis process develops dynamic rules based on standardized clinical pathways, while the proximity-based process employs data mining to provide clinicians with diagnosis predictions and to generate new diagnosis rules from provided training datasets. The paper utilizes several semantic web techniques, such as the development of the DSO upper-ontology, which aligns the symptoms and disease ontologies, thus facilitating knowledge integration and reasoning across the two spaces. The experimental results show the credibility of the proximity-based recommender, which is demonstrated by measuring its prediction accuracy against actual diagnoses made by health care professionals for real cases extracted from the NIS database. The results also show the usefulness of the proximity-based DDx recommender in assisting healthcare professionals to make more accurate diagnoses.

Further research work is underway to develop a user-friendly rule generation toolkit based on CPs that is usable by clinicians. Also, the present system prototype is mainly used for diagnosing diabetes and hypertension. Further work is needed to incrementally expand the system's knowledge base to cover additional diseases.


