Locating Relevant Patient Information in Electronic Health Record Data Using Representations of Clinical Concepts and Database Structures

Xuequn Pan, PhD1,2, James J. Cimino, MD1,2

1 Lister Hill National Center for Biomedical Communications, National Library of Medicine; 2 Laboratory for Informatics Development, NIH Clinical Center; Bethesda, MD

Abstract

Clinicians and clinical researchers often seek information in electronic health records (EHRs) that is relevant to some concept of interest, such as a disease or finding. The heterogeneous nature of EHRs can complicate retrieval, risking incomplete results. We frame this problem as the presence of two gaps: 1) a gap between clinical concepts and their representations in EHR data and 2) a gap between data representations and their locations within EHR data structures. We bridge these gaps with a knowledge structure that comprises relationships among clinical concepts (including concepts of interest and concepts that may be instantiated in EHR data) and relationships between clinical concepts and the database structures. We make use of available knowledge resources to develop a reproducible, scalable process for creating a knowledge base that can support automated query expansion from a clinical concept to all relevant EHR data.

Introduction

The retrieval of clinical data from electronic health records (EHRs) for whatever reason, including care of a particular patient or broader inquiries to support research, can be thought of as starting with some concept of interest and then expanding to include the concepts, and their representations, that actually appear in EHR data. For example, a clinician looking for evidence that a patient has had a particular disease, or a researcher seeking records of all patients with a particular disease, will naturally search record documents for codes and text that correspond to the disease in question. However, to improve recall, they will likely need to expand their inquiries to include such things as test results or medication orders that imply the presence of the disease.1

Once the set of concepts has been selected, there still remains the challenge of finding them in the EHR documents. Diagnoses might appear with a non-obvious code (e.g., “Diagnosis not Elsewhere Classified”) or synonym. Relevant laboratory results might be easy to find, based on test names, but they might require interpretation of complex results, or they might be buried in a laboratory test comment or even an image of a document obtained from an outside laboratory.

We are developing a set of methods to help overcome these challenges by creating an ontology that includes representations of concepts of interest, related concepts that appear in EHR data, and concepts corresponding to EHR database structures. Using Lyme disease as an example concept of interest, we demonstrate the construction and use of this ontology through four steps: 1) identify terms related to Lyme disease in available knowledge resources, 2) add these concepts and their relationships to a dictionary of coded terms found in EHR data, 3) connect relevant EHR data concepts identified in the previous step with concepts that correspond to EHR data structures, and 4) identify pathways through the ontology that ultimately relate concepts of interest to database structures to support automated, comprehensive data queries.

Background

Lyme Disease

We selected Lyme disease, the most common tickborne infectious disease in the US, as our concept of interest. Lyme disease is diagnosed based on its typical symptoms (e.g., erythema migrans) and a detailed medical history of possible exposure to infected ticks. Serologic testing supports the diagnosis by looking for antibodies (IgG or IgM) to the causative agent, Borrelia (B.) burgdorferi. To support the diagnosis of Lyme disease, the Centers for Disease Control and Prevention (CDC) recommends a two-step process: a screening test that uses a sensitive enzyme immunoassay (EIA)/enzyme-linked immunosorbent assay (ELISA) or indirect immunofluorescence assay (IFA), followed, after a positive result, by a specific immunoblot (Western blot) test.2,3 The diagnosis is considered confirmed if both tests are positive.


The Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT)

SNOMED CT is a clinical terminology that represents a wide variety of clinical information that appears in EHRs. SNOMED CT is composed of concepts, descriptions, and relationships. Each concept has a unique fully specified name (FSN), a preferred term (PT), and synonyms, which are additional terms and phrases for the concept. SNOMED CT contains 344,000 concepts and a rich set of hierarchical and non-hierarchical relationships. Non-hierarchical semantic relationships connect concepts to relevant concepts from other domains and semantic categories to represent definitional information about the concept.4

The Biomedical Translational Research Information System (BTRIS)

BTRIS is a clinical research data repository at the National Institutes of Health (NIH) that collects data from over 35 NIH sources on 477,000 human subjects, dating back to 1976. BTRIS's major data sets include patient demographics, research study enrollment information, laboratory results, medication orders and administration records, and notes and reports from EHR systems.5 All data in BTRIS are coded with the NIH’s Research Entities Dictionary (RED), a terminology resource that comprises the controlled terminologies used by systems that supply data to BTRIS. Each source term is associated with a specific RED concept; knowledge about the terms is represented as literal-valued properties or through semantic relationships between concepts, including hierarchical relationships organized in a directed acyclic graph. The RED includes concepts that correspond to terms actually found in BTRIS (“data concepts”), concepts that correspond to database structures (“database concepts”), and general biomedical concepts that are not actually present in EHR data (“knowledge concepts”).

Methods

The ultimate goal of the project is to facilitate the matching of patient data in EHRs to a user’s concepts of interest. For example, if a clinician is interested in finding evidence that a particular patient has Lyme disease, or a researcher is interested in identifying all records of patients with Lyme disease, comprehensive retrieval would need to examine, at a minimum, problem lists, clinician notes, and diagnostic test results. We therefore developed a method to expand the RED to include the knowledge necessary to support queries of these disparate data sources in BTRIS, based on an initial concept of interest. For this initial work, we focused on the knowledge related to Lyme disease.

General Knowledge Relating Concepts of Interest to Clinical Data

We reviewed clinical information about Lyme disease from authoritative sources and identified relevant key terms. This approach applied general knowledge to identify ways in which terms interrelate, such as “diseases are caused by organisms”, “organisms produce antigens”, “antibodies are produced in response to antigens”, and “laboratory tests detect organisms and substances”.

Identifying Specific Concept Relationships in Knowledge Resources

We examined the ways in which SNOMED CT and the RED represent general knowledge about diseases and their related concepts. We then explored and identified specific concepts and relationships in these knowledge resources, such as Lyme disease and its associated terms. We generated a list of concepts based on parent-child relationships among the retrieved concepts. We then expanded our list to include concepts from other domains related through non-hierarchical relationships, such as “Borrelia (organism) is the causative agent of Lyme disease (disease)”.

Adding the Concepts and their Relationships to RED

After review by domain experts, we added the concepts and relationships discovered in the previous step to the RED through the usual maintenance process. We selected the concepts to be added and linked the concept of interest to EHR data concepts through the associations so defined.

Linking EHR Data Concepts to BTRIS Database Structures

Finally, we linked EHR data concepts to database concepts that correspond to the BTRIS database structures, based on the BTRIS data.5 For example, the EHR data concept Lyme disease (RED code C114654) is linked to the database concept (C2179248) that represents the Observation_Value_Text column in the Observation_General table, where diagnosis information is stored.
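The link between a data concept and a database concept can be pictured as a small lookup structure. The following is a minimal sketch, not the RED's actual storage format: the class name DatabaseLocation, the dictionary DATA_TO_DATABASE, and the function locations_for are hypothetical names introduced for illustration, and only the single link stated in the text is included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseLocation:
    """A database concept: one column of one BTRIS table (hypothetical class)."""
    red_code: str  # RED code of the database concept itself
    table: str
    column: str


# Hypothetical in-memory stand-in for "data concept -> database concept" links.
# The only entry is the example given in the text.
DATA_TO_DATABASE = {
    "C114654": [  # Lyme disease (EHR data concept)
        DatabaseLocation("C2179248", "Observation_General", "Observation_Value_Text"),
    ],
}


def locations_for(data_concept_code):
    """Return the database locations linked to a data concept (empty list if none)."""
    return DATA_TO_DATABASE.get(data_concept_code, [])


for loc in locations_for("C114654"):
    print(f"{loc.table}.{loc.column} (database concept {loc.red_code})")
```

Keeping the table and column as properties of a database concept, rather than as free text on the data concept, is what lets a query generator later read the location directly from the link.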


Results

General Knowledge Relating Concepts of Interest to Clinical Data

We reviewed clinical information about Lyme disease from CDC and NIH Web sites. The terms identified included “Lyme disease”, “Borrelia burgdorferi”, “erythema migrans”, and various antigens of and antibodies to B. burgdorferi, as well as terms for polymerase chain reaction (PCR) tests for B. burgdorferi DNA, serologic tests, antibody screens, and Western blot tests that measure those antibodies. Based on these findings, we generated a general knowledge structure (Figure 1) in which the concept of interest is connected to terms from the clinical domains of disease, organism, substances, and laboratory tests.
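A general knowledge structure of this kind can be read as a directed graph that is walked outward from the concept of interest. The sketch below is illustrative only: the edge labels paraphrase the general statements quoted in the Methods, the node names are simplified from the terms listed above, and EDGES and expand are hypothetical names rather than part of the RED.

```python
from collections import deque

# (subject, relationship, object) triples; labels and node names are
# simplified for illustration and are not RED or SNOMED CT identifiers.
EDGES = [
    ("Lyme disease", "caused by", "Borrelia burgdorferi"),
    ("Borrelia burgdorferi", "produces", "B. burgdorferi antigen"),
    ("B. burgdorferi antigen", "elicits", "B. burgdorferi antibody"),
    ("Borrelia burgdorferi", "detected by", "B. burgdorferi DNA PCR test"),
    ("B. burgdorferi antibody", "detected by", "Western blot"),
]


def expand(start, edges=EDGES):
    """Breadth-first expansion: every concept reachable from `start`."""
    neighbors = {}
    for subject, _relationship, obj in edges:
        neighbors.setdefault(subject, []).append(obj)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


related = expand("Lyme disease")
print(related)
```

Starting from the disease, the walk reaches the organism, then its substances, then the laboratory tests, which mirrors the disease, organism, substance, and laboratory test domains of the structure.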

Figure 1. General knowledge structure for Lyme disease

Identifying Specific Concept Relationships in Knowledge Resources

We retrieved a total of 64 SNOMED CT concepts related to Lyme disease, including terms for 27 procedures, 21 substances, disorders, and 1 organism. The three non-hierarchical relationships we found are shown in Table 1. Figure 2 depicts examples of concepts related to Lyme disease in SNOMED CT.

Table 1. Non-hierarchical relationships between Lyme disease and other terms in SNOMED CT

Non-hierarchical Relationship         Examples
disorder|causative agent|organism     Lyme disease | Causative Agent | Borrelia
procedure|component|substance         Measurement of Borrelia burgdorferi antibody | Component | Borrelia burgdorferi antibody
                                      Measurement of Borrelia burgdorferi 29 kDa antibody | Component | Borrelia burgdorferi 29 kDa antibody
                                      Borrelia burgdorferi band pattern detected | Component | Borrelia burgdorferi antibody
                                      Borrelia burgdorferi DNA Assay | Component | Borrelia burgdorferi DNA
finding|interpret|procedure           Lyme ELISA equivocal | Interpret | Lyme ELISA test


Figure 2. Examples of concepts related to Lyme disease in SNOMED CT. Numbers in brackets are SNOMED CT identifiers.

We retrieved one diagnosis concept from the RED (Lyme disease), 101 laboratory procedure concepts (tests and panels), and one organism concept (Borrelia burgdorferi). Of the test and panel concepts, 57 are actually used to code EHR data. RED relationships found among these concepts included “procedure has component”, “diagnostic procedure has targeted infectious organism”, and “procedure has measured gene product”. Figure 3 illustrates the example of the Lyme Disease Cerebrospinal Fluid (CSF) Antibody Panel (C1163227) and its two components: an antibody screen test (C116326) and a confirmation panel (C119063). The confirmation panel, in turn, includes Western blot IgG and IgM tests. The Lyme Disease CSF Antibody Panel targets the bacterium B. burgdorferi and measures antibodies to it.

Figure 3. Examples of concepts related to Lyme disease in the RED. Numbers in brackets are RED codes.
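The “procedure has component” relationship nests panels inside panels, so finding the individual tests under a panel is a recursive walk. A minimal sketch under stated assumptions: the three RED codes are those given in the text, while WB-IgG and WB-IgM are placeholder identifiers (the text does not give codes for the two Western blot tests), and HAS_COMPONENT and leaf_tests are hypothetical names. The walk assumes the component relationship contains no cycles.

```python
# "procedure has component" links, keyed by the parent panel.
HAS_COMPONENT = {
    "C1163227": ["C116326", "C119063"],  # CSF Antibody Panel -> screen test, confirmation panel
    "C119063": ["WB-IgG", "WB-IgM"],     # placeholders, not real RED codes
}


def leaf_tests(code, has_component=HAS_COMPONENT):
    """Return the tests under `code` that have no components of their own."""
    children = has_component.get(code, [])
    if not children:
        return [code]  # a test with no components is itself a leaf
    leaves = []
    for child in children:
        leaves.extend(leaf_tests(child, has_component))
    return leaves


print(leaf_tests("C1163227"))
```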


Adding Concepts and Relationships to the RED

The RED substance concepts were expanded to include B. burgdorferi gene products, antigens, antibodies, IgG and IgM, and IgG and IgM to specific antigens. We did not add laboratory terms because the RED laboratory terms already cover the available EHR data. Terms for findings were not included at the current stage of the project. Instances of the “Substance measured” relationship were added between the substance and laboratory test terms. Figure 4 illustrates examples of added concepts and links. A “has causative agent” relationship was created, and an instance of that relationship was added between Lyme disease and Borrelia burgdorferi.

Figure 4. Concepts and relationships added to RED to represent EHR database concepts and relate them to EHR data concepts.

Linking EHR Data Concepts to BTRIS Database Structures

We related domains of EHR data concepts to specific database concepts as follows. Diagnostic terms were related to the Observation_General table. Laboratory test order terms (panels) were related to the Event_Measurable table. Terms that were both tests and orders (essentially, panels containing only one test) were related to both the Event_Measurable and Observation_Measurable tables. We defined the database column associated with laboratory test results (not shown in Figure 4). A specific organism’s RED code appears in the Observation_Value_CONCEPT column, and its name appears in the Observation_Name column. Table rows associated with laboratory tests of particular interest included those for which the RED concepts in the Observation_Name column were “Specimen” and “Culture Comment”. In BTRIS, laboratory finding values are in the Observation_Value_Text or Observation_Value_Numeric column, depending on the format of the value. When a RED concept was available (such as for B. burgdorferi), the code was recorded in the Observation_Value_CONCEPT column. Additional textual information is stored in the Observation_Note and Observation_Value_Name columns.

Identifying Pathways that Ultimately Relate Concepts of Interest to Data Structures

The general knowledge structure supports the semantic expansion of the concept of interest to EHR data concepts that are connected with database concepts. The expansion allows us to follow the concept of interest to its associated concepts in different data domains to identify patients. In the following example, we demonstrate how, starting from Lyme disease, to generate SQL queries that retrieve patients who had diagnostic (confirmatory) laboratory tests. We looked up Lyme disease and related EHR data concepts, which led to B. burgdorferi, the causative organism, and then to B. burgdorferi Antibody IgG and IgM, the measurable substances (step 1). We then queried the knowledge structure for concepts whose “substance measured” association points to these two substances (step 2). We used the resulting list of B. burgdorferi Western blot tests to obtain patient data from BTRIS (step 3).

Step 2: To obtain the list of concepts representing tests that measure B. burgdorferi Antibody IgG and IgM

    SELECT DISTINCT [Concept_ID], [Database_Concept_Column], [Database_Concept_Table]
    FROM [BTRIS].[RED_Knowledge_Structure]
    WHERE Substance_Measured IN ('B. burgdorferi Antibody IgG', 'B. burgdorferi Antibody IgM')

The list of tests includes C92569, C93238, C93253, C96230, C93416, C96770, C96713, C119061, and C119062; they are associated with the Observation_Measurable table and the Observation_Name_CONCEPT column.

Step 3: To identify patients with laboratory results of B. burgdorferi Antibody Western blot IgM and IgG tests

    SELECT *
    FROM [BTRIS].[Observation_Measurable]
    WHERE Observation_Name_CONCEPT IN ('C92569', 'C93238', 'C93253', 'C96230', 'C93416', 'C96770', 'C96713', 'C119061', 'C119062')

Here Observation_Measurable is the Database_Concept_Table and Observation_Name_CONCEPT is the Database_Concept_Column returned in Step 2, and the IN list holds the Concept_ID values returned in Step 2.

Discussion

In this study, we added knowledge from an external clinical terminology and knowledge of relational database structures to a repository terminology for semantically enhanced EHR data retrieval. Although we focused only on Lyme disease, our specific approach can be applied to any infectious disease (looking for organisms that cause disease), and the general approach of identifying concept relationships should be applicable to any domain. The knowledge sources used, SNOMED CT and the RED, may currently be incomplete, but they are expandable and growing. We are also considering other knowledge sources, such as LOINC or co-associations of terms in PubMed citations.6,7 Our approach to knowledge extraction is based on established and validated methods in ontology development.

To evaluate the queries, we can measure patient data retrieval performance.
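Retrieval performance of this kind is conventionally summarized as precision and recall against a reference set of patients. A minimal sketch, assuming hypothetical patient identifiers; nothing here comes from BTRIS, and precision_recall is a name introduced for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a reference set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers: four patients returned by a query,
# three patients in the reference set, two in common.
retrieved_patients = {"P1", "P2", "P3", "P4"}
reference_patients = {"P2", "P3", "P5"}

precision, recall = precision_recall(retrieved_patients, reference_patients)
print(f"precision={precision:.2f} recall={recall:.2f}")
```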
We need to work with human raters to construct a gold standard from our EHR data sets. We can then compare the results of the generated queries against the gold standard to calculate recall and precision. The actual validation work is beyond the scope of this paper.

All the work was done manually. Automated steps can be considered for knowledge extraction from knowledge resources and for mapping database structure concepts. Since the locations of concepts in tables and columns have been specified, SQL statements can be generated programmatically.8 Currently, our knowledge model does not handle the assessment of results (for example, determining whether a test result is positive or negative). Additional steps in the SQL query could assist with this, or results could be processed through some other means if desired. A richer knowledge model can open more fields in EHR systems and support dynamic SQL for users.

This study focuses only on coded EHR data. The queries can be expanded to text searches. For example, some laboratory test names, such as “Other Laboratory Test”, are too general to carry specific clinical meaning, so a further linking process is needed to connect them to text data stored in the results or notes. The names of all related concepts and their synonyms are the key terms for identifying patient information in textual clinical data. Natural language processing (NLP) techniques can be applied to help connect the concept of interest to relevant clinical data hidden in the text.

This knowledge model can be applied to other EHR systems. We are considering the use of the HL7 RIM as the intermediary, with the FHIR mapping9 as the key to connecting our knowledge model with the local coding systems used in other systems. Through such mappings, the knowledge model can be replicated, and relevant concepts related to concepts of interest will be connected to their local EHR data concepts. The remaining work for each site is to add and link the database structure concepts for its own databases.

Conclusion

We constructed a simple general knowledge structure and used it to relate a concept of interest, Lyme disease, to EHR data concepts and their database concepts, with specified identifier values (table and column) corresponding to BTRIS data structures. This method could be applied to any EHR database in which coded data are stored in specific structures based on their classification.
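Because each data concept carries the table and column in which it is stored, the patient-level query can be assembled from the knowledge structure rather than written by hand. The following is a sketch under stated assumptions, not the authors' implementation: it uses an in-memory SQLite table in place of [BTRIS].[RED_Knowledge_Structure], reuses the column names from the Step 2 query, and loads two rows whose pairing of test code to substance is arbitrary and chosen only for illustration. The function build_patient_queries is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE RED_Knowledge_Structure (
           Concept_ID TEXT,
           Substance_Measured TEXT,
           Database_Concept_Table TEXT,
           Database_Concept_Column TEXT)"""
)
# Illustrative rows: which code measures IgG and which IgM is an assumption.
conn.executemany(
    "INSERT INTO RED_Knowledge_Structure VALUES (?, ?, ?, ?)",
    [
        ("C92569", "B. burgdorferi Antibody IgG",
         "Observation_Measurable", "Observation_Name_CONCEPT"),
        ("C93238", "B. burgdorferi Antibody IgM",
         "Observation_Measurable", "Observation_Name_CONCEPT"),
    ],
)


def build_patient_queries(conn, substances):
    """Return (sql, parameters) pairs, one per table/column location."""
    substances = list(substances)
    marks = ", ".join("?" for _ in substances)
    rows = conn.execute(
        "SELECT DISTINCT Concept_ID, Database_Concept_Table, Database_Concept_Column "
        "FROM RED_Knowledge_Structure "
        f"WHERE Substance_Measured IN ({marks}) ORDER BY Concept_ID",
        substances,
    ).fetchall()

    # Group test codes by the location in which they are stored.
    by_location = {}
    for code, table, column in rows:
        by_location.setdefault((table, column), []).append(code)

    queries = []
    for (table, column), codes in by_location.items():
        code_marks = ", ".join("?" for _ in codes)
        # Table and column names come from the knowledge structure, not from
        # user input; concept codes are still passed as bound parameters.
        sql = f'SELECT * FROM "{table}" WHERE "{column}" IN ({code_marks})'
        queries.append((sql, codes))
    return queries


queries = build_patient_queries(
    conn, ["B. burgdorferi Antibody IgG", "B. burgdorferi Antibody IgM"]
)
for sql, params in queries:
    print(sql, params)
```

Returning the statement and its parameters separately keeps the concept codes out of the SQL text, so the same function can serve any concept of interest whose data concepts have been linked to database concepts.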


Acknowledgements

This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM) and Lister Hill National Center for Biomedical Communications (LHNCBC). This research was also supported in part by an appointment to the NLM Research Participation Program, administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the US Department of Energy (DoE) and the NLM.

Disclaimer

The views and opinions of the authors expressed herein do not necessarily state or reflect those of the National Library of Medicine, National Institutes of Health or the US Department of Health and Human Services.

Competing Interests

None

References

1. Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc. 2013 Dec;20(e2):e206-11.
2. Centers for Disease Control and Prevention. Lyme disease. Available from: http://www.cdc.gov/lyme/
3. The National Institutes of Health. Understanding Lyme disease. Available from: http://www.niaid.nih.gov/topics/lymeDisease/Pages/lymeDisease.as
4. International Health Terminology Standards Development Organisation. SNOMED CT® User Guide July 2013 International Release. Available from: http://ihtsdo.org/fileadmin/user_upload/doc/download/doc_UserGuide_Current-en-US_INT_20130731.pdf
5. Cimino JJ, Ayres EJ. The clinical research data repository of the US National Institutes of Health. Stud Health Technol Inform. 2010;160(Pt 2):1299-303.
6. Vreeman DJ, McDonald CJ, Huff SM. LOINC® - a universal catalog of individual clinical observations and uniform representation of enumerated collections. Int J Funct Inform Personal Med. 2010;3(4):273-291.
7. Rindflesch TC, Kilicoglu H, Fiszman M, Rosemblat G, Shin D. Semantic MEDLINE: An advanced information management application for biomedicine. Information Services and Use. 2011;31(1):15-21.
8. Post AR, Kurc T, Cholleti S, Gao J, Lin X, Bornstein W, Cantrell D, Levine D, Hohmann S, Saltz JH. The analytic information warehouse (AIW): a platform for analytics using electronic health record data. J Biomed Inform. 2013 Jun;46(3):410-24.
9. FHIR: Fast healthcare interoperability resources. Available from: http://hl7.org/implement/standards/fhir/
