538
Original Articles
Biomedical Informatics: We Are What We Publish P. L. Elkin1; S. H. Brown2,3; G. Wright4 1Department
of Biomedical Informatics, University at Buffalo, Buffalo, NY, USA; of Veterans Affairs, Washington, DC, USA; 3Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; 4Faculty of Health Sciences, Walter Sisulu University, Mthatha, South Africa 2Department
Keywords Medical informatics, bioinformatics, publication record, expert consensus, ontology
Summary Introduction: This article is part of a ForDiscussion-Section of Methods of Information in Medicine on “Biomedical Informatics: We are what we publish“. It is introduced by an editorial and followed by a commentary paper with invited comments. In subsequent issues the discussion may continue through letters to the editor. Objective: Informatics experts have attempted to define the field via consensus projects which has led to consensus statements by both AMIA. and by IMIA. We add to the output of this process the results of a study of the Pubmed publications with abstracts from the field of Biomedical Informatics. Methods: We took the terms from the AMIA consensus document and the terms from the IMIA definitions of the field of Biomedical Informatics and combined them through human review to create the Health Informatics Ontology. We built a terminology server using the Intelligent Natural Language Processor (iNLP). Then we downloaded the entire set of articles in Medline identified by
Correspondence to: Peter L. Elkin, MD, MACP, FACMI, FNYAM Department of Biomedical Informatics University at Buffalo Buffalo, NY USA E-mail:
[email protected] searching the literature by “Medical Informatics” OR “Bioinformatics”. The articles were parsed by the joint AMIA / IMIA terminology and then again using SNOMED CT and for the Bioinformatics they were also parsed using HGNC Ontology. Results: We identified 153,580 articles using “Medical Informatics” and 20,573 articles using “Bioinformatics”. This resulted in 168,298 unique articles and an overlap of 5,855 articles. Of these 62,244 articles (37%) had titles and abstracts that contained at least one concept from the Health Informatics Ontology. SNOMED CT indexing showed that the field interacts with most all clinical fields of medicine. Conclusions: Further defining the field by what we publish can add value to the consensus driven processes that have been the mainstay of the efforts to date. Next steps should be to extract terms from the literature that are uncovered and create class hierarchies and relationships for this content. We should also examine the high occurring of MeSH terms as markers to define Biomedical Informatics. Greater understanding of the Biomedical Informatics Literature has the potential to lead to improved self-awareness for our field.
Methods Inf Med 2013; 52: 538–546 doi: 10.3414/ME13-01-0041 received: April 15, 2013 accepted: August 7, 2013 prepublished: November 19, 2013
1. Introduction Biomedical Informatics is an inherently complex and interdisciplinary field. The interdisciplinary nature of the field accounts for much of the richness of the field. Those of us who have practiced the specialty have come to cherish this diversity. The same diversity makes the field more challenging to define. Each clinical and scientific subset of Informatics professionals has a unique and valid perspective of the specialty. In spite of the challenging nature of this endeavor, AMIA and IMIA have each individually taken steps to define the field. Reinhold Haux et al. defined undergraduate education in Medical Informatics in a landmark article [1]. AMIA published their core competencies (▶ Figure 1) after a set of meetings in 2003 which worked to formally define concepts in the field of Biomedical Informatics. Arie Hasman recognized that Informatics curricula must meet the needs of students aimed at other than pure research careers [2]. In 2002, Musen published that construing work in medical informatics in terms of actions involving ontologies and problem-solving methods may move us closer to a theoretical basis for our field [3]. In 2004, Graham Wright brought together a group of international Informatics experts to develop systematic definitions for the field of Medical and Health Informatics over a four year period, which included an analysis of keywords used in publications (▶ Figure 2 and ▶ Figure 3), however they excluded bioinformatics from this work [4]. In 2009, Schuemie and colleagues found that the Medical Informatics literature could be divided into three subdomains: 1) the organization, application, and evaluation of health information systems, 2) medical
© Schattauer 2013
Methods Inf Med 6/2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
539
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
knowledge representation, and 3) signal and data analysis [5]. In 2011, Reed Gardner published the definitions of the specialty of Clinical Informatics [6] in a consensus statement and in 2012 Kulikowski, Shortliffe et al. published AMIA´s whitepaper defining the field of Biomedical Informatics [7].
Figure 1
Top down approaches to defining Biomedical Informatics have the advantage that they are systematic and involve a broad set of stakeholders. The downside of this expert consensus based approach is that it is not data driven. In this manuscript, we attempt to understand if the published Biomedical Informatics literature
can help us to better understand the breadth of the field of Biomedical Informatics. With the advent of the new sub-specialty of Clinical Informatics [8] approved by the American Board of Medical Specialties (ABMS) and sponsored by the American Board of Preventative Medicine and the
AMIA Consensus Panel Core Competencies for the field of Biomedical Informatics shown in the TMU
Methods Inf Med 6/2013
© Schattauer 2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
540
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
American Board of Pathology, we have seen an increased interest on the part of leaders in our specialty to further define Biomedical Informatics and to specify its core curriculum. The first board examination was administered on October 10, 2013. ABMS approved fellowships which will be available to any physician that holds a current board certification, which will soon be available to U.S. trainees. It is expected that physicians trained as Clinical Informaticians will in some cases serve as the CMIO or work in the
Figure 2
CMIO’s office for healthcare organizations. These groups and individuals provide strategic guidance to the leadership of healthcare organizations regarding current and future developments in Health IT and more broadly in Biomedical Informatics. The iNLP software used in this study has been previously shown to have a very high performance with a recall (sensitivity) of mappings of 99.7%, a precision (positive predictive value) of 99.8% and a specificity of 97.9% [9].
2. Methods The AMIA and IMIA terminologies were analyzed and merged. The AMIA terminology had 287 unique concepts and 315 terms. The IMIA terminology had 333 concepts and 348 terms (▶ Figure 4). They were merged by hand using the Terminology Merge Utility (TMU) an open source utility created in the Laboratory of Biomedical Informatics directed by Dr. Elkin. Dr. Elkin performed the merge finding conceptual equivalence where possible
IMIA Classification of the field of Biomedical Informatics shown in the TMU
© Schattauer 2013
Methods Inf Med 6/2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
541
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
Figure 3
Example as to how the IMIA Classification of Biomedical Informatics is organized
(187 concepts) and creating new hierarchies when the content was unique to one of the contributing classifications. When merged this resulted in a Health Informatics Ontology (HIO) containing 433 concepts and 462 terms (▶ Figure 5). A graphical view of the HIO is shown in ▶Figure 6. The resultant Health Informatics Ontology was built into a terminology server using the intelligent natural language processor (iNLP) [10]. This system was developed at Mount Sinai School of Medicine and is built upon the .Net architecture from Microsoft and is written in C#. The system is terminology and language independent and runs in a four-tier architecture. The system has been used successfully to identify post-operative complications from clinical notes [11] and as a method to improve the United States’s national biosurveillance strategy [12]. In this project, we used the iNLP to build a terminology server capable of parsing free text with the merged AMIA-IMIA Health Informatics Ontology (HIO). Searches of the medical literature were performed using Pubmed from the National Library of Medicine in February of 2006. The search criteria employed was “Medical Informatics OR Bioinformatics” exactly. Then each concept was searched individually. The result sets were downloaded in Medline format with titles and abstracts. Then each of the 168,298 titles and 121,561 abstracts were parsed with the HIO using the iNLP terminology server. The results were stored in a SQLServer database.
A random sample of 27,000 articles with abstracts from the literature was parsed with the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) to determine the breadth of clinical coverage attributable to the Informatics literature. We chose this number as it was approximately the number of records which could be parsed in a three day (24 hr) period. ▶ Figure 7 has an example of a conclusion from an informatics article [12] parsed with SNOMED CT by the iNLP server. The Bioinformatics articles were also parsed with the Human Gene Nomencla-
ture Committee Ontology (HGNC Ontology) to determine the breadth of human gene coverage attributable to the Informatics literature and how well HGNC Ontology represented the content of bioinformatics articles. Descriptive statistics were applied to the resulting dataset of codified Informatics literature.
3. Results The query “Medical Informatics” in Pubmed returned 153,580 articles and the
Figure 4 Terminology Merge Utility showing the highest level categorizations for both the IMIA and AMIA Representations
Methods Inf Med 6/2013
© Schattauer 2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
542
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
Figure 5 Terminology Merge Utility showing the merged IMIA-AMIA Ontology on the left and the AMIA Ontology on the right
covered by the HIO. When indexed on average an article contained 2.32 HIO codes. Only 251 of the 433 concepts from the HIO were identified in the Informatics literature. The top terms identified in the informatics literature using the HIO terminology server are shown in ▶ Table 2. Of the 20,573 Bioinformatics articles, 14,427 had HGNC Ontology codes. Of the 26,953 human genes only 3,275 were identified in this corpus of Bioinformatics literature. When we combine the HIO and HGNC Ontology we find coverage of 76,671 (45.6%) of the 168,298 unique articles. To evaluate the overlap of Biomedical Informatics with clinical medical subspecialties we parsed 27,000 articles with abstracts and the most frequent areas by body site identified are listed in ▶Table 3. Overall there were 37,141 occurrences of SNOMED CT clinical concepts in these 27,000 titles and abstracts. Body site areas without representation in the Biomedical Informatics literature are listed in ▶Table 4.
4. Discussion
Figure 6 Ontology
Graphical representation of the concepts in the merged AMIA-IMIA Health Informatics
query “Bioinformatics” returned 20,573 articles, or in total 174,153 articles. The query “Medical Informatics OR Bioinformatics” in Pubmed returned 168,298 unique records. Therefore the overlap between the two queries was 5,855 articles were indexed by both concepts or contained both keywords. Of the 168,298 unique articles, 121,561 had abstracts and 14,645 were review ar-
Figure 7
ticles. Of the 153,580 articles indexed as “Medical Informatics”, 12,279 were review articles and of the 20,573 articles indexed as “Bioinformatics”, 3,091 were review articles. The iNLP HIO found at least one concept to be associated with 62,244 of the 168,298 articles (37%). Conversely 63% of the articles titles and abstracts were un-
The Biomedical Informatics literature has important content to contribute to our understanding of the breadth and depth of the field of Biomedical Informatics. The top down approach to modeling our field is useful but incomplete and could be advanced by efforts to better understand our literature. In this study of a large corpus of titles and abstracts, only 37% of articles titles and abstracts were identified using the Health Informatics Ontology. Medical Informatics accounted for many fold more articles than Bioinformatics. SNOMED CT indexing showed that the field of Biomedical Informatics has broad involvement with most clinical specialties. This integration with the larger medical scientific community shows a maturing of the field as a health and biomedical scientific discipline.
The conclusion from an Informatics abstract where the iNLP server codified terms in SNOMED CT, which are depicted in blue
© Schattauer 2013
Methods Inf Med 6/2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
543
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
The IMIA and AMIA consensus driven expert designed classifications are both authoritative. We are not suggesting that these necessarily be combined but we do believe that it is a testament to the hard work and diligence of these two communities that there was enough consistency that we were able to merge the two classifications. As our field matures we are increasingly challenged to define our field both for our own internal uses (e.g. curriculum development) and to the biomedical community at large. It is our hope that our literature can help us to be more complete in our description of the breadth and depth of the content of our specialty.
Table 1 AMIA and IMIA terminologies and the merged Health Informatics Ontology Concepts Nodes
5. Limitations A limitation of these results is that the Biomedical Informatics literature is ever increasing and changing and therefore these results represent only a snapshot of the literature in 2006. There are alternate literature searches that might have been employed. There can be false positive articles retrieved in the results from Pubmed queries, so our numbers are likely not gen-
eralizable. We did not index the full text of articles but only the titles and abstracts. Abstracts are more often structured and more and more often convey the more important aspects of a paper’s substance. The basic conclusion that articles that are indexed as being either with “Medical Informatics” or “Bioinformatics” are not indexable by one of the terms that we have used to define our field is likely correct. We have not yet taken the next step to see what
Table 3 SNOMED CT body sites covered in the Biomedical Informatics literature listed by the number of concepts identified. 123646008 Disorder by body site
Number of occurrences
362965005 Disorder of body system
7185
399902003 Disorder of body cavity
4225
128121009 Disorder of trunk
3881
49601007 Disorder of cardiovascular system 127331007 Neoplasm by body site 49483002 Disorder of mediastinum
2402 1693 1643
IMIA
333
348
AMIA
287
315
118934005 Disorder of head
1524
Merged IMIA-AMI Health Informatics Ontology (HIO)
433
462
118940003 Disorder of nervous system
1387
53619000 Disorder of digestive system 928000 Disorder of musculoskeletal system
Table 2 The most frequent HIO concepts identified in the Informatics literature
1158 979
128598002 Disorder of integument
864
123397009 Injury of anatomical site
823
363170005 Inflammation of specific body structures or tissue
776
363171009 Inflammation of specific body systems
762
HIO Concepts
# of Articles
42030000 Disorder of the genitourinary system
732
Standards
11985
50043002 Disorder of respiratory system
722
Technology
10950
301810000 Infection by site
672
Evaluate IS/IT
10303
128605003 Disorder of extremity
630
Networking
8782
19660004 Disorder of soft tissue
597
Evaluation
5981
362968007 Disorder of reproductive system
539
Simulation
5286
362969004 Disorder of endocrine system
530
Guidelines
2986
129565002 Disorder of skeletal AND/OR smooth muscle
508
Decision support
1977
105969002 Disorder of connective tissue
444
Neural networks
1768
128127008 Visual system disorder
431
Multivariate
1592
Primary care
1431
232208008 Ear, nose and throat disorder
269
Study design
1412
362966006 Disorder of auditory system
209
Security
1379
128983005 Non-human disorder by body site
183
Epidemiology
1060
User interface
1009
79604008 Disorder of breast
33308003 Disorder of back 414030009 Disorder of immune structure
Methods Inf Med 6/2013
317
168 163
© Schattauer 2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
544
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
Table 3
Continued
123646008 Disorder by body site
Number of occurrences
414027002 Disorder of hematopoietic structure
161
118939000 Disorder of neck
146
204223000 Ear, face and neck congenitalanomalies
88
363137000 Hereditary disorder by system
65
362970003 Disorder of hemostatic system
49
253937004 Congenital abnormality of lower limb and pelvic girdle
35
362971004 Disorder of lymphatic system
35
70591005 Fetal disorder
Table 4
34
409709004 Chromosomal disorder
34
399986003 Disorder of body wall
27
128604004 Disorder of product of conception
21
421095001 Allergic disorder by body site affected
16
213146005 Complication of transplanted organ
14
is the content which defines the articles that were not indexed by our merged IMIA-AMIA Ontology, which limits the immediate applicability of our work. The HIO requires more work to create formal definitions for the concepts that AMIA and IMIA determined to define our specialty and we need to be able to connect this ontology to an upper level ontology such as the Basic Formal Ontology [13]. Future research should include a failure analysis. One method would be to identify the MeSH terms in the articles that were not identified and see where commonalities are found across articles. Then hold a series of expert reviews to determine relevance. The authors suggest that further research is needed to develop validated automated methods for populating the Health
SNOMED CT body sites and systems not identified in the 27,000 titles and abstracts
123946008 Disorder by body site 302918009 Disorder of stoma 5300004 Hemoglobin Bart´s hydrops syndrome 19856004 Late effect of burns of eye, face, head AND/OR neck 38425000 Fetus OR newborn affected by entanglement of cord 53035006 Fetus OR newborn affected by maternal death 59289007 Fetal OR intrauterine hypercapnia, not clear if notedbefore OR after onset oflabor in liveborn infant 65522009 Fetus OR newborn affected by torsion of cord 70165003 Fetus OR newborn affected by rupture of marginal sinus 72014004 Abnormal fetal duplication 90108007 Fetal OR intrauterine anoxia AND/OR hypoxia not clear if noted before OR after onset of labor in liveborn infant 206161005 Disorders due to slow fetal growth, low and high birth weight 211616004 Foreign body (FB) in orifice 238147009 Organ dysfunction syndrome 240322003 Fetus affected by placenta insufficiency 241999000 Visceral decompression injury 269304002 Accidental organ perforation during a procedure 275368005 Labor fetal anoxia 15861008 Bovine viral leukosis 17811000 Vesicular exanthema of swine 40367008 Polyagglutinable erythrocyte syndrome 58684001 Intraerythrocytic parasitosis by Entopolypoides 63785008 Friction blister without infection 65939004 Mucosal yaws 70978004 Situs inversus thoracis
© Schattauer 2013
Methods Inf Med 6/2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
545
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish
Table 4
Continued
123946008 Disorder by body site 95194004 Nodular lymphoma of extranodal AND/OR solid organ site 95356008 Mucosal ulcer 123583002 Tuberculid 126638003 Neoplasm of intrathoracic organs 128210009 Thoracic outlet syndrome 18898009 Lyphosarkoma 191025004 [X]Additional endocrine, nutritional, metabolic and immunity disease classification terms 240844004 Loa loa lymphadenopathy 254290004 Post-transplant lymphoproliferative disorder 255117000 Metastasis to respiratory and intrathoracic organ 255142009 Carcinoma in situ of respiratory and intrathoracic organ 267997004 Exostosis of unspecified site 269481009 Benign tumor of respiratory and intrathoracic organ 280129003 Disorder of soft tissue of thoracic cavity 283985007 Lymphoreticular injury 283999006 Sensory organ injury 284006002 Injury of thoracic cavity 362967002 Disorder of olfactory system 363296001 Sequelae of disorders classified by disorder system 399993004 Disorder of taste 400072006 Fixed drug reaction 402811003 Squamous neoplasm of surface epithelium 404147001 Follicle centre-cell B-cell lymphoma (modal/systemic with skin involvement)
Informatics Ontology using concepts identified in the Biomedical Informatics literature. This could be facilitated by using the context created by the semantics already contained in the HI Ontology. Once expanded, formal definitions should be added and the classification then tested both for its ability to represent new literature and also it should be subjected to a consensus review by the members of the specialty of Biomedical Informatics. We also see this as a catalyst to continue the debate regarding defining the depth and breadth of the field of Biomedical Informatics including how we should go about expanding the knowledge base of our discipline. We relish this debate and look forward to an ongoing dialog toward synergistically improving our discipline’s self-understanding and self-awareness.
6. Conclusions It is important for individuals in the field to understand the depth and the breadth of the specialty of Biomedical Informatics. These results show that in addition to a consensus driven top-down method of defining our specialty, there is value in looking at the Biomedical Informatics literature to provide broader definition of the concepts needed to define our specialty.
References 1. Haux R, Ammenwerth E, ter Burg WJ, Pilz J, Jaspers MW. An international course on strategic information management for medical informatics students: aim, content, structure, and experiences. Int J Med Inform 2004; 73 (2): 97–100. 2. Hasman A, Haux R. Curricula in medical informatics. Stud Health Technol Inform 2004; 109: 63 –74.
3. Musen MA. Medical informatics: searching for underlying components. Methods Inf Med 2002; 41 (1): 12 –19. 4. Wright G. The Development of the IMIA Knowledge Base. SA Journal of Information Management 2011; 13(1), Art. # 458, 5 pages. doi: 10.4102/sajim.v13i1.458 5. Schuemie MJ, Talmon JL, Moorman PW, Kors JA. Mapping the domain of medical informatics. Methods Inf Med 2009; 48 (1): 76 –83. 6. Gardner RM, Overhage JM, Steen EB, Munger BS, Holmes JH, Williamson JJ, Detmer DE; AMIA Board of Directors. Core content for the subspecialty of clinical informatics. J Am Med Inform Assoc 2009; 16 (2): 153 –157. 7. Kulikowski CA, Shortliffe EH, Currie LM, Elkin PL, Hunter LE, Johnson TR, Kalet IJ, Lenert LA, Musen MA, Ozbolt JG, Smith JW, Tarczy-Hornoch PZ, Williamson JJ. AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline. J Am Med Inform Assoc 2012; 19 (6): 931– 938. 8. http://www.amia.org/clinical-informaticsmedical-subspecialty
Methods Inf Med 6/2013
© Schattauer 2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.
546
P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish 9. Elkin PL, Brown SH, Husser C, Bauer BA, Wahner-Roedler D, Rosenbloom ST. An evaluation of the content coverage of SNOMED-CT for clinical problem lists. Mayo Clin Proc 2006; 81 (6): 741–748. 10. Elkin PL, Trusko BE, Koppel R, Speroff T, Mohrer D, Sakji S, Gurewitz I, Tuttle M, Brown SH. Secondary use of clinical data. Stud Health Technol Inform 2010; 155: 14 –29.
11. Murff HJ, FitzHenry F, Matheny ME, Gentry N, Kotter KL, Crimin K, Dittus RS, Rosen AK, Elkin PL, Brown SH, Speroff T. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA 2011; 306 (8): 848 – 855. 12. Elkin PL, Froehling DA, Wahner-Roedler DL, Brown SH, Bailey KR. Comparison of natural lan-
guage processing biosurveillance methods for identifying influenza from encounter notes. Ann Intern Med 2012; 156 (1 Pt 1): 11–18. 13. Ceusters W, Elkin P, Smith B. Negative findings in electronic health records and biomedical ontologies: a realist approach. Int J Med Inform 2007; 76 (Suppl 3): S326–33. Epub 2007 Mar 21.
© Schattauer 2013
Methods Inf Med 6/2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.