
Original Articles

Biomedical Informatics: We Are What We Publish

P. L. Elkin1; S. H. Brown2,3; G. Wright4

1Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, USA; 2Department of Veterans Affairs, Washington, DC, USA; 3Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; 4Faculty of Health Sciences, Walter Sisulu University, Mthatha, South Africa

Keywords Medical informatics, bioinformatics, publication record, expert consensus, ontology

Summary
Introduction: This article is part of a For-Discussion-Section of Methods of Information in Medicine on “Biomedical Informatics: We are what we publish”. It is introduced by an editorial and followed by a commentary paper with invited comments. In subsequent issues the discussion may continue through letters to the editor.
Objective: Informatics experts have attempted to define the field via consensus projects, which have led to consensus statements by both AMIA and IMIA. We add to the output of this process the results of a study of the PubMed publications with abstracts from the field of Biomedical Informatics.
Methods: We took the terms from the AMIA consensus document and the terms from the IMIA definitions of the field of Biomedical Informatics and combined them through human review to create the Health Informatics Ontology. We built a terminology server using the Intelligent Natural Language Processor (iNLP). Then we downloaded the entire set of articles in Medline identified by searching the literature for “Medical Informatics” OR “Bioinformatics”. The articles were parsed with the joint AMIA/IMIA terminology and then again with SNOMED CT; the Bioinformatics articles were also parsed with the HGNC Ontology.
Results: We identified 153,580 articles using “Medical Informatics” and 20,573 articles using “Bioinformatics”. This resulted in 168,298 unique articles and an overlap of 5,855 articles. Of these, 62,244 articles (37%) had titles and abstracts that contained at least one concept from the Health Informatics Ontology. SNOMED CT indexing showed that the field interacts with nearly all clinical fields of medicine.
Conclusions: Further defining the field by what we publish can add value to the consensus-driven processes that have been the mainstay of the efforts to date. Next steps should be to extract terms from the literature that remains uncovered and to create class hierarchies and relationships for this content. We should also examine frequently occurring MeSH terms as markers to define Biomedical Informatics. Greater understanding of the Biomedical Informatics literature has the potential to lead to improved self-awareness for our field.

Correspondence to: Peter L. Elkin, MD, MACP, FACMI, FNYAM
Department of Biomedical Informatics
University at Buffalo
Buffalo, NY
USA
E-mail: [email protected]

Methods Inf Med 2013; 52: 538–546 doi: 10.3414/ME13-01-0041 received: April 15, 2013 accepted: August 7, 2013 prepublished: November 19, 2013

1. Introduction
Biomedical Informatics is an inherently complex and interdisciplinary field. The interdisciplinary nature of the field accounts for much of its richness. Those of us who have practiced the specialty have come to cherish this diversity. The same diversity makes the field more challenging to define. Each clinical and scientific subset of Informatics professionals has a unique and valid perspective on the specialty. In spite of the challenging nature of this endeavor, AMIA and IMIA have each taken steps to define the field. Reinhold Haux et al. defined undergraduate education in Medical Informatics in a landmark article [1]. AMIA published its core competencies (▶ Figure 1) after a set of meetings in 2003 that worked to formally define concepts in the field of Biomedical Informatics. Arie Hasman recognized that Informatics curricula must meet the needs of students aiming at careers other than pure research [2]. In 2002, Musen proposed that construing work in medical informatics in terms of actions involving ontologies and problem-solving methods may move us closer to a theoretical basis for our field [3]. In 2004, Graham Wright brought together a group of international Informatics experts to develop systematic definitions for the field of Medical and Health Informatics over a four-year period. This work included an analysis of keywords used in publications (▶ Figure 2 and ▶ Figure 3) but excluded bioinformatics [4]. In 2009, Schuemie and colleagues found that the Medical Informatics literature could be divided into three subdomains: 1) the organization, application, and evaluation of health information systems, 2) medical

© Schattauer 2013

Methods Inf Med 6/2013 Downloaded from www.methods-online.com on 2013-11-22 | ID: 1000467501 | IP: 160.39.250.193 Note: Uncorrected proof, prepublished online For personal or educational use only. No other uses without permission. All rights reserved.


P. L. Elkin et al.: Biomedical Informatics: We Are What We Publish

knowledge representation, and 3) signal and data analysis [5]. In 2009, Reed Gardner and colleagues published the definitions of the specialty of Clinical Informatics [6] in a consensus statement, and in 2012 Kulikowski, Shortliffe et al. published AMIA's white paper defining the field of Biomedical Informatics [7].

Figure 1 AMIA Consensus Panel Core Competencies for the field of Biomedical Informatics shown in the TMU

Top-down approaches to defining Biomedical Informatics have the advantage that they are systematic and involve a broad set of stakeholders. The downside of this expert-consensus-based approach is that it is not data driven. In this manuscript, we examine whether the published Biomedical Informatics literature can help us to better understand the breadth of the field of Biomedical Informatics.

With the advent of the new subspecialty of Clinical Informatics [8], approved by the American Board of Medical Specialties (ABMS) and sponsored by the American Board of Preventive Medicine and the American Board of Pathology, we have seen increased interest on the part of leaders in our specialty in further defining Biomedical Informatics and specifying its core curriculum. The first board examination was administered on October 10, 2013. ABMS-approved fellowships, open to any physician who holds a current board certification, will soon be available to U.S. trainees. It is expected that physicians trained as Clinical Informaticians will in some cases serve as the Chief Medical Information Officer (CMIO) or work in the CMIO's office for healthcare organizations. These groups and individuals provide strategic guidance to the leadership of healthcare organizations regarding current and future developments in Health IT and, more broadly, in Biomedical Informatics.

The iNLP software used in this study has previously been shown to perform very well, with a recall (sensitivity) of mappings of 99.7%, a precision (positive predictive value) of 99.8% and a specificity of 97.9% [9].

Figure 2 IMIA Classification of the field of Biomedical Informatics shown in the TMU

2. Methods
The AMIA and IMIA terminologies were analyzed and merged. The AMIA terminology had 287 unique concepts and 315 terms. The IMIA terminology had 333 concepts and 348 terms (▶ Figure 4). They were merged by hand using the Terminology Merge Utility (TMU), an open source utility created in the Laboratory of Biomedical Informatics directed by Dr. Elkin. Dr. Elkin performed the merge, finding conceptual equivalence where possible


Figure 3 Example of how the IMIA Classification of Biomedical Informatics is organized

(187 concepts) and creating new hierarchies when the content was unique to one of the contributing classifications. The merge resulted in a Health Informatics Ontology (HIO) containing 433 concepts and 462 terms (▶ Figure 5). A graphical view of the HIO is shown in ▶ Figure 6.

The resultant Health Informatics Ontology was built into a terminology server using the Intelligent Natural Language Processor (iNLP) [10]. This system was developed at Mount Sinai School of Medicine, is built upon the .NET architecture from Microsoft, and is written in C#. The system is terminology and language independent and runs in a four-tier architecture. It has been used successfully to identify post-operative complications from clinical notes [11] and as a method to improve the United States' national biosurveillance strategy [12]. In this project, we used the iNLP to build a terminology server capable of parsing free text with the merged AMIA-IMIA Health Informatics Ontology (HIO).

Searches of the medical literature were performed using PubMed from the National Library of Medicine in February of 2006. The search criterion employed was exactly “Medical Informatics OR Bioinformatics”. Then each concept was searched individually. The result sets were downloaded in Medline format with titles and abstracts. Each of the 168,298 titles and 121,561 abstracts was then parsed with the HIO using the iNLP terminology server. The results were stored in a SQL Server database.
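To make the parsing step concrete, the following is a minimal sketch and not the iNLP system itself: it reads Medline-format records and flags which ontology terms occur in each title and abstract by simple whole-phrase matching. The function names (`parse_medline`, `match_concepts`, `index_records`), the toy term list, and the two sample records are hypothetical and serve only as illustration; the actual terminology server performs natural language processing rather than string matching.

```python
import re
from collections import defaultdict


def parse_medline(text):
    """Yield one dict per record from Medline-format text (tag -> list of values)."""
    record, tag = defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():                        # a blank line ends a record
            if record:
                yield dict(record)
            record, tag = defaultdict(list), None
        elif line.startswith("      ") and tag:     # continuation of the previous field
            record[tag][-1] += " " + line.strip()
        elif len(line) > 5 and line[4] == "-":      # e.g. "TI  - Some title"
            tag = line[:4].strip()
            record[tag].append(line[6:].strip())
    if record:
        yield dict(record)


def match_concepts(text, term_to_concept):
    """Return the set of concepts whose terms occur as whole phrases in text."""
    lowered = text.lower()
    found = set()
    for term, concept in term_to_concept.items():
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
            found.add(concept)
    return found


def index_records(medline_text, term_to_concept):
    """Map each PMID to the sorted concepts found in its title and abstract."""
    index = {}
    for rec in parse_medline(medline_text):
        text = " ".join(rec.get("TI", []) + rec.get("AB", []))
        index[rec["PMID"][0]] = sorted(match_concepts(text, term_to_concept))
    return index


# Hypothetical term list; the concept labels are borrowed from Table 2.
TOY_TERMS = {
    "decision support": "Decision support",
    "neural networks": "Neural networks",
    "primary care": "Primary care",
    "standards": "Standards",
}

# Invented records, for illustration only.
SAMPLE = """PMID- 1
TI  - Decision support for primary care.
AB  - We evaluate a guideline-based decision support tool using neural
      networks.

PMID- 2
TI  - A title with no abstract.
"""

index = index_records(SAMPLE, TOY_TERMS)
```

In this toy run the first record is tagged with three concepts and the second with none, which mirrors the distinction the study draws between articles covered and not covered by the HIO.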

A random sample of 27,000 articles with abstracts from the literature was parsed with the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) to determine the breadth of clinical coverage attributable to the Informatics literature. We chose this number because it was approximately the number of records that could be parsed in three days of continuous (24-hour) processing. ▶ Figure 7 shows an example of a conclusion from an informatics article [12] parsed with SNOMED CT by the iNLP server. The Bioinformatics articles were also parsed with the HUGO Gene Nomenclature Committee Ontology (HGNC Ontology) to determine the breadth of human gene coverage attributable to the Informatics literature and how well the HGNC Ontology represented the content of bioinformatics articles. Descriptive statistics were applied to the resulting dataset of codified Informatics literature.
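As a rough illustration of the sampling step, the sketch below draws a reproducible random sample and computes the parsing throughput implied by the three-day budget. It assumes the sample was drawn from the 121,561 articles with abstracts and uses placeholder integer IDs; the helper names `sample_ids` and `records_per_hour` and the fixed seed are hypothetical and not part of the study.

```python
import random


def sample_ids(article_ids, k, seed=0):
    """Draw k distinct article IDs uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(list(article_ids), k)


def records_per_hour(n_records, days, hours_per_day=24):
    """Average throughput needed to parse n_records within the given time budget."""
    return n_records / (days * hours_per_day)


# Placeholder IDs standing in for the 121,561 articles with abstracts (assumption).
ids = range(1, 121_561 + 1)
sample = sample_ids(ids, 27_000)

rate = records_per_hour(27_000, days=3)   # 27,000 / 72 hours = 375 records per hour
seconds_per_record = 3600 / rate          # about 9.6 seconds per record
```

Fixing the seed means the same 27,000 records can be re-drawn later, which would matter if the sample were re-parsed with a different terminology.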

3. Results
The query “Medical Informatics” in PubMed returned 153,580 articles and the query “Bioinformatics” returned 20,573 articles, for a total of 174,153 articles. The query “Medical Informatics OR Bioinformatics” in PubMed returned 168,298 unique records. Therefore, the overlap between the two queries was 5,855 articles that were indexed by both concepts or contained both keywords. Of the 168,298 unique articles, 121,561 had abstracts and 14,645 were review articles. Of the 153,580 articles indexed as “Medical Informatics”, 12,279 were review articles, and of the 20,573 articles indexed as “Bioinformatics”, 3,091 were review articles.

The iNLP HIO found at least one concept to be associated with 62,244 of the 168,298 articles (37%). Conversely, the titles and abstracts of 63% of the articles were not covered by the HIO. Indexed articles contained on average 2.32 HIO codes. Only 251 of the 433 concepts from the HIO were identified in the Informatics literature. The top terms identified in the informatics literature using the HIO terminology server are shown in ▶ Table 2. Of the 20,573 Bioinformatics articles, 14,427 had HGNC Ontology codes. Of the 26,953 human genes, only 3,275 were identified in this corpus of Bioinformatics literature. When we combine the HIO and HGNC Ontology, we find coverage of 76,671 (45.6%) of the 168,298 unique articles.

To evaluate the overlap of Biomedical Informatics with clinical medical subspecialties, we parsed 27,000 articles with abstracts; the most frequent areas by body site identified are listed in ▶ Table 3. Overall, there were 37,141 occurrences of SNOMED CT clinical concepts in these 27,000 titles and abstracts. Body site areas without representation in the Biomedical Informatics literature are listed in ▶ Table 4.

Figure 4 Terminology Merge Utility showing the highest level categorizations for both the IMIA and AMIA Representations

Figure 5 Terminology Merge Utility showing the merged IMIA-AMIA Ontology on the left and the AMIA Ontology on the right

Figure 6 Graphical representation of the concepts in the merged AMIA-IMIA Health Informatics Ontology

Figure 7 The conclusion from an Informatics abstract where the iNLP server codified terms in SNOMED CT, which are depicted in blue

4. Discussion
The Biomedical Informatics literature has important content to contribute to our understanding of the breadth and depth of the field of Biomedical Informatics. The top-down approach to modeling our field is useful but incomplete and could be advanced by efforts to better understand our literature. In this study of a large corpus of titles and abstracts, only 37% of articles' titles and abstracts were identified using the Health Informatics Ontology. Medical Informatics accounted for more than seven times as many articles as Bioinformatics. SNOMED CT indexing showed that the field of Biomedical Informatics has broad involvement with most clinical specialties. This integration with the larger medical scientific community shows a maturing of the field as a health and biomedical scientific discipline.
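The headline counts reported in the Methods and Results can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names and the `pct` helper are illustrative.

```python
# Counts as reported in the text.
medical, bio, unique = 153_580, 20_573, 168_298
hio_hits, hgnc_hits, combined = 62_244, 14_427, 76_671
amia_concepts, imia_concepts, equivalent, merged = 287, 333, 187, 433


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
total = medical + bio                      # 174,153
overlap = total - unique                   # 5,855

hio_coverage = pct(hio_hits, unique)       # 37.0
combined_coverage = pct(combined, unique)  # 45.6

# The reported combined figure equals the simple sum of the HIO and HGNC counts,
# which is what one would obtain if no article were counted under both.
sum_of_hits = hio_hits + hgnc_hits

# Merging the terminologies: shared concepts are counted once.
merged_check = amia_concepts + imia_concepts - equivalent
```

All of the reported figures are internally consistent under these checks, including the 433 concepts of the merged ontology.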


The IMIA and AMIA consensus-driven, expert-designed classifications are both authoritative. We are not suggesting that they necessarily be combined, but we do believe it is a testament to the hard work and diligence of these two communities that there was enough consistency for us to merge the two classifications. As our field matures, we are increasingly challenged to define it both for our own internal uses (e.g. curriculum development) and for the biomedical community at large. It is our hope that our literature can help us to be more complete in our description of the breadth and depth of the content of our specialty.

Table 1 AMIA and IMIA terminologies and the merged Health Informatics Ontology

| Terminology | Concepts | Nodes |
|---|---|---|
| IMIA | 333 | 348 |
| AMIA | 287 | 315 |
| Merged IMIA-AMIA Health Informatics Ontology (HIO) | 433 | 462 |

Table 2 The most frequent HIO concepts identified in the Informatics literature

| HIO Concepts | # of Articles |
|---|---|
| Standards | 11985 |
| Technology | 10950 |
| Evaluate IS/IT | 10303 |
| Networking | 8782 |
| Evaluation | 5981 |
| Simulation | 5286 |
| Guidelines | 2986 |
| Decision support | 1977 |
| Neural networks | 1768 |
| Multivariate | 1592 |
| Primary care | 1431 |
| Study design | 1412 |
| Security | 1379 |
| Epidemiology | 1060 |
| User interface | 1009 |

Table 3 SNOMED CT body sites covered in the Biomedical Informatics literature listed by the number of concepts identified.

| 123946008 Disorder by body site | Number of occurrences |
|---|---|
| 362965005 Disorder of body system | 7185 |
| 399902003 Disorder of body cavity | 4225 |
| 128121009 Disorder of trunk | 3881 |
| 49601007 Disorder of cardiovascular system | 2402 |
| 127331007 Neoplasm by body site | 1693 |
| 49483002 Disorder of mediastinum | 1643 |
| 118934005 Disorder of head | 1524 |
| 118940003 Disorder of nervous system | 1387 |
| 53619000 Disorder of digestive system | 1158 |
| 928000 Disorder of musculoskeletal system | 979 |
| 128598002 Disorder of integument | 864 |
| 123397009 Injury of anatomical site | 823 |
| 363170005 Inflammation of specific body structures or tissue | 776 |
| 363171009 Inflammation of specific body systems | 762 |
| 42030000 Disorder of the genitourinary system | 732 |
| 50043002 Disorder of respiratory system | 722 |
| 301810000 Infection by site | 672 |
| 128605003 Disorder of extremity | 630 |
| 19660004 Disorder of soft tissue | 597 |
| 362968007 Disorder of reproductive system | 539 |
| 362969004 Disorder of endocrine system | 530 |
| 129565002 Disorder of skeletal AND/OR smooth muscle | 508 |
| 105969002 Disorder of connective tissue | 444 |
| 128127008 Visual system disorder | 431 |
| 79604008 Disorder of breast | 317 |
| 232208008 Ear, nose and throat disorder | 269 |
| 362966006 Disorder of auditory system | 209 |
| 128983005 Non-human disorder by body site | 183 |
| 33308003 Disorder of back | 168 |
| 414030009 Disorder of immune structure | 163 |
| 414027002 Disorder of hematopoietic structure | 161 |
| 118939000 Disorder of neck | 146 |
| 204223000 Ear, face and neck congenital anomalies | 88 |
| 363137000 Hereditary disorder by system | 65 |
| 362970003 Disorder of hemostatic system | 49 |
| 253937004 Congenital abnormality of lower limb and pelvic girdle | 35 |
| 362971004 Disorder of lymphatic system | 35 |
| 70591005 Fetal disorder | 34 |
| 409709004 Chromosomal disorder | 34 |
| 399986003 Disorder of body wall | 27 |
| 128604004 Disorder of product of conception | 21 |
| 421095001 Allergic disorder by body site affected | 16 |
| 213146005 Complication of transplanted organ | 14 |

Table 4 SNOMED CT body sites and systems not identified in the 27,000 titles and abstracts

| 123946008 Disorder by body site |
|---|
| 302918009 Disorder of stoma |
| 5300004 Hemoglobin Bart's hydrops syndrome |
| 19856004 Late effect of burns of eye, face, head AND/OR neck |
| 38425000 Fetus OR newborn affected by entanglement of cord |
| 53035006 Fetus OR newborn affected by maternal death |
| 59289007 Fetal OR intrauterine hypercapnia, not clear if noted before OR after onset of labor in liveborn infant |
| 65522009 Fetus OR newborn affected by torsion of cord |
| 70165003 Fetus OR newborn affected by rupture of marginal sinus |
| 72014004 Abnormal fetal duplication |
| 90108007 Fetal OR intrauterine anoxia AND/OR hypoxia not clear if noted before OR after onset of labor in liveborn infant |
| 206161005 Disorders due to slow fetal growth, low and high birth weight |
| 211616004 Foreign body (FB) in orifice |
| 238147009 Organ dysfunction syndrome |
| 240322003 Fetus affected by placenta insufficiency |
| 241999000 Visceral decompression injury |
| 269304002 Accidental organ perforation during a procedure |
| 275368005 Labor fetal anoxia |
| 15861008 Bovine viral leukosis |
| 17811000 Vesicular exanthema of swine |
| 40367008 Polyagglutinable erythrocyte syndrome |
| 58684001 Intraerythrocytic parasitosis by Entopolypoides |
| 63785008 Friction blister without infection |
| 65939004 Mucosal yaws |
| 70978004 Situs inversus thoracis |
| 95194004 Nodular lymphoma of extranodal AND/OR solid organ site |
| 95356008 Mucosal ulcer |
| 123583002 Tuberculid |
| 126638003 Neoplasm of intrathoracic organs |
| 128210009 Thoracic outlet syndrome |
| 18898009 Lymphosarcoma |
| 191025004 [X]Additional endocrine, nutritional, metabolic and immunity disease classification terms |
| 240844004 Loa loa lymphadenopathy |
| 254290004 Post-transplant lymphoproliferative disorder |
| 255117000 Metastasis to respiratory and intrathoracic organ |
| 255142009 Carcinoma in situ of respiratory and intrathoracic organ |
| 267997004 Exostosis of unspecified site |
| 269481009 Benign tumor of respiratory and intrathoracic organ |
| 280129003 Disorder of soft tissue of thoracic cavity |
| 283985007 Lymphoreticular injury |
| 283999006 Sensory organ injury |
| 284006002 Injury of thoracic cavity |
| 362967002 Disorder of olfactory system |
| 363296001 Sequelae of disorders classified by disorder system |
| 399993004 Disorder of taste |
| 400072006 Fixed drug reaction |
| 402811003 Squamous neoplasm of surface epithelium |
| 404147001 Follicle centre-cell B-cell lymphoma (nodal/systemic with skin involvement) |

5. Limitations
A limitation of these results is that the Biomedical Informatics literature is ever increasing and changing, and therefore these results represent only a snapshot of the literature in 2006. Alternate literature searches might have been employed. PubMed queries can retrieve false positive articles, so our numbers are likely not generalizable. We did not index the full text of articles but only the titles and abstracts. Abstracts, however, are increasingly structured and increasingly convey the most important aspects of a paper's substance. The basic conclusion, that many articles indexed under either “Medical Informatics” or “Bioinformatics” cannot be indexed by any of the terms that we have used to define our field, is likely correct. We have not yet taken the next step of determining what content defines the articles that were not indexed by our merged IMIA-AMIA Ontology, which limits the immediate applicability of our work. The HIO requires more work to create formal definitions for the concepts that AMIA and IMIA determined to define our specialty, and we need to be able to connect this ontology to an upper-level ontology such as the Basic Formal Ontology [13].

Future research should include a failure analysis. One method would be to identify the MeSH terms in the articles that were not indexed and see where commonalities are found across articles, and then to hold a series of expert reviews to determine relevance. The authors suggest that further research is needed to develop validated automated methods for populating the Health Informatics Ontology using concepts identified in the Biomedical Informatics literature. This could be facilitated by using the context created by the semantics already contained in the HIO. Once the classification is expanded, formal definitions should be added, and it should then be tested for its ability to represent new literature and subjected to a consensus review by the members of the specialty of Biomedical Informatics. We also see this as a catalyst to continue the debate regarding the depth and breadth of the field of Biomedical Informatics, including how we should go about expanding the knowledge base of our discipline. We relish this debate and look forward to an ongoing dialog toward synergistically improving our discipline's self-understanding and self-awareness.
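The failure analysis proposed above, counting MeSH terms across the articles that were not indexed, could be prototyped roughly as follows. This is a sketch under stated assumptions rather than a validated method: the record layout (a `pmid` string and a `mesh` list), the function name `mesh_commonalities`, and the toy records are all hypothetical. Headings are assumed to follow the Medline MH convention, where a subheading follows a slash and an asterisk marks a major topic.

```python
from collections import Counter


def mesh_commonalities(records, indexed_pmids, top_n=10):
    """Count MeSH headings across records that the ontology did NOT index."""
    counts = Counter()
    for rec in records:
        if rec["pmid"] in indexed_pmids:
            continue  # only articles missed by the ontology are of interest here
        # Count each heading once per article; drop subheadings ("/...")
        # and the major-topic marker ("*").
        headings = {
            mh.split("/")[0].replace("*", "").strip()
            for mh in rec.get("mesh", [])
        }
        counts.update(headings)
    return counts.most_common(top_n)


# Toy records, invented for illustration only.
records = [
    {"pmid": "1", "mesh": ["Algorithms", "Medical Informatics/*methods"]},
    {"pmid": "2", "mesh": ["*Algorithms", "Humans"]},
    {"pmid": "3", "mesh": ["Algorithms", "Humans", "Software"]},
]
indexed = {"1"}  # PMIDs already covered by the ontology

top = mesh_commonalities(records, indexed, top_n=2)
```

The most frequent headings among the unindexed articles would then be candidates for the expert review step, not automatic additions to the ontology.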

6. Conclusions
It is important for individuals in the field to understand the depth and the breadth of the specialty of Biomedical Informatics. These results show that, in addition to a consensus-driven top-down method of defining our specialty, there is value in looking at the Biomedical Informatics literature to provide a broader definition of the concepts needed to define our specialty.

References
1. Haux R, Ammenwerth E, ter Burg WJ, Pilz J, Jaspers MW. An international course on strategic information management for medical informatics students: aim, content, structure, and experiences. Int J Med Inform 2004; 73 (2): 97–100.
2. Hasman A, Haux R. Curricula in medical informatics. Stud Health Technol Inform 2004; 109: 63–74.
3. Musen MA. Medical informatics: searching for underlying components. Methods Inf Med 2002; 41 (1): 12–19.
4. Wright G. The Development of the IMIA Knowledge Base. SA Journal of Information Management 2011; 13 (1), Art. #458, 5 pages. doi: 10.4102/sajim.v13i1.458
5. Schuemie MJ, Talmon JL, Moorman PW, Kors JA. Mapping the domain of medical informatics. Methods Inf Med 2009; 48 (1): 76–83.
6. Gardner RM, Overhage JM, Steen EB, Munger BS, Holmes JH, Williamson JJ, Detmer DE; AMIA Board of Directors. Core content for the subspecialty of clinical informatics. J Am Med Inform Assoc 2009; 16 (2): 153–157.
7. Kulikowski CA, Shortliffe EH, Currie LM, Elkin PL, Hunter LE, Johnson TR, Kalet IJ, Lenert LA, Musen MA, Ozbolt JG, Smith JW, Tarczy-Hornoch PZ, Williamson JJ. AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline. J Am Med Inform Assoc 2012; 19 (6): 931–938.
8. http://www.amia.org/clinical-informaticsmedical-subspecialty
9. Elkin PL, Brown SH, Husser C, Bauer BA, Wahner-Roedler D, Rosenbloom ST. An evaluation of the content coverage of SNOMED-CT for clinical problem lists. Mayo Clin Proc 2006; 81 (6): 741–748.
10. Elkin PL, Trusko BE, Koppel R, Speroff T, Mohrer D, Sakji S, Gurewitz I, Tuttle M, Brown SH. Secondary use of clinical data. Stud Health Technol Inform 2010; 155: 14–29.
11. Murff HJ, FitzHenry F, Matheny ME, Gentry N, Kotter KL, Crimin K, Dittus RS, Rosen AK, Elkin PL, Brown SH, Speroff T. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA 2011; 306 (8): 848–855.
12. Elkin PL, Froehling DA, Wahner-Roedler DL, Brown SH, Bailey KR. Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann Intern Med 2012; 156 (1 Pt 1): 11–18.
13. Ceusters W, Elkin P, Smith B. Negative findings in electronic health records and biomedical ontologies: a realist approach. Int J Med Inform 2007; 76 (Suppl 3): S326–33. Epub 2007 Mar 21.
