This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2014.2357025, IEEE Journal of Biomedical and Health Informatics

Registry developed at the Massachusetts General Hospital. i2b2 aims to enable researchers to use clinical data for knowledge discovery. The i2b2 architecture is designed as a set of services, called cells, that fit together in an integrated environment called the hive. Every cell is defined by its function in the hive, e.g., a file repository cell, an ontology management cell or a data repository cell. The data repository cell is designed as the data warehouse that provides information to users. i2b2 also includes an ontology cell responsible for managing the vocabulary.

caBIG is an open information network deployed in 2003 to share data on cancer research. It is based on the Open Grid Services Architecture (OGSA) and OGSA-Data Access and Integration (OGSA-DAI) [13]. The applications developed are heavily dependent on the GRID-based infrastructure designed for caBIG, which makes them difficult to reuse outside the caBIG environment.

The projects described above have obtained valuable results, but there are still additional interoperability standards and semantic web mechanisms that are not being fully exploited. From this context emerged the EU research projects INTEGRATE “Driving excellence in integrative cancer research” (FP7-ICT-2009-6-270253) and EURECA “Enabling information re-Use by linking clinical Research and CAre” (FP7-ICT-2012-6-270253). Both projects aim to facilitate semantic interoperability among applications and tools to achieve data sharing for breast cancer clinical trials. These projects have a strong focus on maintainability and use existing standards oriented to clinical research users. INTEGRATE and EURECA are based on a federation of data sources that follow a Common Data Model (CDM) [14]. Data owners maintain control of their own information, avoiding the high complexity and costs of a purely distributed database approach [3].
The CDM defines the schema of the data stored on each node. Several alternatives exist for representing clinical information in such a model, including openEHR [15], the Biomedical Research Integrated Domain Group (BRIDG) model [16], the Clinical Data Interchange Standards Consortium (CDISC) standards [17] and the Health Level 7 Reference Information Model (HL7 RIM) [19]. openEHR is an open standard that provides a common model for the management and storage of health care data. The main goal of openEHR is to create open semantic systems that are durable over time and economically viable. BRIDG is a “domain model representing protocol-driven biomedical/clinical research”. It was developed through the collaborative effort of clinical trial experts from CDISC, the US National Institutes of Health (NIH), Health Level 7 and others. The purpose of BRIDG is to support the development of data interchange standards and technology solutions in the healthcare area. CDISC, a Standards Development Organization (SDO), focuses on clinical research and supports global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare. HL7 [18] is an international organization focused on the standardization of the clinical and administrative domain


providing standards for interoperability in health care. HL7 RIM includes the most common healthcare domains and works as a general data model for healthcare and clinical information. HL7 RIM is the core of the HL7 v3 standard and is defined as a class diagram that aims to cover healthcare needs. The reference model is composed of three main classes (Entity, Act and Role) and the relationships between them (ActRelationship and Participation). Any clinical information, such as the diagnosis of a patient by a clinician, is represented using these classes and relationships, e.g., two entities participate in an observation act, playing the roles of clinician and patient.

Significant effort has been put forth to develop medical vocabularies for the semantic representation of the health domain [19][20]. One of the most relevant vocabularies is SNOMED-CT [21], a clinical vocabulary focused on recording health care encounters and the associated electronic health information exchange. It contains over 311,000 active medical concepts, nearly a million descriptions and over a million relationships. SNOMED-CT also provides mechanisms to identify post-coordinated concepts and to add new expressions (e.g., it is possible to use the pre-coordinated concept “Neoplasm of breast (disorder)” or the post-coordinated combination of “Neoplasm (morphologic abnormality)” and “Breast structure (body structure)” with the same meaning). Due to the high generality of SNOMED-CT, more specific medical vocabularies are required to fully cover data from clinical trials, such as the HUGO Gene Nomenclature Committee (HGNC) nomenclature [22] and Logical Observation Identifiers Names and Codes (LOINC) [23]. One of the main issues in this area is the development and maintenance of the large number of concepts present in medical vocabularies, i.e., deciding between a new concept and post-coordination, handling overlapping hierarchies (branches), defining the relationships of a new concept, etc.
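To make the clinician and patient example above concrete, the following is a minimal Python sketch of the RIM backbone classes. It is only an illustration under simplifying assumptions: the attribute names (`kind`, `type_code`, `class_code`) and the labels such as "author", "subject" and "supports" are hypothetical and are not taken from the HL7 v3 specification or from the CDM used in this work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """A physical thing or being, e.g. a person."""
    name: str


@dataclass
class Role:
    """A function played by an Entity, e.g. patient or clinician."""
    kind: str
    player: Entity


@dataclass
class Participation:
    """Links a Role to an Act and states how the role is involved."""
    type_code: str  # illustrative label, not an official HL7 code
    role: Role


@dataclass
class Act:
    """Something that is done or recorded, e.g. an observation."""
    class_code: str
    code: str
    value: str
    participations: List[Participation] = field(default_factory=list)


@dataclass
class ActRelationship:
    """Directed link between two Acts, e.g. an image supporting a diagnosis."""
    type_code: str  # illustrative label, not an official HL7 code
    source: Act
    target: Act


def diagnosis(clinician: str, patient: str, finding: str) -> Act:
    """Build an observation Act in which two entities participate."""
    author = Role(kind="clinician", player=Entity(clinician))
    subject = Role(kind="patient", player=Entity(patient))
    return Act(
        class_code="observation",
        code="diagnosis",
        value=finding,
        participations=[
            Participation(type_code="author", role=author),
            Participation(type_code="subject", role=subject),
        ],
    )


obs = diagnosis("Clinician A", "Patient B", "Neoplasm of breast (disorder)")
image = Act(class_code="observation", code="image", value="mammogram")
link = ActRelationship(type_code="supports", source=image, target=obs)
```

In this sketch every piece of clinical information is an Act, and people only appear through the roles they play in it, which mirrors the flexibility of the reference model described above.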
The union of these ontologies is required for a complete medical vocabulary [24] that can be used within the ETL process [25] for data annotation and HL7 v3 message generation. In the following sections, the proposed semantic interoperability approach of the EURECA project, as well as a comparison with other models, is described.

III. METHODS

The proposed solution to homogenize access among clinical systems is based on widely adopted standards that facilitate uniform access to legacy systems. Figure 1 depicts the interfaces of the proposed semantic interoperability layer with different applications and data hosting (Proprietary and local IM). In order to achieve data normalization, a Common Information Model (CIM) has been defined.

2168-2194 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

and returns the corresponding SPARQL template. Each template is generated from a basic query and includes the normalized form of the concept. The proposed templates also contain a set of optional filters and attributes for operations such as value comparison or adding further restrictions to the original query, including a target site, method or interpretation code. Once the final query is produced, the CIM Access Service exploits the semantic knowledge contained in the Core Dataset by expanding the SPARQL query with hierarchical relationships from medical vocabularies. The query abstraction approach proposed in this work follows the same steps as the semantic normalization pipeline: (i) generation of the SNOMED-CT Normal Form of the concept, (ii) matching of the Normal Form and the CDM following the Terminology Binding information and finally (iii) selection of the required query template to retrieve information from the CDM.

IV. EVALUATION & RESULTS

In this section, we present a qualitative and a quantitative evaluation of the proposed solution, in which different semantic solutions are compared on the same data source. The target data source contains 80 patients from a post-genomic multicentric clinical trial (the TOP clinical trial) [30] with 511 observations, including gene expression information following the St. Gallen International Breast Cancer guidelines [31]. Data have been stored in and queried from the following: (i) HL7 RIM without normalization (INTEGRATE), (ii) OMOP, (iii) i2b2 and (iv) HL7 RIM-based CDM with the semantic normalization pipeline (the proposed solution). HL7 RIM without normalization was implemented without any semantic transformation, while the OMOP and i2b2 instances of the common data source were implemented following the guidelines described in each project.
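As an illustration of the query abstraction described before the evaluation, the sketch below fills a SPARQL template with optional filters and expands the queried concept with its descendants. It is a simplified sketch under stated assumptions, not the Query Template Library itself: the `cdm:` prefix and property names, the `IS_A` toy hierarchy and the function names are all hypothetical, concept labels stand in for SNOMED-CT codes, and steps (i) and (ii) of the pipeline (Normal Form generation and Terminology Binding) are not implemented.

```python
from typing import Dict, List, Optional, Set

# Toy is-a hierarchy (child -> parent); labels stand in for vocabulary codes.
IS_A: Dict[str, str] = {
    "Infiltrating duct carcinoma of breast": "Malignant neoplasm of breast",
    "Malignant neoplasm of breast": "Neoplasm of breast",
}

# Hypothetical basic query; the prefix and properties are assumptions.
BASE_TEMPLATE = """\
PREFIX cdm: <http://example.org/cdm#>
SELECT ?patient ?value WHERE {{
  ?obs cdm:subject ?patient ;
       cdm:code ?code ;
       cdm:value ?value .
  VALUES ?code {{ {codes} }}
{filters}}}"""


def expand(concept: str, is_a: Dict[str, str]) -> List[str]:
    """Return the concept together with all of its descendants."""
    found: Set[str] = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return sorted(found)


def build_query(
    concept: str,
    is_a: Dict[str, str] = IS_A,
    min_value: Optional[float] = None,
    target_site: Optional[str] = None,
) -> str:
    """Fill the template; optional restrictions are added only when given."""
    codes = " ".join(f'"{c}"' for c in expand(concept, is_a))
    filters = ""
    if min_value is not None:
        filters += f"  FILTER(?value >= {min_value})\n"
    if target_site is not None:
        filters += f'  ?obs cdm:targetSite "{target_site}" .\n'
    return BASE_TEMPLATE.format(codes=codes, filters=filters)


query = build_query("Malignant neoplasm of breast", target_site="Breast structure")
```

The `VALUES` clause is one possible way to express the hierarchical expansion; a real deployment could equally rely on property paths or on reasoning in the triple store.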
The qualitative evaluation includes a set of general characteristics regarding the functional capabilities of each approach:
a) Data ambiguity: refers to different representations of the same information in the data model
b) Security access: refers to the system’s ability to filter data depending on user permission
c) Query languages used: refers to the query language used for accessing the data
d) Temporal queries: refers to the capability of retrieving information with temporal restrictions
e) Traceability: refers to the storage of the original information from data sources
f) Inferring vocabulary knowledge: refers to the exploitation of hierarchical and synonym information from domain vocabularies
g) Query abstraction from data model: refers to the possibility of building queries agnostic to the data model schema
h) Multimedia information: refers to the ability to store information related to other observations
i) Genetic information: refers to the ability to store genetic information in the data model


Finally, for the quantitative evaluation, a set of characteristics that can be measured with the same test data was defined. A performance evaluation was also carried out with different queries extracted from the Eligibility Criteria (EC) of the TOP trial:
j) The presence of redundant data with the same dataset
k) Execution time

EC queries range from retrieving general information about a patient to previous diagnostics, images related to a diagnosis, treatments of substance administrations and gene expression related to a patient. Queries were built to match EC from the TOP trial on the different models and to retrieve information about patients related to genes such as HER2, Ki-67 or PgR:
• Inclusion criterion 1: Age of patient =1500mm3
• Inclusion criterion 6: Patients with GOT=1.5N
• Exclusion criterion 1: Patients with metastatic breast cancer.
• Exclusion criterion 2: Patients with previous treatment with anthracyclines.

All these queries are available in the public Bitbucket repository, in SPARQL for the HL7 models and in SQL for the OMOP model.

A. HL7 without semantic normalization

This approach uses the CIM, comprised of a CDM and a Core Dataset, as mentioned previously. Regarding the qualitative evaluation, with this model it is possible to find the same information represented in different places, depending on the data source and the ETL process. If the original data include both normalized and non-normalized SNOMED-CT expressions, they will be stored in different HL7 RIM classes. With regard to other characteristics, this approach makes it possible to add security access at the application level. The HL7 RIM can store the trial provenance of every act for traceability. SPARQL is the query language used; SQL can also be used, although it does not exploit the semantic capabilities of the approach. Temporal queries are also possible because every act has an effective time.
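Because every act carries an effective time, a temporal restriction such as "previous treatment with anthracyclines" reduces to a comparison on that attribute. The following Python sketch illustrates the idea on in-memory records; the `Act` fields, the function name and the sample data are hypothetical and do not reproduce the CDM schema or the SPARQL queries used in the evaluation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Set


@dataclass(frozen=True)
class Act:
    patient_id: str
    code: str             # label standing in for a vocabulary code
    effective_time: date  # when the act took place


def patients_with_prior_act(acts: Iterable[Act], code: str, before: date) -> Set[str]:
    """Patients having an act with the given code strictly before a date."""
    return {
        act.patient_id
        for act in acts
        if act.code == code and act.effective_time < before
    }


# Invented sample data, for illustration only.
acts = [
    Act("p1", "anthracycline administration", date(2010, 3, 1)),
    Act("p2", "anthracycline administration", date(2012, 6, 15)),
    Act("p3", "mammography", date(2009, 1, 20)),
]

# Patients to exclude from a trial enrolling on 1 January 2011.
excluded = patients_with_prior_act(acts, "anthracycline administration", date(2011, 1, 1))
```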
A method to infer semantic knowledge from the Core Dataset on the CDM is available at the query level, which makes it possible to use SNOMED-CT synonyms. HL7 RIM provides options to store images and documents related to other acts (diagnoses or procedures). HL7 RIM also defines that gene information has to be stored as an observation related to a patient, depending on the medical vocabulary used. While flexibility is one of the most important features of HL7 RIM, it is also one of its main disadvantages, as querying an HL7 RIM-based model requires users to be aware of both


model schema and data representation. Although a query abstraction service can be provided, it does not cover all of the possible results due to the lack of data normalization. Regarding the quantitative evaluation, it is important to note that every EC is translated into one SPARQL query. This approach uses query expansion to extract Core Dataset knowledge, and SPARQL is translated to SQL. These processes directly impact performance by increasing execution time, as can be observed in Table I.

B. OMOP

Following the guidelines provided by OMOP, a relational database was created and populated with the dataset from the TOP trial, including gene expressions. EC queries were then translated to SQL to test the OMOP model. Compared to the previous approach, similar results were obtained for several criteria: data ambiguity (a), security access (b) and temporal queries (d). The main differences concern the medical vocabulary. With this approach, inference using implicit vocabulary knowledge has to be performed manually in the queries. Likewise, the information about synonyms, which is present in the vocabulary tables of the OMOP model, has to be included manually in the queries. Building SQL queries for retrieving information from the OMOP model requires knowledge of both the data model and SQL. Genetic information can be partially stored in the model as laboratory tests. However, this is only possible if the required vocabulary concepts are present in the mappings provided by OMOP. Regarding the quantitative evaluation, it is important to note


that it produces better results with respect to time, because queries are executed using SQL directly on a relational database [14]. The main drawback of this approach is that queries have to be manually extended with the vocabulary tables in order to use semantic knowledge from the medical domain. In fact, all concepts from medical vocabularies such as SNOMED-CT or LOINC are mapped in these vocabulary tables.

C. i2b2

An instance of an i2b2 data repository, implemented as a relational database, was deployed for comparison with the rest of the approaches. The main difficulty in populating the repository is that the i2b2 data model is very generic. Thus, as recommended by the i2b2 Clinical Research Chart (CRC) guidelines, the research dataset has been modeled ad hoc for the current study. This means that a hierarchical model representing the study has to be defined inside the i2b2 database. Values of diagnoses and laboratory tests are coded using standard vocabularies such as SNOMED-CT. The defined tree structure only includes string labels (separated by “\” characters), so most of the semantic information is lost. Additionally, it is important to remark that searches depend on the ad hoc tree definitions.

D. HL7 with semantic normalization and query abstraction

This is the approach adopted in the EURECA project. The main differences with the other approaches are: (i) the semantic normalization pipeline for storing normalized data and (ii) the query abstraction process to obtain SPARQL templates for building CDM queries. This approach shares certain advantages with the first HL7

TABLE I
EVALUATION OF THE DIFFERENT APPROACHES

| Measure | HL7 RIM w/o Normalization | OMOP | i2b2 | HL7 RIM with Normalization |
|---|---|---|---|---|
| a | Yes, it depends on the data sources and ETL process | Yes, it depends on the data sources and ETL process | Yes, it depends on the structure tree defined for displaying data | No, semantic normalization process avoids ambiguous data representation |
| b | Yes, at application and data level (fine-grained security) | Yes, at application level | Yes, at application level as another hive service | Yes, at application and data level (fine-grained security) |
| c | SPARQL | SQL | SQL | SPARQL |
| d | Yes, effective times are stored in the model | Yes, dates are stored in the model | Yes, each data has its own associated date | Yes, effective times are stored in the CDM |
| e | Yes, information is not transformed | No, it depends on the ETL process | No, it depends on the ETL for creating the tree | Yes, it is transformed and the original data are also stored in the same location |
| f | Yes, with the query expansion of the Core Dataset | Manually, using a vocabulary table as a dictionary | It is a manual process with no semantic knowledge, only for retrieving the same strings | Yes, with the query expansion of the Core Dataset |
| g | No, it is necessary to know the CDM for building queries | No, it is necessary to know the OMOP model for building queries | No, it is necessary to know the i2b2 model for building queries | Yes, the query abstraction process facilitates a SPARQL template for building queries |
| h | Yes, metadata and linked images can be stored as an ‘act’ related to another ‘act’ | No, there is not a table for storing relations between observations or procedures | No, relations available are only those defined in data tree | Yes, metadata and linked images can be stored as an ‘act’ related to another ‘act’ |
| i | Yes, it depends on the representation vocabulary used | No, it is necessary to modify the model to add this information | Yes, i2b2 model supports clinical data and genomic data | Yes, it depends on the representation vocabulary used |
| j | Yes (i.e. Breast cancer vs. Infiltrating Duct Carcinoma located on a Breast finding) | No, vocabulary concepts are mapped on a table avoiding this issue | Yes, i2b2 could have redundant data because it depends on the vocabulary tree | No, the semantic normalization pipeline avoids this issue |
| k | 300-600 ms | 20-250 ms | 300-650 ms | 400-750 ms |


model. Therefore, b), c), d), f), h) and i) have the same results. Regarding data ambiguity (a), there is no ambiguity in this model because the normalization pipeline transforms and standardizes data storage to avoid different representations of the same information in the CDM (e). Regarding the usability for building queries on this model (g), the query abstraction, following the same normalization pipeline, provides a template for querying the CDM without any knowledge about the model. Quantitative results are very similar to those of the first approach. Performance is slightly worse due to the automatic query expansion and the query abstraction mechanism used to build queries. In this approach, redundant data are avoided as a result of the normalization process. The semantic interoperability solution described in this paper, namely the CDM RDF dump and the Query Template Library (QTL), is freely available at http://bitbucket.org/sparaiso/semantic-normalization-andquery-abstraction-based-on-snomed.
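The effect of normalization on data ambiguity and redundancy can be illustrated with a small Python sketch. It assumes a strongly simplified notion of normal form, in which a pre-coordinated concept is replaced by its defining attribute and value pairs; the `DEFINITIONS` lookup, the function names and the sample records are hypothetical, and the actual pipeline relies on the SNOMED-CT Normal Form and Terminology Binding rather than on this dictionary.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Expression = Union[str, Dict[str, str]]
NormalForm = FrozenSet[Tuple[str, str]]

# Hypothetical lookup from a pre-coordinated concept to its defining attributes.
DEFINITIONS: Dict[str, Dict[str, str]] = {
    "Neoplasm of breast (disorder)": {
        "Associated morphology": "Neoplasm (morphologic abnormality)",
        "Finding site": "Breast structure (body structure)",
    },
}


def normalize(expression: Expression) -> NormalForm:
    """Map pre- and post-coordinated expressions to one comparable form."""
    attributes = DEFINITIONS[expression] if isinstance(expression, str) else expression
    return frozenset(attributes.items())


def matching(records: List[Tuple[str, Expression]], wanted: Expression) -> List[str]:
    """Patients whose stored expression has the same normal form as the query."""
    target = normalize(wanted)
    return [patient for patient, expr in records if normalize(expr) == target]


# The same diagnosis stored in two different representations (invented data).
records: List[Tuple[str, Expression]] = [
    ("p1", "Neoplasm of breast (disorder)"),
    ("p2", {
        "Associated morphology": "Neoplasm (morphologic abnormality)",
        "Finding site": "Breast structure (body structure)",
    }),
]

# Comparing raw expressions only finds the pre-coordinated record.
naive = [p for p, expr in records if expr == "Neoplasm of breast (disorder)"]
# Comparing normal forms finds both.
normalized = matching(records, "Neoplasm of breast (disorder)")
```

Without normalization, a query has to anticipate every representation of the same information; after normalization a single comparison is enough.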

V. CONCLUSION

Interoperability among systems involved in modern clinical trials is still one of the main bottlenecks in information management for clinical research. Most of the tasks are currently performed manually, so the sustainability of studies including –omic information is especially dependent on advanced methods to automate certain procedures. Even more important, however, is to exploit the semantic knowledge inferred from the clinical vocabularies that have been developed in recent years. The method proposed in this paper tackles such challenges within the environment built within two EU research projects. A semantic normalization pipeline was developed to homogenize the representation of clinical data by using normalization mechanisms from the HL7 interoperability standard and the SNOMED-CT vocabulary. An additional query abstraction mechanism has also been created to facilitate information retrieval by applications in this environment. This mechanism encapsulates the data model and the query syntax, allowing the creation of the complex SPARQL queries required by the clinical scenarios. A qualitative and quantitative evaluation has been performed to compare the proposed method with other state-of-the-art implementations such as OMOP or i2b2. Results have shown additional query capabilities that exploit knowledge extracted from biomedical vocabularies and follow the latest semantic technologies and healthcare information management standards. Although slightly worse performance has been found, applications using the proposed approach have produced usable Graphical User Interfaces. There are plans to extend the proposed solution to other areas in oncology and beyond, and there are also research efforts to automate certain data annotation and ETL-related tasks. This work describes a valuable effort with the aim of improving and streamlining clinical research execution and translating current –omics related findings into clinical practice.

ACKNOWLEDGMENT

The presented work has been partially funded by the European Commission through the INTEGRATE (FP7-ICT-2009-6-270253) and the EURECA (FP7-ICT-2012-6-270253) projects, and by the Ministry of Health of the Spanish Government under grant no. PI13/02020.

REFERENCES

[1] McShane, L. M., Cavenagh, M. M., Lively, T. G., Eberhard, D. A., Bigbee, W. L., Williams, P. M., ... & Conley, B. A. (2013). Criteria for the use of omics-based predictors in clinical trials. Nature, 502(7471), 317-320.
[2] Hodge Jr, J. G., Gostin, L. O., & Jacobson, P. D. (1999). Legal issues concerning electronic health information. JAMA: the journal of the American Medical Association, 282(15), 1466-1471.
[3] Anguita, A., Martín, L., Pérez-Rey, D., & Maojo, V. (2010). A review of methods and tools for database integration in biomedicine. Current Bioinformatics, 5, 253-269.
[4] Georges, D. E. (2008). A Data Protection Framework for Transeuropean genetic research projects. In Collaborative Patient Centered Ehealth: Proceedings of the HIT@ HealthCare 2008 Joint Event: 25th MIC Congress, 3rd International Congress Sixi, Special ISV-NVKVV Event, 8th Belgian EHealth Symposium (Vol. 141, p. 67). IOS Press.
[5] Oeffinger, K. C., Ford, J. S., Moskowitz, C. S., Diller, L. R., Hudson, M. M., Chou, J. F., ... & Robison, L. L. (2009). Breast cancer surveillance practices among women previously treated with chest radiation for a childhood cancer. JAMA: the journal of the American Medical Association, 301(4), 404-414.
[6] Pakhomov, S., Weston, S. A., Jacobsen, S. J., Chute, C. G., Meverden, R., & Roger, V. L. (2007). Electronic medical records for clinical research: application to the identification of heart failure. The American journal of managed care, 13(6 Part 1), 281.
[7] Hersh, W. R. (2007). Adding value to the electronic health record through secondary use of data for quality assurance, research, and surveillance. Clin Pharmacol Ther, 81, 126-128.
[8] Stang, P. E., Ryan, P. B., Racoosin, J. A., Overhage, J. M., Hartzema, A. G., Reich, C., ... & Woodcock, J. (2010). Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Annals of internal medicine, 153(9), 600-606.
[9] Murphy, S. N., Weber, G., Mendis, M., Gainer, V., Chueh, H. C., Churchill, S., & Kohane, I. (2010). Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). Journal of the American Medical Informatics Association, 17(2), 124-130.
[10] Oster, S., Langella, S., Hastings, S., Ervin, D., Madduri, R., Phillips, J., ... & Saltz, J. (2008). caGrid 1.0: an enterprise Grid infrastructure for biomedical research. Journal of the American Medical Informatics Association, 15(2), 138-149.
[11] Fp7-integrate.eu [homepage on the internet]. Driving excellence in integrative cancer research [updated 31 December 2013; cited 31 December 2013]. Available from: http://www.fp7integrate.eu/index.php/project
[12] eurecaproject.eu [homepage on the internet]. Enabling information re-Use by linking clinical REsearch and Care [updated 31 December 2013; cited 31 December 2013]. Available from: http://eurecaproject.eu/
[13] Antonioletti, M., Atkinson, M., Baxter, R., Borley, A., Chue Hong, N. P., Collins, B., & Westhead, M. (2005). The design and implementation of Grid database services in OGSA‐DAI. Concurrency and Computation: Practice and Experience, 17(2‐4), 357-376.
[14] Moratilla, J. M., Alonso-Calvo, R., Molina-Vaquero, G., Paraiso-Medina, S., Perez-Rey, D., & Maojo, V. (2012). A Data Model Based on Semantically Enhanced HL7 RIM for Sharing Patient Data of Breast Cancer Clinical Trials. Studies in health technology and informatics, 192, 971-971.
[15] Kalra, D., Beale, T., & Heard, S. (2005). The openEHR foundation. Studies in health technology and informatics, 115, 153-173.
[16] Fridsma, D. B., Evans, J., Hastak, S., & Mead, C. N. (2008). The BRIDG project: a technical report. Journal of the American Medical Informatics Association, 15(2), 130-137.
[17] Kuchinke, W., Aerts, J., Semler, S. C., & Ohmann, C. (2009). CDISC standard-based electronic archiving of clinical trials. Methods of information in medicine, 48(5), 408.


[18] Beeler, G. W. (1998). HL7 Version 3—An object-oriented methodology for collaborative standards development. International Journal of Medical Informatics, 48(1), 151-161.
[19] Benson, T. (2012). Principles of health interoperability HL7 and SNOMED. Springer.
[20] Aso, S., Perez-Rey, D., Alonso-Calvo, R., Rico-Diez, A., Bucur, A., Claerhout, B., & Maojo, V. (2012). Analyzing SNOMED CT and HL7 Terminology Binding for Semantic Interoperability on Post-Genomic Clinical Trials. Studies in health technology and informatics, 192, 980-980.
[21] Bos, L. (2006). SNOMED-CT: The advanced terminology and coding system for eHealth. Medical And Care Compunetics 3, 121, 279.
[22] Seal, R. L., Gordon, S. M., Lush, M. J., Wright, M. W., & Bruford, E. A. (2011). genenames.org: the HGNC resources in 2011. Nucleic acids research, 39(suppl 1), D514-D519.
[23] McDonald, C. J., Huff, S. M., Suico, J. G., Hill, G., Leavelle, D., Aller, R., & Maloney, P. (2003). LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clinical chemistry, 49(4), 624-633.
[24] Paraiso-Medina, S., Perez-Rey, D., Alonso-Calvo, R., Claerhout, B., de Schepper, K., Hennebert, P., Lhaut, J., Van Leeuwen, J., & Bucur, A. (2013). Semantic Interoperability Solution for Multicentric Breast Cancer Trials at the Integrate EU Project. In Proceedings of HEALTHINF 2013, (1), 34-41.
[25] Bucur, A., van Leeuwen, J., Perez-Rey, D., Calvo, R. A., Claerhout, B., & de Schepper, K. (2012, November). Identifying the semantics of eligibility criteria of clinical trials based on relevant medical ontologies. In Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on (pp. 413-421). IEEE.
[26] World Health Organization. (1992). The ICD-10 classification of mental and behavioural disorders: clinical descriptions and diagnostic guidelines (Vol. 1). World Health Organization.
[27] Sioutos, N., Coronado, S. D., Haber, M. W., Hartel, F. W., Shaiu, W. L., & Wright, L. W. (2007). NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information. Journal of biomedical informatics, 40(1), 30-43.
[28] Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., & Musen, M. A. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic acids research, 37(suppl 2), W170-W173.
[29] Cheetham, E., Dolin, R. H., Markwell, D., Curry, J., Gabriel, D., Hausam, R., Knight, B., Rector, A., Spackman, K., & Townend, I. (2008). Using SNOMED CT in HL7 v3 Implementation Guide, Release 1.5.
[30] TOP, Jules Bordet Institute. Topoisomerase II Alpha Gene Amplification and Protein Overexpression Predicting Efficacy of Epirubicin (TOP). Available at: http://clinicaltrials.gov/ct2/show/NCT00162812 [31 December, 2013]
[31] Goldhirsch, A., Wood, W. C., Coates, A. S., Gelber, R. D., Thürlimann, B., & Senn, H. J. (2011). Strategies for subtypes—dealing with the diversity of breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Annals of Oncology, 22(8), 1736-1747.

Sergio Paraiso-Medina holds a Master's degree in Computer Science and a Master's degree in Artificial Intelligence from Universidad Politécnica de Madrid (UPM). He is currently working on his PhD in Computer Science. During the last 3 years, he has been a member of the Biomedical Informatics Group at UPM. His research interests are mainly focused on database integration, semantic technologies and reasoning methods. He has been working on various European projects such as EURECA and INTEGRATE.

David Perez-Rey holds a PhD in Computer Science from the UPM. He has been a visiting researcher at Rutgers University (USA) and the University of Utah (USA). In 2007, he was a collaborating professor at Universidad Francisco de Vitoria, and since 2008, he has worked at the Departamento de Inteligencia Artificial, Facultad de Informatica at UPM, where he is currently an Assistant Professor. For the last 9 years, he has been a member of the Biomedical Informatics Group at UPM, working on several EU research projects. His research interests are mainly focused on semantic interoperability, information retrieval and search engines in biomedicine. He has published research papers in journals such as Bioinformatics, BMC Bioinformatics, Journal of Biomedical Informatics, Methods of Information in Medicine and Computers in Biology and Medicine.

Anca Bucur, Philips Research Europe, holds a PhD in Computer Science from Delft University of Technology and a Master's degree from the Technical University of Bucharest. She is currently a senior scientist with Philips Research Europe. She has led several industrial research projects in the healthcare domain related to Clinical Information Systems, medical imaging, and computational genomics, with a main customer being Philips Healthcare. She is the coordinator of the FP7 projects INTEGRATE and EURECA (Enabling information re-Use by linking clinical Research and Care) and leads Philips’ contribution to the FP7 projects p-Medicine (from data sharing and integration via VPH models to personalized medicine) and CHIC (Computational Horizons in Cancer).

Brecht Claerhout holds a Master's degree in electronics engineering. He has previously been active in FOSS development as the author of a major network security tool (Sniffit). He has worked at the IMEC (Interuniversity Microelectronics Center) and RAMIT (Research in Advanced Medical Informatics and Telematics) research groups. Brecht is currently leading Custodix, one of the first Trust Service Providers in the world focusing on data protection, and he has been actively involved in a large number of European research projects mainly dealing with health data integration. Recent projects include EHR4CR, INTEGRATE and EURECA. Brecht has published several conference and journal papers on the subject of security and privacy protection and semantic integration of clinical data.

Raul Alonso-Calvo holds a PhD in Computer Science from the UPM. For the last 11 years, he has been a member of the Biomedical Informatics Group at UPM, working on different research projects related to data integration, semantic interoperability and biomedical image processing. He has been a visiting researcher at the bioinformatics group of the Universidade de Aveiro (Portugal). He has also been working on different IT projects in private companies since 2005, first as a consultant and currently as an IT manager. Since 2009, he has been a part-time Professor at the DLSIIS (Departamento de Lenguajes y Sistemas e Ingeniería de Software), Facultad de Informatica at UPM.


Semantic Normalization and Query Abstraction Based on SNOMED-CT and HL7: Supporting Multicentric Clinical Trials.

Advances in the use of omic data and other biomarkers are increasing the number of variables in clinical research. Additional data have stratified the...