HHS Public Access Author manuscript

Stud Health Technol Inform. Author manuscript; available in PMC 2016 June 16. Published in final edited form as: Stud Health Technol Inform. 2015 ; 216: 1107.

Real-time Data Fusion Platforms: The Need of Multi-dimensional Data-driven Research in Biomedical Informatics

Satyajeet Raje(a), Bobbie Kite(b), Jay Ramanathan(a), and Philip Payne(b)

(a) Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
(b) Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA

Abstract

Systems designed to expedite the data preprocessing tasks required before analysis, such as data discovery, interpretation, and integration, drastically affect the pace of biomedical informatics research. Current commercial interactive and real-time data integration tools are designed for the requirements of large-scale business analytics. In this paper we identify the need for end-to-end data fusion platforms, designed from the researcher's perspective, that support ad-hoc data interpretation and integration.

Keywords

Research informatics; knowledge bases; data curation

Introduction

Data-driven research and analytics are at the core of clinical and translational informatics [1]. Data analysis requires several preprocessing activities, such as data exploration, collection, cleaning, curation, and interpretation. These activities are a necessary prerequisite for performing analyses and answering research questions, yet they are time-consuming in current research environments. They can be broadly grouped into three steps: data discovery, interpretation, and integration. We identify the need for a data fusion platform that aids researchers in this time-consuming process.

We define a data fusion platform as a system that provides automated, end-to-end, real-time, and ad-hoc functionality across the data preprocessing tasks of data discovery, interpretation, and integration. As shown in Figure 1, such a platform would serve as a middle layer between researchers and crowd-sourced data collections, promoting faster discovery and interpretation of datasets. It would also allow researchers to publish their datasets with reusable annotations to standard reference ontologies and vocabularies.
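To make the discovery and interpretation steps concrete, here is a minimal sketch of the middle layer described above, assuming datasets are published with a per-column mapping to concept identifiers from a reference vocabulary. It is illustrative only and is not the authors' design: the `Dataset` and `Registry` classes and the `EX:` concept identifiers are hypothetical placeholders, not codes from a real ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A published tabular dataset whose columns carry concept annotations."""
    name: str
    # Hypothetical mapping: column name -> concept identifier in a reference vocabulary.
    column_concepts: dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical middle layer that stores published datasets and answers
    concept-based discovery and interpretation queries."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def publish(self, dataset: Dataset) -> None:
        # Publishing stores the dataset together with its reusable annotations.
        self._datasets[dataset.name] = dataset

    def discover(self, concept: str) -> list[str]:
        """Discovery: sorted names of datasets with a column annotated with `concept`."""
        return sorted(
            d.name for d in self._datasets.values()
            if concept in d.column_concepts.values()
        )

    def interpret(self, dataset_name: str, column: str) -> str | None:
        """Interpretation: the concept a column was annotated with, or None if unannotated."""
        return self._datasets[dataset_name].column_concepts.get(column)


registry = Registry()
registry.publish(Dataset("clinic_a_visits", {"sbp": "EX:SYSTOLIC_BP", "pid": "EX:PATIENT_ID"}))
registry.publish(Dataset("clinic_b_vitals", {"systolic": "EX:SYSTOLIC_BP"}))
registry.publish(Dataset("clinic_c_labs", {"hba1c": "EX:HBA1C"}))

print(registry.discover("EX:SYSTOLIC_BP"))  # ['clinic_a_visits', 'clinic_b_vitals']
print(registry.interpret("clinic_b_vitals", "systolic"))  # EX:SYSTOLIC_BP
```

Because the annotations travel with each published dataset, a second researcher can find `clinic_b_vitals` by concept even though its column is named `systolic` rather than `sbp`.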

This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License. Address for correspondence: Satyajeet Raje, MS, PhD candidate, Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA, [email protected].

Methods

To determine the plausibility of this notion, we conducted informal interviews with researchers, statisticians, and business analysts. To understand the researchers' requirements more precisely and to assess how well current tools satisfy them, we conducted semi-structured interviews of 8 questions with 4 biomedical researchers, who had varying levels of experience and worked primarily in different research areas. We also interviewed 2 business data analysts and 2 biostatisticians.

Results

Author Manuscript

The interviews revealed major differences between the processes and philosophies of individual researchers and those of large-scale business analytics. About two-thirds of data-driven research in biomedical informatics relies on tabular data. Currently, researchers have limited access to automated tools for data preprocessing and usually perform all of these tasks manually.
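Since most of the data researchers handle is tabular, the integration step can be illustrated on plain tables. The following is a minimal sketch under stated assumptions, not the authors' system: each source table is assumed to arrive as a list of row dictionaries plus a mapping from its column names to concept identifiers, and sources are integrated by a row-wise union (stacking) after renaming columns to those shared identifiers. The `harmonize` and `integrate` helpers, the column names, and the `EX:` identifiers are hypothetical.

```python
from __future__ import annotations


def harmonize(
    rows: list[dict[str, object]], column_concepts: dict[str, str]
) -> list[dict[str, object]]:
    """Rename each row's columns to their annotated concept IDs.

    Columns without an annotation are dropped, since their meaning is unknown.
    """
    return [
        {column_concepts[col]: value for col, value in row.items() if col in column_concepts}
        for row in rows
    ]


def integrate(
    *sources: tuple[list[dict[str, object]], dict[str, str]]
) -> list[dict[str, object]]:
    """Stack rows from several (rows, column_concepts) sources into one table.

    Every output row carries the sorted union of concept columns; missing values become None.
    """
    harmonized = [harmonize(rows, concepts) for rows, concepts in sources]
    all_concepts = sorted({c for table in harmonized for row in table for c in row})
    return [{c: row.get(c) for c in all_concepts} for table in harmonized for row in table]


clinic_a = [{"pid": "p1", "sbp": 128}, {"pid": "p2", "sbp": 141}]
clinic_b = [{"patient": "p3", "systolic": 119, "note": "fasting"}]

merged = integrate(
    (clinic_a, {"pid": "EX:PATIENT_ID", "sbp": "EX:SYSTOLIC_BP"}),
    (clinic_b, {"patient": "EX:PATIENT_ID", "systolic": "EX:SYSTOLIC_BP"}),
)
for row in merged:
    print(row)
# {'EX:PATIENT_ID': 'p1', 'EX:SYSTOLIC_BP': 128}
# {'EX:PATIENT_ID': 'p2', 'EX:SYSTOLIC_BP': 141}
# {'EX:PATIENT_ID': 'p3', 'EX:SYSTOLIC_BP': 119}
```

Silently dropping unannotated columns (such as `note` in the example) is a simplification for brevity; an interactive platform could instead surface them to the researcher for annotation.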

Discussion

Informal interviews with researchers, statisticians, and business analysts confirmed that interactive data fusion platforms could improve the efficiency of multi-dimensional research. Although existing tools and methods perform individual data preprocessing tasks well, none of them provides an end-to-end, interactive data fusion platform that operates in real time. We argue that a data fusion platform with these capabilities is crucial for effective research in the modern data environment.

The next step is to formalize the interviews and conduct them with a larger, more diverse set of participants. The primary focus of ongoing research is to formally articulate the specifics of such systems and the frameworks needed to design and implement them.

References

1. Embi PJ, Payne PR. Clinical research informatics: challenges, opportunities and definition for an emerging domain. J Am Med Inform Assoc. 2009;16(3):316–27. [PubMed: 19261934]

Figure 1. Role of Data Fusion Platforms in research