This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

A Cloud-based Approach for Interoperable EHRs

Arshdeep Bahga and Vijay K. Madisetti, Fellow, IEEE

Abstract—We present a cloud-based approach for the design of interoperable Electronic Health Record (EHR) systems. Cloud computing environments provide several benefits to all the stakeholders in the healthcare ecosystem (patients, providers, payers, etc.). The lack of data interoperability standards and solutions has been a major obstacle to the exchange of healthcare data between different stakeholders. We propose an EHR system, the Cloud Health Information Systems Technology Architecture (CHISTAR), that achieves semantic interoperability through a generic design methodology. This methodology uses a reference model that defines a general-purpose set of data structures and an archetype model that defines the clinical data attributes. CHISTAR application components are designed using the Cloud Component Model approach, which comprises loosely coupled components that communicate asynchronously. In this paper we describe the high-level design of CHISTAR and our approaches to semantic interoperability, data integration and security.

Index Terms—Electronic Health Records, cloud EHR, healthcare, data integration


1 INTRODUCTION

The healthcare ecosystem consists of healthcare providers (doctors, physicians, specialists, etc.), payers (health insurance companies), pharmaceutical companies, IT solutions and services firms, and patients. Provisioning healthcare involves massive amounts of data. This data exists in different forms (structured or unstructured), on disparate data sources (such as relational databases, file servers, etc.) and in different formats. When a patient is admitted to a hospital, his/her information is entered into an Electronic Health Record (EHR) system. Physicians diagnose the patient, and the diagnostic information (from medical devices such as CT scanners, MRI scanners, etc.) is stored in the EHR system. During diagnosis, doctors retrieve the patient's health information and analyze it to identify the illness. They can seek expert advice by sharing this information with consulting specialists.

Figure 1 shows how cloud computing environments can be applied to the healthcare ecosystem [1]. The cloud can provide several benefits to all the stakeholders in the healthcare ecosystem through systems such as the Health Information Management System (HIMS), Laboratory Information System (LIS), Radiology Information System (RIS), Pharmacy Information System (PIS), etc. With public cloud-based EHR systems, hospitals do not need to spend a significant portion of their budgets on IT infrastructure. Public cloud service providers offer on-demand provisioning of hardware resources with pay-per-use pricing models. Hospitals using public cloud-based EHR systems can therefore avoid upfront capital investments in hardware and data center infrastructure and pay only the operational expenses of the cloud resources they use.

Hospitals can access patient data stored in the cloud and share the data with other hospitals. Patients can give hospitals access to their health history and information stored in the cloud (using SaaS applications), which streamlines the admission, care and discharge processes. Physicians can upload diagnosis reports (such as pathology reports) to the cloud so that doctors can access them remotely when diagnosing an illness. Patients can manage their prescriptions and associated information such as dosage, amount and frequency, and provide this information to their healthcare provider. Health payers can increase the effectiveness of their care management programs by providing value-added services and giving members access to health information.

The Veterans Health Information Systems and Technology Architecture (VistA) [2] is the largest single medical system in the United States, catering to a quarter of the nation's population. VistA is not a single application but a collection of about 168 application packages/modules. Traditional EHR systems such as VistA are based on client-server architectures, as shown in Figure 2. The VistA front end comprises applications such as ADT, Pharmacy, Radiology, etc. The applications communicate with the server through a Remote Procedure Call (RPC) Broker. The VistA server comprises the RPC Broker, Kernel/Tools (such as TaskMan, Package Manager, etc.), FileMan (which provides APIs and utility functions) and the Database (built on FileMan, a MUMPS-based database management system). Each application module generates at least one global data file, which contains clinical, administrative and computer infrastructure related information. The underlying technology of most of VistA's applications is MUMPS. MUMPS is both a procedural programming language and a hierarchical or multidimensional key-value database.

OpenVistA [3] brings technological and functional enhancements to VistA. Figure 3 shows the technology stacks for VistA, OpenVistA [4] and our proposed EHR system, the Cloud Health Information Systems Technology Architecture (CHISTAR). OpenVistA employs a client/server architecture in which the client accesses services provided by the server over a network. OpenVistA gives users freedom of choice with regard to application server, operating system and hardware. The OpenVistA server uses the Java-based OpenVistA Interface Domain (OVID) layer [5]. OVID provides a set of development tools that give software developers easier access to OpenVistA data. With OVID, OpenVistA preserves the legacy VistA MUMPS code at both the server and application server levels while enabling technological advancements. CHISTAR transforms the OpenVistA technology stack to bring the benefits of cloud technologies. CHISTAR enables interoperability of EHR data through a cloud-based data integration engine, and it achieves semantic interoperability through the use of archetype models. Table 1 shows a comparison between VA VistA and CHISTAR.

The rest of the paper is organized as follows. Related work is described in Section 2. Section 3 presents the motivation for a cloud-based EHR system. Section 4 details the CHISTAR architecture, and Section 5 describes the benefits of CHISTAR.

Fig. 1. Application of cloud computing environments to the healthcare ecosystem.

Fig. 2. VA VistA architecture [2].

A. Bahga and V. Madisetti are with the Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332. E-mail: [email protected], [email protected] Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

2 RELATED WORK

VistA [2] is the most widely used EHR system in the United States. There are three main distributions of VistA outside the VA: WorldVista [6] (GPL licensed), OpenVistA [3] (AGPL licensed) and vxVista [7] (EPL licensed). OpenEHR [8] is an EHR system designed for semantic interoperability. It emphasizes semantic interoperability in order to improve the quality of data exchanged between different stakeholders in the healthcare ecosystem. OpenEHR is based on a two-level modeling approach. A reference model constitutes the first level of modeling, and formal definitions of clinical content in the form of archetypes and templates constitute the second level.


Fig. 3. Technology stacks of VA VistA, OpenVistA and the proposed CHISTAR system. VA VistA and OpenVistA are client-server systems, whereas CHISTAR is a cloud-based system.

To enable interoperability of healthcare data, various solutions have been developed for integrating data from different sources. Mirth Connect [9] is an open source integration engine that supports a variety of messaging standards and protocols for connecting to external systems and databases. FM Projection [10] is a set of tools for inspecting VistA File Manager (FileMan) data and structures through SQL-like representations. In our previous work [11], we proposed a framework for collecting big sensor data in a cloud. For CHISTAR we propose a similar data collection approach based on a cloud-based distributed batch processing infrastructure. EHR systems handle massive amounts of healthcare data, so benchmarking their performance is important to ensure that they provision healthcare effectively. In our previous work [12], [13], we proposed an approach for prototyping and benchmarking cloud-based systems such as EHRs. A similar approach will be used for evaluating the performance of CHISTAR. CHISTAR uses the Cloud Component Model approach for application design described in our previous work [14]. For the design of mobile applications that can utilize the capabilities of next-generation cellular networks, CHISTAR adopts the guidelines described by Radio et al. [15].

3 MOTIVATION

In this section, we describe the motivation for a cloud-based EHR.

3.1 Design Methodologies

Traditional EHR systems have been built using the following design methodologies [16]:
1) Unstructured Approach: This approach consists of unstructured data.
2) Big Model Approach: This approach consists of structured data. A separate table is maintained for each clinical concept, leading to a large number of tables.
3) Generic Approach: This approach allows a wide variety of data to be stored in generic data structures. A constraint mechanism is used to ensure that the stored data is valid in terms of the clinical domain.

VistA [2] follows the big model approach, where each application module generates at least one global data file that is stored in the MUMPS database. OpenEHR [8] and CHISTAR follow the generic approach, where a reference model defines a general-purpose set of data structures and an archetype model defines the clinical data attributes.

3.2 Data Interoperability

Data integration and interoperability are major challenges faced by traditional EHR systems. Traditional EHR systems use different and often conflicting technical and semantic standards, which leads to data integration and interoperability problems. These systems are based on different EHR standards, different languages and different technology generations. As a consequence, EHR systems are fragmented and unable to exchange data. Acquiring medical data from different sources requires a high degree of data interoperability, yet most medical information systems store clinical information about patients in proprietary formats. Interoperability of EHR systems will contribute to more effective and efficient patient care by facilitating the retrieval and processing of clinical information about a patient from different sites. Transferring patient information automatically between care sites will speed delivery and reduce duplicate testing and prescribing.

Fig. 4. Architecture of proposed CHISTAR system.

3.3 Loose Coupling

CHISTAR adopts the Cloud Component Model approach described in our previous work [14]. It also follows software engineering best practices such as loose coupling between application components, statelessness, etc. In the Cloud Component Model approach, components interface through clearly defined functional and service boundaries instead of hard-wired links. Links between components are established and broken as the components respond to service requests. Loose coupling of components relies on API interfaces and web service interfaces. Common communication protocols such as REST [11] allow components developed in different programming languages to communicate with each other.

3.4 Scalability and Performance

Traditional EHR systems are built on a client-server model with dedicated hosting. This model involves a server installed within the organization's network and multiple clients that access the server. Data is stored on the server and can be accessed within the organization's network by authorized clients. Scaling up such systems requires additional hardware. Cloud computing is a hosting abstraction in which the underlying computing infrastructure is provisioned on demand and can be scaled up or down based on the workload. Public cloud-based applications run on cloud infrastructure that is managed by the cloud service provider. Cloud applications are easier to scale than client-server applications, because additional computing resources can be provisioned on demand when the application workload increases. The cloud can offer linear scalability without any changes to the application software.

4 PROPOSED CLOUD-BASED EHR SYSTEM

Figure 4 shows the layered architecture of the proposed CHISTAR system. The infrastructure services layer consists of the cloud instances (for load balancers, application servers, Hadoop master and slave nodes, etc.) on which CHISTAR is deployed. The information services layer consists of three parts:
1) a data integration engine that integrates data from multiple disparate data sources into the cloud,
2) models for data storage and clinical concepts, and
3) the data governance module.

The application services layer provides various services such as the EHR service, demographic service, archetype service and terminology service. The presentation services layer consists of smart and connected healthcare applications (web and mobile based). The key design principles of CHISTAR are described as follows:


TABLE 1
Comparison of VA VistA and CHISTAR

Design methodology
VA VistA: Big Model design methodology, where each application module generates at least one global data file that is stored in the MUMPS database.
CHISTAR: Generic design methodology, which uses a reference model that defines a general-purpose set of data structures and an archetype model that defines the clinical data attributes.

Scalability
VA VistA: Scaling VistA's client-server architecture requires untenably complex infrastructures.
CHISTAR: CHISTAR's Cloud Component Model based design approach and its use of cloud technologies make scalability easy.

Component coupling
VA VistA: Tightly coupled application components.
CHISTAR: Loosely coupled application components that communicate asynchronously. Design based on the Cloud Component Model.

Semantic interoperability
VA VistA: Semantic interoperability is difficult, as VistA does not use explicit formal ontologies.
CHISTAR: Achieves semantic interoperability by using a two-level modeling approach (reference model and archetype model).

Technology
VA VistA: Based on heterogeneous and dated technologies (such as MUMPS), which impacts innovation, maintenance and operations.
CHISTAR: Uses state-of-the-art cloud technologies and the Cloud Component Model approach for system design, which allows easy maintenance and integration of new technologies.

User interface
VA VistA: Old-fashioned UI based on dated technology (Delphi).
CHISTAR: Modern UI based on Web 2.0 technologies such as AJAX, jQuery, etc. The CHISTAR UI is optimized for desktop, tablet and mobile platforms.

State management
VA VistA: VistA's stateful design limits scalability. For example, VistA makes use of RPC Brokers that are stateful and suffer from limited scalability.
CHISTAR: Stateless and scalable design approach based on the Cloud Component Model.

Data querying
VA VistA: Data querying is cumbersome given the complex RPCs used for querying (such as DDR LISTER).
CHISTAR: Uses modern technologies and frameworks for data querying, such as Hive and hQuery, that are scalable and make data querying easy.

Fig. 5. Two-level modeling approach for EHR system design.

4.1 Semantic Interoperability

Semantic interoperability is defined as the ability to share, interpret and make effective use of the information exchanged. CHISTAR achieves semantic interoperability by using a generic design approach. CHISTAR uses a two-level modeling approach that separates data from clinical knowledge, as shown in Figure 5. A two-level modeling approach for an EHR system consists of a data storage model and an archetype model [8]. The data storage model defines the entities for data storage and represents the semantics of storing data. The archetype model defines the clinical concepts. It represents the domain-level structures and the constraints on the generic data structures defined by the data storage model. The two-level modeling approach makes the system more robust, as the software need not be changed whenever the clinical knowledge changes.

CHISTAR extends and incorporates archetype models provided by OpenEHR [8]. An archetype definition contains:
(1) a header section that includes the archetype meta-data,
(2) a definition section that contains the modeled clinical concept, and
(3) an ontology section that describes the entities defined in the definition section.

Archetypes are separate from the data and are stored in a separate repository. Archetypes are deployed at run time using templates that specify groups of archetypes to use for a particular purpose (e.g., a laboratory report). Figure 6 shows an example of the Vitals/Measurements application, which stores records of all vital signs of a patient, such as blood pressure, pulse, etc. The application can be created using templates and archetypes.

The CHISTAR reference model extends and adapts the HL7 v3.0 [17] and OpenEHR [8] data types. Figure 7 shows an example of an implementation of a CHISTAR data type (CV_QUANTITY) that is specified in the reference model. CHISTAR implements the data types defined in the reference model in Java. It uses HBase to store EHR data expressed in the form of these data types. An archetype denotes a model defining some domain concept (such as Haemoglobin), expressed using constraints on instance structures of an underlying reference model. Archetypes and templates perform two key functions. First, they facilitate data validation at the time of data capture or data import, ensuring that the data conforms to the archetypes. Second, they facilitate data querying.

Fig. 6. Example of an application (Vitals/Measurements) created with templates and archetypes.

Fig. 7. Example of CHISTAR's implementation of reference model data types.

Fig. 8. Proposed data integration approach.

4.2 Data Integration

Healthcare data exists in various forms (structured or unstructured). It is held on different data storage systems, such as relational databases (RDBMS such as MySQL, Oracle, etc.) and file servers (as text, image, video files, etc.), and in EHR standard formats (such as HL7 messages). Figure 8 shows the proposed approach for data integration. The proposed data integration engine is based on the Hadoop MapReduce framework [19]. Figure 9 shows the cloud-based EHR data storage architecture of CHISTAR. The data integration engine converts EHR data from different sources into flat files, which are stored in HDFS distributed storage. A MapReduce-based bulk loader loads the data from the flat files into HBase, a non-relational distributed database [20]. Hive [21] is a data warehouse system for Hadoop that facilitates easy data queries and the analysis of large datasets stored in Hadoop-compatible file systems. Cloud-based application servers access EHR data in HBase either through Java APIs or through HQL, the SQL-like query language provided by Hive. CHISTAR supports distributed querying of healthcare data using the hQuery open source framework [22]. The benefit of using HBase for EHR data storage is that the EHR system can handle massive-scale data while meeting the performance requirements. CHISTAR maintains separate HBase tables for EHR data and demographic/identity data. Separating EHR data from demographic data provides additional security, as the EHR data alone provides no information on the identity of the patient it belongs to.

Fig. 9. Cloud-based EHR data storage architecture of CHISTAR.

Figure 10 shows the architecture of CHISTAR's data integration engine, which provides interoperability with different health information systems. CHISTAR's data integration engine is built upon the Mirth Connect [9] open source integration engine and the FM Projection toolkit [10]. CHISTAR builds upon the FM Projection toolkit to integrate EHR data from VA VistA, which is stored in a MUMPS database. FM Projection projects FileMan data and structures so that they can be viewed via standard database query and reporting tools. As a result, any tool that supports JDBC or ODBC can be connected to FileMan data. The data integration engine uses JDBC connectors to connect to VistA FileMan via FM Projection. CHISTAR's data integration engine includes source connectors that can connect to external systems. Data standards currently supported by CHISTAR's data integration engine include HL7 v2.x, HL7 v3, CDA, CCR, DICOM, X12, CCD, XML, NCPDP, EDI, Delimited Text and Raw ASCII. The data integration engine has a pluggable architecture that makes it easy to add support for additional data standards.

Fig. 10. CHISTAR data integration engine for integrating data from a variety of standards (such as HL7, XML, X12, DICOM, NCPDP, etc.) and protocols (such as JDBC, FTP, MUMPS, etc.).

Fig. 11. Meta-data lookup and semantic matching steps involved in the data integration process.

Fig. 12. CHISTAR security features.

Fig. 13. (a) SAML-SSO based authentication and federated identity management approach adopted by CHISTAR, (b) CHISTAR's OAuth authorization flow, (c) Role-based access control approach adopted by CHISTAR, (d) Encryption and key management approach adopted by CHISTAR.

In the first step of data integration, a source connector connects to an external system. Source connectors are configured against data sources. For example, to import a set of HL7 files, a source connector is created by supplying an input HL7 data file that is used to determine the type and semantics of the input data. Meta-data lookup is then performed to discover the semantics of the data elements in the source file. CHISTAR maintains meta-data repositories for all the data types supported by the data integration engine. In the meta-data lookup process, the input file is parsed, and the meta-data repository of the source format is consulted to retrieve the semantics of all the data elements in the source file. The meta-data lookup process is data driven. It produces an intermediate XML file that contains all the data elements in the source file, together with annotations obtained from the meta-data repository of the source format. The intermediate XML file eliminates the source data syntax and retains the hierarchy and properties of the data elements along with the annotations.
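The meta-data lookup step can be illustrated with a minimal Python sketch. It assumes a tiny, hypothetical meta-data repository (`SOURCE_METADATA`) that covers four fields of an HL7 v2 PID segment. The element names, concept labels, the `metadata_lookup` function and the XML layout are all illustrative assumptions, not CHISTAR's actual repository or intermediate file format. The sketch parses pipe-delimited segments and looks up the semantics of each field. It then emits an intermediate XML tree that drops the source syntax (delimiters and field positions) but keeps the element hierarchy and the annotations.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data repository for the source format: maps a
# (segment, field position) pair to the semantics of that field.
# Only a tiny, illustrative subset of the HL7 v2 PID segment is covered.
SOURCE_METADATA = {
    ("PID", 3): {"name": "PatientIdentifier", "concept": "patient.id"},
    ("PID", 5): {"name": "PatientName", "concept": "patient.name"},
    ("PID", 7): {"name": "DateOfBirth", "concept": "patient.birth_date"},
    ("PID", 8): {"name": "AdministrativeSex", "concept": "patient.sex"},
}


def metadata_lookup(message: str, repository=SOURCE_METADATA) -> ET.Element:
    """Parse pipe-delimited segments and build an annotated intermediate XML tree."""
    root = ET.Element("intermediate")
    for line in message.strip().splitlines():
        fields = line.split("|")
        segment_name = fields[0]
        segment = ET.SubElement(root, "segment", name=segment_name)
        # For ordinary segments, field i sits at index i after splitting on "|"
        # (this does not hold for MSH, which this sketch does not handle).
        for position, value in enumerate(fields[1:], start=1):
            meta = repository.get((segment_name, position))
            if meta is None or value == "":
                continue  # unknown or empty fields are skipped in this sketch
            element = ET.SubElement(
                segment, "element", name=meta["name"], concept=meta["concept"]
            )
            components = value.split("^")
            if len(components) == 1:
                element.text = value
            else:
                # Composite values become child nodes, preserving the hierarchy.
                for index, part in enumerate(components, start=1):
                    ET.SubElement(element, "component", index=str(index)).text = part
    return root


if __name__ == "__main__":
    tree = metadata_lookup("PID|1||12345||Doe^John||19700101|M")
    print(ET.tostring(tree, encoding="unicode"))
```

For the sample segment, the tree contains one `segment` element with four annotated `element` children. The patient name is kept as two `component` nodes (family and given name). Carrying the `concept` annotation into the intermediate file is what would let a later matching step search the destination repository by meaning rather than by source field position.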

In the next step, semantic matching is performed. The meta-data repository of the destination format is searched, and a list of candidate mappings is retrieved for each data element in the intermediate file. The search is guided by the annotations for the source data elements included in the intermediate file. Semantic matching can be performed automatically or manually. In automated semantic matching, the most similar candidate mapping is retained for each source data element. For semantic matching we adopt the two-step approach proposed by Giunchiglia et al. [18]. In this approach, mappings between schema elements are calculated by computing semantic relations (e.g., equivalence, more general, less general and disjointness). These semantic relations are determined by analyzing the meaning codified in the elements and the structures of the schemas. The manual option allows the user to specify the mappings for all the source data elements by hand. It is useful when no matching element is found in the destination meta-data repository for a source data element. In that case, the user can specify a custom mapping for the source data element, which becomes part of the meta-data repository for subsequent automated matching. The semantic matching process produces a mappings XML file. Figure 11 shows the meta-data lookup and semantic matching steps. The mappings XML file guides the data import process. The benefit of generating a mappings file is that the same mappings can be reused to import any number of source files of the same format. When multiple data files are imported, the input is split to parallelize the import process. Data aggregation is then done using MapReduce jobs that transform the data from the input splits into the destination format. Finally, destination connectors write the created data files to HDFS storage.
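The matching and import steps can be sketched in Python, under several simplifying assumptions. Exact equality of concept annotations stands in for the similarity-based semantic matching of Giunchiglia et al., and a plain dictionary stands in for the mappings XML file. The map and reduce phases run sequentially in one process instead of as a Hadoop job. All function names (`semantic_match`, `split_input`, `map_split`, `reduce_pairs`, `import_records`), the repository contents and the choice of `key_element` as the grouping key are hypothetical. The sketch shows manual mappings being fed back into the destination repository and one mapping table being reused across all input splits.

```python
from collections import defaultdict
from itertools import islice


def semantic_match(source_concepts, destination_repository, manual_mappings=None):
    """Return (mappings, unmatched) for annotated source elements.

    source_concepts maps each source element name to its concept annotation;
    destination_repository maps concept annotations to destination attributes.
    Exact concept matches are mapped automatically. A manual mapping is used
    only when no match exists, and it is added to the repository so that
    later runs match the same concept automatically.
    """
    manual_mappings = manual_mappings or {}
    mappings, unmatched = {}, []
    for element, concept in source_concepts.items():
        if concept in destination_repository:
            mappings[element] = destination_repository[concept]
        elif element in manual_mappings:
            mappings[element] = manual_mappings[element]
            destination_repository[concept] = manual_mappings[element]
        else:
            unmatched.append(element)
    return mappings, unmatched


def split_input(records, split_size):
    """Yield consecutive input splits of at most split_size records."""
    iterator = iter(records)
    while chunk := list(islice(iterator, split_size)):
        yield chunk


def map_split(split, mappings, key_element):
    """Map phase: emit (record key, (destination attribute, value)) pairs."""
    for record in split:
        key = record[key_element]
        for element, value in record.items():
            if element in mappings:  # unmapped source elements are dropped
                yield key, (mappings[element], value)


def reduce_pairs(pairs):
    """Reduce phase: aggregate all pairs for a key into one destination row."""
    rows = defaultdict(dict)
    for key, (attribute, value) in pairs:
        rows[key][attribute] = value
    return dict(rows)


def import_records(records, mappings, key_element, split_size=1000):
    """Apply one mappings table to every input split, then aggregate."""
    pairs = (
        pair
        for split in split_input(records, split_size)
        for pair in map_split(split, mappings, key_element)
    )
    return reduce_pairs(pairs)


if __name__ == "__main__":
    repository = {"patient.id": "demographics:patient_id",
                  "patient.name": "demographics:name"}
    concepts = {"PatientIdentifier": "patient.id", "PatientName": "patient.name",
                "AdministrativeSex": "patient.sex"}
    mappings, unmatched = semantic_match(
        concepts, repository, {"AdministrativeSex": "demographics:sex"})
    records = [
        {"PatientIdentifier": "1", "PatientName": "Doe^John", "AdministrativeSex": "M"},
        {"PatientIdentifier": "2", "PatientName": "Roe^Jane"},
    ]
    print(unmatched)
    print(import_records(records, mappings, "PatientIdentifier", split_size=1))
```

Because `map_split` depends only on its own split and the shared mapping table, each split could be handed to a separate mapper without coordination. This is the property that makes a reusable mappings file convenient for parallel import.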

4.3 Security

The biggest obstacle to the widespread adoption of cloud computing technology for EHR systems is the security and privacy of healthcare data stored in the cloud, due to its outsourced nature. In the U.S., organizations called covered entities (CE) that create, maintain, transmit, use, and disclose an individual's protected health information (PHI) are required to meet Health Insurance Portability and Accountability Act (HIPAA) requirements [23]. HIPAA requires covered entities to assure their customers that the integrity, confidentiality, and availability of the PHI they collect, maintain, use, or transmit is protected. HIPAA was expanded by the Health Information Technology for Economic and Clinical Health Act (HITECH), which addresses the privacy and security concerns associated with the electronic transmission of health information [24].


Fig. 14. Cloud Component Model for an EHR application.

CHISTAR addresses the key requirements of HIPAA and HITECH, as summarized in Figure 12. CHISTAR adopts the Cloud Security Alliance's (CSA) Trusted Cloud Initiative (TCI) reference architecture [25]. Key security features include:

1) Authentication: CHISTAR adopts Single Sign-On (SSO) for authentication. With SSO, a user signs in only once and can then access multiple related applications. Once the user signs in, the user's identity is recognized and there is no need to sign in again for each application. Our implementation of SSO is based on the Security Assertion Markup Language (SAML). Figure 13(a) shows the authentication flow using SAML SSO. When a user tries to access CHISTAR, a SAML request is generated and the user is redirected to the identity provider. The identity provider parses the SAML request and authenticates the user. A SAML token is returned to the user, who then accesses CHISTAR with the token. SAML guards against man-in-the-middle and replay attacks by requiring SSL encryption when transmitting assertions and messages. SAML also provides a digital signature mechanism and allows an assertion to carry a validity time range, which prevents replay attacks.

2) Authorization: Authorization services include policy management, role management and role-based access control. CHISTAR supports OAuth [26] for authorization, as shown in Figure 13(b). OAuth is an open standard for authorization that allows resource owners to share private resources stored on one site with another site without handing out their credentials. In the OAuth model, an application (which is not the resource owner) requests access to resources controlled by the resource owner (but hosted by the server). The resource owner grants permission to access the resources in the form of a token and a matching shared secret. Tokens make it unnecessary for the resource owner to share its credentials with the application. Tokens can be issued with a restricted scope and limited lifetime, and they can be revoked independently. Figure 13(c) shows the role-based access control framework used by CHISTAR. A user who wants to access healthcare data in the cloud is required to send his/her details to the system administrator. The administrator assigns permissions and access control policies, which are stored in the User Roles and Data Access Policies databases, respectively. The role-based access control framework then grants users access to healthcare data based on their assigned roles and the data access policies.

3) Identity Management: Identity management services provide consistent methods for identifying persons and maintaining the associated identity attributes of users across multiple organizations. CHISTAR uses Federated Identity Management (FidM), as shown in Figure 13(a).


4) Securing Data at Rest : CHISTAR adopts AES256 (256-bit Advanced Encryption Standard) which is a data encryption standard established by NIST [28]. All CHISTAR data that is stored in HBase is first encrypted with AES-256 encryption and then inserted into HBase. 5) Securing Data in T ransit : All transmission of data is protected with HTTP over Secure Socket Layer (SSL) encryption technology. 6) Key M anagement : Figure 13(d) shows the key management approach adopted by CHISTAR. Each patient has a separate key. All keys for encryption are stored in a data store in the cloud which is separate and distinct from the actual data store. Additional security features such as key rotation and key encrypting keys are also used. Keys can be automatically or manually rotated. The key change frequency can be configured. In the automated key change approach, the key is changed after a certain number of transactions (i.e. accesses to a patient’s records). All keys are themselves encrypted using a master key. 7) Data Integrity : Data integrity ensures that the data is not altered in an unauthorized manner after it is created, transmitted or stored. CHISTAR uses Message Authentication Codes (MAC) to detect both accidental or deliberate modifications in the data. MAC is a cryptographic checksum on the data that is used to provide an assurance that the data has not changed. Computation of MAC involves the use of (1) a secret key that is known only to the party that generates the MAC and the intended recipient, and (2) the data on which the MAC is computed. 8) Auditing : Regulations such as HIPAA/HITECH require that log data on the accesses to PHI be maintained for accountability purposes. CHISTAR logs all read and write accesses to patient health records. Logs include the user involved, type of access, timestamp, actions performed and records accessed. 4.4

4.4 Component-based Architecture

CHISTAR adopts the Cloud Component Model approach for application design described in our previous work [14]. The Cloud Component Model identifies the building blocks of a cloud application and classifies them by the functions they perform and the type of cloud resources they require. The model is represented as a component map in which the columns represent the functions of the application and the rows represent cloud resources. Figure 14 shows an example of the cloud component model for an EHR application; the blocks in the figure are the individual components of the application, each performing a specific function. Each component takes specific inputs, performs a pre-defined set of actions and produces the outputs required by other components.

Components in the Cloud Component Model are loosely coupled and communicate asynchronously through message-based communication, with a messaging queue used for each interaction between two components. The benefit of loose coupling for EHR applications is that if one component receives and processes requests faster than the other components, buffering requests in messaging queues makes the overall system more resilient to bursts of traffic. A non-relational status database stores the state, intermediate status and user data for the tasks of the various components. Each component independently stores its state in the status database, so the status database captures the state of the application.

4.5 Evaluation

We deployed CHISTAR on the Amazon Elastic Compute Cloud (EC2) infrastructure [29]. Figure 15 shows the deployment architecture of CHISTAR. In this deployment, tier-1 consists of web servers and load balancers, tier-2 consists of application servers and tier-3 consists of a cloud-based distributed batch processing infrastructure such as Hadoop [19]. HBase [20] is used for the database layer; it is a distributed, non-relational, column-oriented database that runs on top of HDFS and provides a fault-tolerant way of storing large quantities of sparse data. HDFS is used as the storage layer for healthcare data in the form of flat files, images, etc. Hive [21] provides a data warehousing infrastructure on top of Hadoop, allowing data in HDFS/HBase to be queried and analyzed with the SQL-like Hive Query Language (HQL). ZooKeeper [27] provides a distributed coordination service for maintaining configuration information and naming, and for providing distributed synchronization and group services.
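The asynchronous, queue-based interaction between loosely coupled components described in Section 4.4 can be sketched in-process. In this hedged example, Python's `queue.Queue` stands in for a messaging queue such as Amazon SQS, an in-memory dictionary stands in for the non-relational status database, and the two components ("ingest" and "normalize") and their processing step are hypothetical.

```python
import queue
import threading


class StatusDB:
    """In-memory stand-in for the non-relational status database."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items: dict[str, dict] = {}

    def put(self, task_id: str, **attrs) -> None:
        with self._lock:
            self._items.setdefault(task_id, {}).update(attrs)

    def get(self, task_id: str) -> dict:
        with self._lock:
            return dict(self._items.get(task_id, {}))


def ingest_component(out_q: queue.Queue, status: StatusDB, records: list[str]) -> None:
    """Producer: accepts a burst of requests and enqueues them without waiting."""
    for i, rec in enumerate(records):
        task_id = f"task-{i}"
        status.put(task_id, state="queued", component="ingest")
        out_q.put((task_id, rec))
    out_q.put(None)  # sentinel: no more work


def normalize_component(in_q: queue.Queue, status: StatusDB, results: list[str]) -> None:
    """Consumer: drains the queue at its own pace and records its state externally."""
    while (msg := in_q.get()) is not None:
        task_id, rec = msg
        results.append(rec.strip().upper())  # placeholder for real processing
        status.put(task_id, state="done", component="normalize")


def run_pipeline(records: list[str]) -> tuple[list[str], StatusDB]:
    q: queue.Queue = queue.Queue()  # unbounded, so the producer never blocks
    status = StatusDB()
    results: list[str] = []
    consumer = threading.Thread(target=normalize_component, args=(q, status, results))
    consumer.start()
    ingest_component(q, status, records)
    consumer.join()
    return results, status
```

Because the producer never waits for the consumer, a burst of requests accumulates in the queue instead of overwhelming the slower component. Because each component writes its state to the shared status store rather than holding it in memory, either side could be restarted or replicated independently.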
For brevity, we describe CHISTAR's multi-tier deployment configurations with the naming convention (#L(size)/#A(size)/#H(size)), where #L is the number of instances running load balancers and web servers, #A is the number of instances running application servers, #H is the number of instances running the Hadoop/HBase cluster and (size) is the instance size. A small instance is equivalent to 1 EC2 compute unit, a large instance to 4 EC2 compute units and an extra large (xlarge) instance to 8 EC2 compute units, where each EC2 compute unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Figure 16 shows a screenshot of the CHISTAR application. We used the Amazon Simple Queue Service (SQS) [29] for the message queues between the components of CHISTAR. Amazon SQS offers a reliable, highly scalable, hosted queue service for storing messages. To store the intermediate status, we

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].


used Amazon SimpleDB [29] as the status database. Amazon SimpleDB is a highly available and flexible non-relational data store. CHISTAR components communicate asynchronously using the SQS messaging queues and store their state externally in a SimpleDB database. To evaluate the scalability of CHISTAR, we performed a series of experiments with very large data sets (up to 1,000,000 patient health records). The data sets were generated synthetically; each patient record consisted of diagnosed problems, medications, vital signs, etc. Figure 17 shows the average response time of the CHISTAR application for four deployment configurations and varying numbers of patient health records, with 100 users accessing the application simultaneously. We observe that the response time increases as the number of records increases. Figure 17 also demonstrates the vertical and horizontal scaling options of CHISTAR. Comparing the (1L(large)/2A(small)/2H(large)) and (1L(small)/3A(small)/2H(large)) deployments, we observe that horizontal scaling (increasing the number of application servers) achieves lower response times. Similarly, comparing the (1L(large)/2A(small)/2H(large)) and (1L(xlarge)/2A(large)/2H(xlarge)) deployments, we observe that vertical scaling (increasing the compute capacity of servers) achieves lower response times. For the (1L(xlarge)/4A(xlarge)/5H(xlarge)) deployment, an average response time of 48 ms is achieved for 1 million patient health records with 100 simultaneous users. Figure 18 shows the average response time of the CHISTAR application for four configurations and varying numbers of simultaneous users, with 10,000 patient health records in the application.
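The deployment naming convention can be made concrete with a small parser that also totals EC2 compute units, using the instance sizes given in the text (small = 1, large = 4, xlarge = 8). The function names are illustrative. The totals are simple arithmetic on the configuration labels and say nothing by themselves about measured response times.

```python
import re

# EC2 compute units per instance size, as stated in the text.
ECU_PER_SIZE = {"small": 1, "large": 4, "xlarge": 8}

# One tier: instance count, tier letter (L = load balancer/web,
# A = application server, H = Hadoop/HBase), and instance size.
TIER_PATTERN = re.compile(r"(\d+)([LAH])\((small|large|xlarge)\)")


def parse_deployment(spec: str) -> dict[str, tuple[int, str]]:
    """Parse '(1L(large)/2A(small)/2H(large))' into {'L': (1, 'large'), ...}."""
    matches = TIER_PATTERN.findall(spec)
    tiers = {tier: (int(count), size) for count, tier, size in matches}
    if len(matches) != 3 or set(tiers) != {"L", "A", "H"}:
        raise ValueError(f"expected exactly one L, A and H tier in {spec!r}")
    return tiers


def tier_ecus(spec: str) -> dict[str, int]:
    """Compute units allocated to each tier (instance count x units per instance)."""
    return {tier: count * ECU_PER_SIZE[size]
            for tier, (count, size) in parse_deployment(spec).items()}


def total_ecus(spec: str) -> int:
    """Total compute units across all three tiers."""
    return sum(tier_ecus(spec).values())
```

For example, `total_ecus("1L(large)/2A(small)/2H(large)")` returns 14, and `total_ecus("(1L(xlarge)/4A(xlarge)/5H(xlarge))")` returns 80.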
As the number of users increases, the mean request arrival rate increases and CHISTAR must service more requests per second; therefore, the response time increases. Figure 18 also demonstrates the vertical and horizontal scaling options of CHISTAR: comparing the four deployment configurations, we observe that both vertical and horizontal scaling achieve lower response times. For the (1L(xlarge)/4A(xlarge)/5H(xlarge)) deployment, an average response time of 514 ms is achieved for 10,000 patient health records with 2,500 simultaneous users. This is well below the response time considered acceptable for a good user experience in web applications (≈4000 ms). These results demonstrate that CHISTAR can scale to handle a very large number of patient health records and users.
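The text does not describe the load generator used to produce Figures 17 and 18. As a hypothetical sketch, the average response time for N simultaneous users could be measured with a thread pool in which each thread acts as one user issuing requests back to back. Here `request` is any zero-argument callable, for example a function that performs an HTTP GET against the web tier (not shown).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def timed_call_ms(request: Callable[[], object]) -> float:
    """Return the wall-clock latency of one request in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def average_response_time_ms(request: Callable[[], object], users: int,
                             requests_per_user: int) -> float:
    """Simulate `users` simultaneous users, each issuing requests back to back."""

    def user_session(user_index: int) -> list[float]:
        return [timed_call_ms(request) for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=users) as pool:
        sessions = list(pool.map(user_session, range(users)))
    samples = [latency for session in sessions for latency in session]
    return statistics.mean(samples)
```

Threads suit this I/O-bound workload. A stub such as `lambda: time.sleep(0.005)` exercises the harness without a live deployment.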

Fig. 15. Deployment architecture of CHISTAR.

Fig. 17. Average response time for CHISTAR for varying number of patient records with 100 simultaneous users.

5 BENEFITS

In this section we describe the advantages of cloud-based EHR systems over client-server EHRs that are based on a dedicated hosting model:

1) Interoperability: CHISTAR offers better interoperability than client-server based EHR systems such as VistA. To achieve interoperability, CHISTAR adopts a two-level modeling approach for separation of information



Fig. 16. Screenshot of the CHISTAR application showing a summary of records for a patient.

Fig. 18. Average response time for CHISTAR for varying number of simultaneous users with 10000 patient records.

from the clinical knowledge. Furthermore, the data integration engine of CHISTAR allows data from disparate sources, such as MySQL servers, other JDBC-accessible databases, Oracle databases and file servers, and in different EHR standards (HL7 messages, HL7 CDA documents, etc.), to be integrated into cloud-based storage.

2) Scalability: Cloud-based EHRs such as CHISTAR scale better than client-server EHRs. CHISTAR adopts the Cloud Component Model approach for application design, which improves scalability by decoupling application components and providing asynchronous communication mechanisms. Since components are designed to process requests asynchronously, it is possible to parallelize the

processing of requests. Using the Cloud Component Model, CHISTAR can leverage both horizontal and vertical scaling options.

3) Maintainability: CHISTAR is easier to maintain than client-server based EHR systems. The functionality of individual CHISTAR components can be improved or upgraded independently, since loose coupling allows components to be replaced or upgraded without changing the others. Loose coupling also makes CHISTAR more resilient to component failures, whereas in client-server based EHR systems with tightly coupled components, the failure of a single component can bring down the entire application.

4) Portability: Cloud-based EHR systems such as CHISTAR are more portable. Because the components are loosely coupled and communicate asynchronously, hybrid deployments are possible in which different components of an application are deployed on the cloud infrastructure and platforms of different cloud vendors.

5) Reduced Costs: Client-server EHR systems with dedicated hosting require a team of IT experts to install, configure, test, run, secure and update hardware and software. With cloud-based EHR systems, organizations save on the upfront capital investment for setting up the computing infrastructure as well as on the costs of managing it, since that is done by the cloud provider. Though hardware maintenance overhead



is reduced, organizations still need to pay software maintenance and support costs. Additional savings come from scaling up (or scaling out) only those components that require additional computing capacity.
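Item 1 above mentions integrating HL7 messages into cloud-based storage. As an illustration only, the sketch below splits an HL7 v2 message into segments and fields and maps the PID (patient identification) segment onto a flat, archetype-style entry. The archetype label and output keys are hypothetical and are not CHISTAR's reference model. The field separator is assumed to be "|", and a production integration engine would also need to handle escape sequences, repeated segments and message validation.

```python
def parse_hl7v2(message: str) -> dict[str, list[list[str]]]:
    """Split an HL7 v2 message into {segment_id: [field lists]}, one list per occurrence."""
    segments: dict[str, list[list[str]]] = {}
    for line in message.replace("\n", "\r").split("\r"):  # segments end with CR
        if not line:
            continue
        fields = line.split("|")  # assumes "|" as the field separator
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself; insert it to keep HL7 field numbering.
            fields = ["MSH", "|"] + fields[1:]
        segments.setdefault(fields[0], []).append(fields)
    return segments


def pid_to_entry(message: str) -> dict[str, str]:
    """Map PID-3, PID-5, PID-7 and PID-8 onto a hypothetical demographics entry."""
    segments = parse_hl7v2(message)
    # MSH-2 holds the encoding characters; fall back to the usual defaults.
    encoding = segments["MSH"][0][2] if "MSH" in segments else "^~\\&"
    comp, rep = encoding[0], encoding[1]  # component and repetition separators
    pid = segments["PID"][0]

    def field(i: int) -> str:
        return pid[i] if i < len(pid) else ""

    first_id = field(3).split(rep)[0]          # first identifier in PID-3
    name = field(5).split(rep)[0].split(comp)  # PID-5: family^given^...
    return {
        "archetype": "demographics",  # hypothetical archetype label
        "patient_id": first_id.split(comp)[0],
        "family_name": name[0],
        "given_name": name[1] if len(name) > 1 else "",
        "birth_date": field(7),
        "sex": field(8),
    }
```

Keeping the generic segment/field split separate from the PID mapping mirrors the two-level idea at small scale: the parser knows only message structure, while the mapping step carries the clinical meaning of each field.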

6 CONCLUSION & FUTURE WORK

In this paper we described the design of CHISTAR, a cloud-based EHR system that addresses the problems faced by traditional client-server EHR systems. CHISTAR adopts a two-level modeling approach to achieve semantic interoperability. The data integration engine of CHISTAR aggregates healthcare data from disparate data sources. CHISTAR supports advanced security features and addresses the key requirements of HIPAA and HITECH. Compared to traditional client-server EHR systems, CHISTAR offers better interoperability, scalability, maintainability, portability and accessibility, as well as reduced costs. Future work will focus on developing a cloud-based Information Integration and Informatics (III) framework for healthcare applications. The III framework will enable the development of smart and connected healthcare applications backed by massive-scale healthcare data integrated from heterogeneous and distributed healthcare systems within a scalable cloud infrastructure. Furthermore, we will develop a data thinning and progressive sampling approach within the CHISTAR infrastructure to further improve querying efficiency and accuracy.

REFERENCES

[1] KPMG, The Cloud Changing the Business Ecosystem, www.kpmg.in, 2012.
[2] VistA Monograph, www.va.gov/vista monograph, 2012.
[3] Medsphere OpenVistA, http://medsphere.org/community/project/openvista-server, 2012.
[4] Medsphere Systems Corporation, From VistA to OpenVista - Enhancing an Already Valuable Tools, www.medsphere.com/vista-to-openvista, 2010.
[5] Medsphere Systems Corporation, From MUMPS to Java: OVID Unleashes Power of Open-source Health IT, http://www.medsphere.com/ovid-white-paper, 2010.
[6] WorldVistA, http://worldvista.org, 2012.
[7] vxVistA, https://www.vxvista.org, 2012.
[8] OpenEHR, http://www.openehr.org, 2012.
[9] Mirth Connect, http://www.mirthcorp.com/products/mirth-connect, 2012.
[10] Medsphere FM Projection, http://medsphere.org/community/project/fm-projection, 2012.
[11] A. Bahga, V.K. Madisetti, Analyzing Massive Machine Maintenance Data in a Computing Cloud, IEEE Transactions on Parallel and Distributed Systems, vol. 23, iss. 10, pp. 1831-1843, 2012.
[12] A. Bahga, V.K. Madisetti, Synthetic Workload Generation for Cloud Computing Applications, Journal of Software Engineering and Applications, vol. 4, no. 7, pp. 396-410, 2011.
[13] A. Bahga, V.K. Madisetti, Performance Evaluation Approach for Multi-tier Cloud Applications, Journal of Software Engineering and Applications, vol. 6, no. 2, pp. 74-83, 2013.
[14] A. Bahga, V.K. Madisetti, Rapid Prototyping of Advanced Cloud-Based Services and Systems, Submitted to IEEE Computer, Nov. 2012.
[15] N. Radio, Y. Zhang, M. Tatipamula, V.K. Madisetti, Next-Generation Applications on Cellular Networks: Trends, Challenges, and Solutions, Proceedings of the IEEE, vol. 100, iss. 4, 2012.
[16] J. Patrick, R. Ly, D. Truran, Evaluation of a persistent store for openEHR, In Proceedings of the HIC and HINZ, Health Informatics Society of Australia, pp. 83-89, 2006.
[17] HL7, http://www.hl7.org, 2012.
[18] F. Giunchiglia, P. Shvaiko, M. Yatskevich, Semantic schema matching, In Proceedings of CoopIS, pp. 347-365, 2005.
[19] Apache Hadoop, http://hadoop.apache.org/mapreduce, 2012.
[20] Apache HBase, http://hbase.apache.org, 2012.
[21] Apache Hive, http://hive.apache.org, 2012.
[22] hQuery, http://projecthquery.org, 2012.
[23] Health Insurance Portability and Accountability Act, http://www.hhs.gov/ocr/privacy/, 2013.
[24] Health Information Technology for Economic and Clinical Health Act, http://www.hhs.gov/ocr/privacy/hipaa/administrative/enforcementrule/hitechenforcementifr.html, 2013.
[25] CSA Trusted Cloud Initiative, https://research.cloudsecurityalliance.org/tci/, 2012.
[26] OAuth, http://oauth.net, 2012.
[27] Apache ZooKeeper, http://zookeeper.apache.org, 2012.
[28] AES, csrc.nist.gov/publications/fips/fips197/fips-197.pdf, 2012.
[29] Amazon Web Services, http://aws.amazon.com, 2012.

Arshdeep Bahga is a Research Scientist with Georgia Institute of Technology. He received his B.E. degree in Electronics and Electrical Communication from Punjab Engineering College, Chandigarh, India, in 2006 and his M.S. degree in Electrical & Computer Engineering from Georgia Institute of Technology, Atlanta, USA, in 2010. He has previously worked as a Software Engineer with Electronics for Imaging, Bangalore, India. His research interests include cloud computing, digital signal processing and embedded software systems.

Vijay K. Madisetti earned a Bachelor of Technology (Honors) degree from the Indian Institute of Technology (IIT), Kharagpur, India, in Electronics and Electrical Communications Engineering in 1984. Following this, he went to the University of California at Berkeley, where he earned a Ph.D. in Electrical Engineering and Computer Sciences in 1989. Dr. Madisetti joined the ECE faculty at Georgia Tech in 1989. He leads several research and educational programs at Georgia Tech in the areas of digital signal processing, embedded computing systems, chip design, wireless and telecom systems, and systems engineering. He has authored or edited several books, including VLSI Digital Signal Processors (1995) and the Digital Signal Processing Handbook (Second Edition, 2010). Dr. Madisetti is a Fellow of the IEEE and received the 2006 Frederic Emmons Terman Medal from the American Society of Engineering Education and HP Corporation. He is currently serving on several campus initiatives and is the Executive Director of Georgia Tech’s India Initiative.
