A framework for uniform access to data, software and knowledge.

Erik M. van Mulligen, Teun Timmers, Freek van den Heuvel

Department of Medical Informatics, Erasmus University Rotterdam, The Netherlands

Abstract

An object-oriented framework is presented that offers integration of various types of entities at one workstation. Five types of entities are distinguished: data, knowledge, functions, presentation forms and hardware, and for each of these entities an 'accessor' is introduced. An accessor offers abstraction from the particularities of access to the entities. For the interaction with this framework a programming language has been defined. A restricted form of the framework has been used to implement a prototype medical workstation for the support of clinical data analysis.

Introduction

Although many software applications are now available, these applications are still used separately. The synthesis of individual applications into a larger context is still not supported by vendors, in spite of the promises of powerful hardware, open system technology and increasing telecommunications. Moreover, the use of these applications has shifted from a tool-oriented approach to a task-oriented one. The synthesis of software applications has to be directed to the following aspects: (1) transparent exchange of data between the underlying applications, (2) availability of all applications from one workstation, thereby offering transparent network usage, (3) macro functions that abstract from keystrokes and command languages, (4) a friendly, uniform user interface, and (5) a medical data model and medical orientation. These aspects together support the basics of horizontal integration [1] and have been mentioned already in our previous work [2,3].

In this paper, we present a general framework that is used for achieving horizontal integration of existing, commercially available applications. This framework has been tested in a prototype medical workstation for the support of data analysis in cardiology. The existing applications that have been horizontally integrated are representative for many environments: a MUMPS-based departmental information system (TUS), two database management systems (INGRES and dBaseIII), a statistical application (BMDP), a graphical presentation system (Harvard Graphics) and a text processor (WordPerfect 5.0). The prototype has been developed on an HP9000 Unix workstation, and uses network access to remote data and applications. Details of the workstation architecture can be found elsewhere [2].

0195-4210/91/$5.00 © 1992 AMIA, Inc.


Approach

The prototype workstation consists of five facilities, offering integration of locally and remotely existing applications: (1) a user-interface facility (UIF), (2) a data translation facility (DTF) for translating data between the different formats used by the applications, (3) a command generation facility (CGF) that generates keystrokes and commands for the applications, (4) a network facility (NF) for addressing remote applications and (5) an executive management facility (EMF) that controls the operation of the other facilities. The EMF acts as integration server and translates function requests from the user interface into calls to the DTF and the CGF (Figure 1). These, in turn, call the NF when the application has to be accessed through the network [2].

Figure 1. Integration as achieved in the prototype medical workstation.

From the experience with the development of a prototype medical workstation, the following shortcomings were recognized. First of all, integration of applications in the workstation depends too much on the underlying applications: changes and new applications require continuous updates of the integration server. From the workstation point of view, the entities are not the applications such as dBase, INGRES, BMDP etc., but data, knowledge, functions, presentation forms and hardware. The integration concept of the prototype primarily focused on applications as sets of functions, rather than integrating all entities according to this concept. The integration concept can be extended to include all types of entities, to treat them uniformly and to address them as entities.

To overcome these problems, a new approach to the concept of integration has been defined. This approach uses accessors as a framework for defining the access to all entities: functions, data, knowledge, presentation forms and hardware. Through these accessors, the interfaces to these entities can be declared, shielding the workstation from the specific details. Accessors encapsulate existing applications, offering standard interfaces.

Abstraction of access details has been proposed by others. Lederberg [4] proposed 'knowbots' as an abstraction of the access to knowledge entities. Wiederhold [5,6] defines object-oriented 'mediators' for access to data. We extend these notions to include all types of entities.

The accessor interfaces are used by a programming language that interacts with the accessor framework and evokes the associated methods. Through this programming language, workstation facilities address all known entities, without having to know their underlying methodology (Figure 2). This programming language on top of an object-oriented structure offers a powerful interface to the 'world' that is lacking in, for example, the Helios project [7]. In the Helios project, an object-oriented software bus has been proposed, and applications directly interact with this software bus. The interaction is fully defined by the structure and organization of the bus, whereas a programming language offers a more flexible separation between description and behaviour.
Addressing these entities from within one programming environment has often been referred to as mega-programming [8]. A mega-programming language can be derived naturally from our definition of accessors. In this paper, we both discuss the accessors and briefly describe the features of a programming language necessary to use these accessors.

Framework

Accessors are defined as objects in an object-oriented environment. Accessors that have common properties can be grouped into classes, and methods can be defined at accessor and class level. The classes are organized as a tree, where the root class is the most general class, and every child class is a specialization of the father class. Associated with each class are properties and methods. Properties and methods are inherited from father accessor classes. Methods are linked with specific drivers, which allows methods to be overloaded (different drivers linked per accessor for a particular method). Values of properties can also be specified at the class level. In addition to these class-defined methods and property values, objects can also have local property values and methods, which overrule the global ones. References to other objects or classes can be specified through a property.

Figure 2. Integration as achieved through a framework of accessors.

References to other objects are used to form compound objects: compound objects have a property containing references to basic objects. Examples of compound objects are found with data, presentation and function accessor objects. Through composition, data are arranged in logically larger components. A compound accessor object is addressed as though it is a basic data object. Complex presentation forms with multiple elements are specified through composition of basic presentation accessors in a compound presentation accessor. Series of functions are grouped by creating a compound object that references the basic function accessors. Each compound accessor has associated methods for joining basic accessor objects.

Each method is specified as a tuple of a message and the name of a function accessor. Sending a message to an object results in an activation of the function, with the name of the object as argument to the function. If a message is not defined in an object, the class is searched for the method. If the class does not have the message associated, the father class is searched, etc. The function accessor has a property with the name of the actual driver program.

The programming language offers facilities for sending messages to accessor objects. The interpreter of this programming language locates the object that contains the method, and activates the proper call. Firstly, the accessor structure will be defined, and subsequently the programming language will be described.

Accessors

Five major types of accessors are distinguished: (1) data accessors, (2) knowledge accessors, (3) function accessors, (4) presentation accessors and (5) hardware accessors. Each accessor type serves a specific entity type and these accessors are interlinked to constitute a functional framework. Each accessor and class is uniquely named for direct referencing. Each accessor has (management) properties defined:

    <accessor name> (
        creation time:
        last access time:
        modification time:
        number of times referred:
        author:
        access:
        location:
        available: , <time>
        generalization:
        pretty name:
    )

At the root class, basic methods are defined that are applicable to all accessor objects in the environment. The methods include management functions for the environment: create, delete, modify and print, for both accessor objects and classes. Depending on whether the method is applied to an accessor object or a class name, a different function accessor is called.
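The method lookup described in the Framework section (object first, then its class, then each father class up to the root) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the prototype: the `AccessorClass` and `AccessorObject` types, the `lookup` and `send` helpers, and the `drivers` table, which stands in for function accessors and their driver programs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class AccessorClass:
    """A node in the class tree; only the root class has no father."""
    name: str
    father: Optional["AccessorClass"] = None
    properties: Dict[str, object] = field(default_factory=dict)
    # Each method is a (message -> function accessor name) pair.
    methods: Dict[str, str] = field(default_factory=dict)


@dataclass
class AccessorObject:
    """An accessor object; local properties and methods overrule class ones."""
    name: str
    cls: AccessorClass
    properties: Dict[str, object] = field(default_factory=dict)
    methods: Dict[str, str] = field(default_factory=dict)


def lookup(obj: AccessorObject, kind: str, key: str):
    """Search the object, then its class, then every father class."""
    if key in getattr(obj, kind):
        return getattr(obj, kind)[key]
    cls = obj.cls
    while cls is not None:
        if key in getattr(cls, kind):
            return getattr(cls, kind)[key]
        cls = cls.father
    raise KeyError(f"{key!r} is not defined for {obj.name!r}")


def send(obj: AccessorObject, message: str,
         drivers: Dict[str, Callable[[str], str]]) -> str:
    """Activate the function bound to the message, passing the object name."""
    function_name = lookup(obj, "methods", message)
    return drivers[function_name](obj.name)


# Hypothetical class tree: root class -> data class -> one accessor object.
root = AccessorClass("accessor", methods={"print": "print_object"})
data = AccessorClass("data", father=root,
                     properties={"type": "numeric"},
                     methods={"get": "get_from_file"})
lab = AccessorObject("lab-data", data, properties={"type": "text"})

# Stand-ins for the driver programs named by function accessors.
drivers = {
    "print_object": lambda name: f"printed {name}",
    "get_from_file": lambda name: f"read {name}.isf",
}

print(send(lab, "print", drivers))        # method inherited from the root class
print(lookup(lab, "properties", "type"))  # local value overrules the class value
```

In this sketch a different driver can be linked to the same message at any level of the tree, which is one way to read the overloading of methods mentioned above.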

Data accessors

Increasingly, data are stored routinely in electronic databases. The use of these data is still limited to the information system that has been developed around these databases, and retrieval of data for other purposes (e.g. workstations) is still very difficult. These difficulties are mainly caused by (1) the lack of standard query languages and (2) the lack of a medical data model or data dictionary. Therefore, specific knowledge is required to directly access the data. Besides this, patient data can be dispersed over many databases, and facilities to join these are in general still lacking. The workstation offers a facility for transparent access and join of data from different databases. Data accessors have the following properties:

    <data accessor name> (
        medical name:
        synonyms: ,..
        code according to system: (,),..
        database:
        address: ,
        type:
        size:
        limits: ,
        code list:
        key:
        uniqueness:
        method for obtaining uniqueness: ,
        methods: (<message>, <function>),..
        insert trigger:
        retrieve trigger:
    )

Each data accessor has a corresponding file containing the real data. The files are structured according to a common Intermediate Storage Format (ISF). Additional information in the data accessor specifies how to treat multiple record occurrences of one patient, which codes of different coding systems correspond with the data accessor, e.g. in MeSH, ICD-9, SNOMED or UMLS [9,10], and which triggers of knowledge accessors for retrieval exist. Methods for data accessor objects are: get and put.

Compound accessors (= aggregation) can be achieved by joining the underlying accessors on a key (Figure 3). The get and put methods of a compound object repeatedly send the method with the same name to the data accessors which compose the object. After the get method has been executed by the accessors, the data files are, subject to the uniqueness constraint, joined on the key attribute by a join method.

Figure 3. Composition of data accessors.

Knowledge accessors

Although integrity and reference constraints are dealt with by most information systems, many other checks and constraints remain. The knowledge accessors hide the specific modules that test and validate data according to medical knowledge. The underlying systems that perform this testing may vary from a simple compare test to an expert system that generates critique based on medical knowledge.
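The get and join behaviour of a compound data accessor can be sketched as follows. This is only an illustration under stated assumptions: ISF files are modelled as lists of Python dictionaries, the uniqueness rule is assumed to keep the last record per key, and the join is assumed to keep only keys present in every part. The class and function names are hypothetical; the accessor names are borrowed from the paper's example.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def unique_by_key(records: Iterable[Record], key: str) -> Dict[object, Record]:
    """Reduce multiple record occurrences per key to one (assumed: keep the last)."""
    chosen: Dict[object, Record] = {}
    for record in records:
        chosen[record[key]] = record
    return chosen


def join_on_key(tables: List[List[Record]], key: str) -> List[Record]:
    """Join record lists on a key (assumed: keep keys found in every list)."""
    reduced = [unique_by_key(table, key) for table in tables]
    if not reduced:
        return []
    shared = set(reduced[0])
    for table in reduced[1:]:
        shared &= set(table)
    joined = []
    for value in sorted(shared):
        merged: Record = {}
        for table in reduced:
            merged.update(table[value])
        joined.append(merged)
    return joined


class DataAccessor:
    """A basic data accessor; the record list stands in for its ISF file."""

    def __init__(self, name: str, records: List[Record]):
        self.name = name
        self._records = records

    def get(self) -> List[Record]:
        return list(self._records)


class CompoundDataAccessor:
    """A compound accessor that is addressed like a basic data accessor."""

    def __init__(self, name: str, parts: List[DataAccessor], key: str):
        self.name = name
        self.parts = parts
        self.key = key

    def get(self) -> List[Record]:
        # Send get to every part, then join the results on the key attribute.
        return join_on_key([part.get() for part in self.parts], self.key)


lab = DataAccessor("lab-data", [
    {"patient": 1, "hb": 8.1},
    {"patient": 1, "hb": 8.4},   # second occurrence for the same patient
    {"patient": 2, "hb": 7.9},
])
zis = DataAccessor("zis-data", [
    {"patient": 1, "age": 61},
    {"patient": 3, "age": 47},
])
population = CompoundDataAccessor("test-population", [lab, zis], key="patient")
rows = population.get()
print(rows)
```

Because the compound accessor exposes the same get method as its parts, a caller does not need to know whether it addresses a basic or a compound object.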


Although standardization of knowledge description is gaining more attention [11], the knowledge accessors free the workstation from the storage and evocation details of the underlying systems. Knowledge accessors also hide details of the application's required data format. These data can in turn be delivered by data accessors and formatted by a data translation method. Knowledge accessors have the following structure:

    <knowledge accessor name> (
        methods: (<message>, <function>),..
        data: ,..
    )

Typical methods that can be applied to knowledge accessors are evoke and explain. Through the evoke function, a function accessor will be activated.

Function accessors

Functions are entities that are supported by applications. Function accessors aim at abstraction from the specific application that offers the function. This abstraction makes it possible to replace an application with a functionally equivalent application, without having to modify the interface of workstation applications. Each method is available through a function accessor. The structure of a typical function accessor is:

    <function accessor name> (
        data translator:
        command generator:
        computer:
        application:
        input form:
        output translator:
    )

At the function accessor level, direct access to underlying drivers is supported. The information stored in the function accessor is sufficient for generating the proper calls to a data translator, a command generator, the application itself and, in some cases, for translating the output file to an Intermediate Storage Format file.

Presentation accessors

Many applications have their own user interface. These applications directly interact with the user and no presentation accessor can be used. However, applications may generate an output file that can be presented to the user via presentation accessors. These accessors address user interface entities and present an output file to the user in a graphical form. The presentation accessors can be combined into compound accessors for more advanced presentation objects (Figure 4).

Figure 4. Composition of presentation accessors.

The typical accessor classes are: lists, tables, curves, histograms etc. The presentation accessors are a specific type of function accessor and the structure is similar to that of the function accessors. Implementation of the presentation drivers has been achieved in the prototype with OSF/Motif. These drivers can easily be replaced by any graphical system, without modifications to the workstation environment.

Hardware accessors

More and more, network communication is available in health care. Through network communication, databases on remote machines can be addressed, applications on remote computers can be executed and electronic mail can be exchanged. A number of network protocols have evolved, and (de facto) standards are available for network communication applications. The structure of a hardware accessor is defined as:

    <hardware accessor name> (
        protocol:
        login:
        password:
    )

If either the login or the password is not specified, the accessor will ask users to supply these. The methods that are associated with a hardware accessor are: remote execute, send file and retrieve file.

Creation of methods

Although a general framework is offered for obtaining abstractions of network-distributed data and functions, the specific methods still have to be implemented. Through these methods, the abstraction from the particularities of access to different entities is obtained. Changes in this access can be dealt with by adapting the methods. This framework offers a possibility to confine the dependency to only a limited number of methods. New developments at the workstation side can be implemented using the accessors without having to worry about details of access.

For several domains, the generation of methods can be automated. Tools should be available for the creation of data translation methods and for the creation of command generators. These methods can be sufficiently generalized and tools can be developed. Other methods still have to be developed and implemented by hand, although a number of these methods do not depend on access to entities and thus remain stable.
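As an illustration of how the properties of a function accessor could suffice to generate the calls to a data translator, a command generator and the application, the following Python sketch builds an ordered list of calls. The `FunctionAccessor` type, the `plan_calls` helper and the generic step names `send_file`, `remote_execute` and `retrieve_file` are assumptions for this sketch, not the prototype's actual drivers; the login and password handled by a hardware accessor are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionAccessor:
    """A subset of the function accessor properties listed above."""
    name: str
    data_translator: str
    command_generator: str
    computer: Optional[str]            # None: the application runs locally
    application: str
    output_translator: Optional[str] = None


def plan_calls(fn: FunctionAccessor,
               input_file: str = "$INPUT",
               command_file: str = "$COMMAND",
               output_file: str = "$OUTPUT") -> List[str]:
    """Derive the ordered calls for one function from its accessor alone."""
    steps = [
        f"{fn.data_translator} {input_file}",   # ISF -> application format
        fn.command_generator,                   # writes the command file
    ]
    run = f"{fn.application} {input_file} {command_file}"
    if fn.computer is None:
        steps.append(run)
    else:
        # Remote application: ship both files, run, fetch the result.
        steps.append(f"send_file {fn.computer} {input_file}")
        steps.append(f"send_file {fn.computer} {command_file}")
        steps.append(f"remote_execute {fn.computer} {run}")
        steps.append(f"retrieve_file {fn.computer} {output_file}")
    if fn.output_translator is not None:
        steps.append(f"{fn.output_translator} {output_file}")
    return steps


# Hypothetical accessor for a t-test offered by BMDP on a remote machine.
ttest = FunctionAccessor(
    name="t-test",
    data_translator="translate_ISF_to_BMDP",
    command_generator="generate_BMDP_ttest",
    computer="EURVAX3",
    application="bmdp",
)
plan = plan_calls(ttest)
for step in plan:
    print(step)
```

Replacing BMDP by a functionally equivalent application would, in this sketch, only change the property values of the accessor; the caller of `plan_calls` stays the same.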

Programming Environment

A programming language has been defined that provides the interface to the accessors. This language must possess all the normal control structures of programming languages, with some features for sending messages to accessors. In this section, an example will be outlined. As programming language, UNIX has been used with an extension for interaction with the object environment. This extension is interpreted and translated into 'ordinary' UNIX calls. Three special shell variables are used: INPUT, OUTPUT and COMMAND, to indicate the name of the data input file or accessor, the name of the output file and the name of the generated command file.

In order to select a compound data accessor object 'test-population' that consists of 'lab-data' and 'zis-data', a get message is sent to 'test-population'. Then, a t-test function of BMDP will be applied to these data and a graph is made. Both the databases and BMDP are available on remote computers. The program:

    get 'test-population'
    t-test
    graph

will be translated by the interpreter to:

    remote_execute_on_vax VAX10 mulligen password get_from_zis( zis_data )
    retrieve_file_from_vax VAX10 mulligen password $INPUT
    translate_zis_to_ISF $INPUT
    remote_execute_on_HP9000 LAB mulligen password get_from_lab( lab_data )
    retrieve_file_from_HP9000 LAB mulligen password $INPUT
    translate_lab_to_ISF $INPUT
    join test-population
    translate_ISF_to_BMDP $INPUT
    generate_BMDP_ttest
    send_file_to_vax EURVAX3 mulligen password $INPUT
    send_file_to_vax EURVAX3 mulligen password $COMMAND
    remote_execute_on_vax EURVAX3 mulligen password bmdp $INPUT $COMMAND
    retrieve_file_from_vax EURVAX3 mulligen password $OUTPUT
    graph $OUTPUT

Conclusion

In the medical workstation project, the integration of existing applications is aimed for, since this approach can directly be used for integrating current applications. From a management and maintenance point of view, however, it is important to separate the access to the applications from the workstation applications, limiting the dependencies to one location. An accessor framework has been designed that hides the access particularities from the workstation.

The framework as discussed in this paper has partly been implemented in a prototype medical workstation for the support of clinical data analysis. The advantage of creating this kind of accessors for abstraction from access to entities is that dependencies on the implementation of the entities are limited to a set of methods. Changes in the implementation of access can be resolved by modifying the methods. This frees workstation applications from having to deal with these changes.

The accessor framework is limited to entities that 'behave' regularly. Irregular constructions will demand an excessive structure that is difficult to develop and maintain. For regular entities, the accessor approach has successfully been applied. Development of a programming language to address the accessors is vital for the workstation concept. This programming language can be used by new applications to access the accessors and facilitates control of the behavior of the integration server. Standardization of the names of accessors could be a new research direction: through these names, applications can address methods and accessors. Although for some types of accessors standards are evolving (UMLS as names for the data accessors and SQL statements as the methods for data manipulation), the names for other accessors are still chosen ad hoc.

References

[1] Greenes RA. Promoting Productivity by Propagating the Practice of "Plug-compatible" Programming. Proceedings of the 14th Symposium on Computer Applications in Medical Care. Washington DC. New York: IEEE Computer Society Press. November 1990. 22-26.
[2] Van Mulligen EM, Timmers T, Leão B de F. Implementation of a Medical Workstation for Research Support in Cardiology. Proceedings of the 14th Symposium on Computer Applications in Medical Care. Washington DC. New York: IEEE Computer Society Press. November 1990. 769-773.
[3] Leão B de F, Timmers T, Van der Lei J, Van Mulligen EM. HEARTVIEW, a Knowledge Base to Support Clinical Research in Cardiology. Proceedings of the 14th Symposium on Computer Applications in Medical Care. Washington DC. New York: IEEE Computer Society Press. November 1990. 605-609.
[4] Lederberg J, Uncapher K. Towards a National Collaboration. NSF, March 1989.
[5] Wiederhold G, Barsalou T. Sharing Information among Biomedical Applications. Proceedings SEMI: IMIA Working Conference on Software Engineering in Medical Informatics. Amsterdam, October 1990; in press.
[6] Barsalou T, Wiederhold G. Knowledge-Directed Mediation Between Application Objects and Base Data. Proceedings of the Working Conference on Data and Knowledge Base Integration. October 1989.
[7] Degoulet P, Coignard FJ, Jaulent M-C, Lucas L, Ben Said M, Meinzer H-P, Engelmann U, Springub A, Baud R, Scherrer J-R. The HELIOS European Project on Software Engineering. Proceedings SEMI: IMIA Working Conference on Software Engineering in Medical Informatics. Amsterdam, October 1990; in press.
[8] Wiederhold G, Wegner P, Ceri S. Towards Megaprogramming (unpublished).
[9] Humphreys BL, Lindberg DAB. Building the Unified Medical Language System. Proceedings of the 13th Symposium on Computer Applications in Medical Care. Washington DC. New York: IEEE Computer Society Press. November 1989. 475-480.
[10] McCray AT. The UMLS Semantic Network. Proceedings of the 13th Symposium on Computer Applications in Medical Care. Washington DC. New York: IEEE Computer Society Press. November 1989. 503-507.
[11] Hripcsak G, Clayton PD, Pryor TA, Haug P, Wigertz OB, Van der Lei J. The Arden Syntax for Medical Logic Modules. Proceedings of the 14th Symposium on Computer Applications in Medical Care. Washington DC. New York: IEEE Computer Society Press. November 1990. 22-26.
