Occupational Medicine 2015;65:651–658 doi:10.1093/occmed/kqv074

Data warehouse for detection of occupational diseases in OHS data

L. Godderis1,2, G. Mylle1, M. Coene1, C. Verbeek1, B. Viaene1, S. Bulterys1 and M. Schouteden1

1IDEWE, External Service for Prevention and Protection at Work, Interleuvenlaan 58, 3001 Heverlee, Belgium, 2Katholieke Universiteit Leuven, Centre for Environment and Health, 3000 Leuven, Belgium.


Correspondence to: L. Godderis, IDEWE, External Service for Prevention and Protection at Work, Interleuvenlaan 58, 3001 Heverlee, Belgium. Tel: +32 16 390411; e-mail: [email protected]


To build a ‘data warehouse’ to make OHS data available for research and to investigate sector-specific health problems.


Medical data were extracted, transformed and loaded into the data warehouse. After validation, data on lifestyle, categorized medication use, ICD-9-CM encoded sickness absences and health complaints, collected between 2010 and 2014, were analysed with logistic regression to compare proportions between employment sectors, taking into account age, gender, body mass index (BMI) and year of examination.


The data set comprised 585 000 employees. Average age and employment seniority were 39 ± 12 and 8 ± 9 years, respectively. BMI was 26 ± 5 kg/m². Health complaints, medication use and sickness absence significantly increased with BMI and age. The proportion of employees with health problems was highest in health care (64%), government (61%) and manufacturing (60%) and lowest in the service sector. In all sectors, 10% of workers reported locomotor health problems, apart from the service sector (8%), with similar results for medication consumption. Neuropsychological drugs were more frequently used by health care workers (8%). The transport sector contained the highest proportion of cardiological medication users (12%). Finally, 30–59% of employees reported at least one sickness absence episode. Sickness absence due to locomotor issues was highest in manufacturing (11%) and health care (10%), followed by government (9%) and construction (9%).

Conclusions Significant differences in indices of workers’ health were observed between sectors. This information is now being used in the implementation of a sector-oriented health surveillance programme.

Key words Data collection; information storage and retrieval; occupational diseases; occupational health services; population surveillance; public health.

Introduction The early detection of health impairment induced or partly caused by work remains difficult. Reliable figures on occupational and work-related diseases are lacking in the European Union due to differences in criteria for notification and recognition of occupational diseases in the legal and social security context. Consequently, countries do not adequately monitor or become alerted to new occupational diseases [1]. Occupational health and safety (OHS) services collect large amounts of data during health surveillance, risk assessments, etc. Despite the potential for early detection of work-related diseases and for reporting trends, these data have exclusively been used to detect health problems (e.g. occupational asthma) in specific work populations (e.g. laboratory workers). Most OHS data are collected through several applications and stored in databases for mainly operational reasons, i.e. keeping individual and company-related data up to date and available. Consequently, these data remain mainly fragmented, unexplored and unavailable for case finding or studying worker cohorts with particular exposures and diseases [2]. Typically, researchers need to extract data manually from worker records and re-enter them into databases, a labour-intensive process prone to errors [3].

© The Author 2015. Published by Oxford University Press on behalf of the Society of Occupational Medicine. All rights reserved. For Permissions, please email: [email protected]

Downloaded from http://occmed.oxfordjournals.org/ at Deakin University Library on October 29, 2015

Background Occupational health and safety (OHS) services collect a wide range of data during health surveillance.


Methods In Belgium, periodic health surveillance is mandatory by law for employees exposed to occupational hazards and includes a yearly medical examination [6]. This study used data on workers under surveillance by the largest Belgian OHS provider, IDEWE. Data from consultations were recorded by occupational health nurses (194 at the end of 2013) and physicians (166 at the end of 2013) in an electronic worker record and encoded using international or national classification standards [7]. Several work characteristics were encoded per worker. Occupational hazards were registered using the Belgian legislation codification system. Jobs were classified according to the International Standard Classification of Occupations of the International Labour Organization. The Statistical Classification of Economic Activities in the European Community (hereafter referred to as NACE) was used to characterize the economic sector in which the employee was employed. For the analysis, the NACE codes were regrouped into main categories. Self-reported health complaints and sickness absence were encoded using the International Classification of Diseases version 9 with Clinical Modifications (ICD-9-CM). Finally, the use of medication was encoded according to the index of the Belgian Compendium of Pharmaceuticals, which refers to the main pharmaceutical use of the drug.

The data stored in the electronic medical file were extracted, transformed and loaded into a data warehouse. The project was carried out according to Belgian and international privacy and ethical legislation, allowing post hoc analysis of anonymized data obtained during occupational health surveillance. We followed six steps for building a medical data warehouse as described by Szirbik et al. [8]. In brief:

Step 1: identify the requirement for building the data warehouse. We established a project group consisting of occupational health officers, researchers, data managers, IT engineers and developers. After setting the scope, we undertook a needs assessment.

Step 2: quality and scope of the sources. We inventoried and assessed the data models of the different applications used by the OHS service and categorized the variables of interest for research and reporting into data types, taking into account data registration options.

Step 3: identify which data were needed by the stakeholders. We then performed a functional analysis to determine the most suitable architecture of the data warehouse, both to allow easy maintenance and to enable us to reach our objective of making the data available in order to query, analyse and present information easily for research and reporting purposes. We reviewed different database models, tested different relational models and validated the extract-transform-load process to ensure no data were lost and that all data values and meaning were correct. We also assessed whether all possible data manipulations in the source systems were correctly mirrored in the data warehouse. After validating the extract-transform-load process, we extracted a data set for the sector analysis.

Step 4: build procedure. We then built a procedure to extract data from their original databases and to transform them to meet the architectural and data quality requirements of the data warehouse. Finally, we loaded all data into the data warehouse.

Step 5: how to update the central repository. After worker consultations, nurses and physicians synchronize with the central server to integrate the resulting data into the operational medical database. We programmed a daily incremental load to update the central repository with the newly entered data.

Step 6: enact exception-handling protocol. Cases were built to make a functional analysis and report issues. We discussed each change (e.g. in registration)


This problem is not uncommon in health care where system-wide relational databases linking administrative, clinical, laboratory and therapeutic data are rarely implemented [3]. Clinicians mostly write notes in medical records and consequently information on encounters between patients and clinicians is mostly stored as text rather than encoded or numerical values. In contrast to health care, where consultations differ depending on the setting and health problem, occupational health surveillance can be standardized, and information (e.g. clinical data and medical observation) obtained from workers can be encoded and stored in their records. This field is rapidly evolving and technology makes the electronic transfer of data from patient and worker records into a ‘data warehouse’ feasible [4]. A data warehouse is a central repository for integrated data optimized for distribution, mass storage, complex query processing, data analysis and reporting. It collects and stores integrated sets of historical data from multiple operational systems and feeds them into one or more data structures designed for faster access [5]. In this project, we built a data warehouse in which we integrated workers’ medical records data to perform studies on self-reported health problems, use of medication and sickness absence. Our aim was to investigate differences in health problems between sectors in order to adapt a general health surveillance programme to sector-specific needs.
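The extract-transform-load flow into such a repository, followed by a simple validation pass, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the table name `fact_exam`, the field names (`worker_id`, `exam_year`, `weight_kg`, `height_m`, `sector`) and the use of an in-memory SQLite database are hypothetical and do not describe the schema or software actually used in this project.

```python
import sqlite3


def build_warehouse(source_rows):
    """Extract hypothetical operational records, transform them, and load
    them into an in-memory warehouse table (illustrative schema only)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE fact_exam ("
        "worker_id INTEGER, exam_year INTEGER, bmi REAL, sector TEXT)"
    )
    for row in source_rows:  # extract: iterate over source records
        # transform: derive BMI (kg/m^2) and normalize the sector label
        bmi = round(row["weight_kg"] / row["height_m"] ** 2, 1)
        sector = row["sector"].strip().lower()
        # load: write the cleaned record into the warehouse table
        con.execute(
            "INSERT INTO fact_exam VALUES (?, ?, ?, ?)",
            (row["worker_id"], row["exam_year"], bmi, sector),
        )
    con.commit()
    return con


def validate(con, source_rows):
    """Check that no records were lost: warehouse row count equals source count."""
    (n_loaded,) = con.execute("SELECT COUNT(*) FROM fact_exam").fetchone()
    return n_loaded == len(source_rows)


# Two made-up source records, purely for demonstration.
source = [
    {"worker_id": 1, "exam_year": 2013, "weight_kg": 80.0, "height_m": 1.80,
     "sector": " Health care "},
    {"worker_id": 2, "exam_year": 2013, "weight_kg": 65.0, "height_m": 1.70,
     "sector": "Construction"},
]
warehouse = build_warehouse(source)
all_loaded = validate(warehouse, source)
```

A row-count comparison is only the weakest form of validation; a fuller check would also compare values field by field between source and warehouse.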


Results Several applications contributed to the data warehouse. Here, we only consider data from the electronic medical files, comprising information about medical history, work conditions, biometrics, vaccinations, medication use and sickness absence for ~860 000 employees in 40 000 companies over >10 years. These variables were loaded into a relational database consisting of 55 linked tables, accessible through a structured query language (SQL) interface. SQL is a programming language used to work with relational databases. Validation showed a 100% match between the information in the data warehouse and that in the original databases. For 585 000 employees, information on the health indicators and confounding variables mentioned above was gathered during at least one medical encounter between 2010 and 2013. The characteristics of the population and distribution per sector are described in Table 1. Figure 1 shows the proportion of employees per sector with a reported and registered health complaint, medication use and sickness absence in 2013. The health care, governmental and manufacturing workers (64, 61 and 60%, respectively) reported at least one health complaint in 2013 during the medical examination. This is about 10% higher than in the service sector. Figures 2–4 give an overview of the prevalence of locomotor, neuropsychological and cardiological conditions, medication use and sickness absence. Apart from the service sector (8%), in all sectors at least 10% of workers reported locomotor complaints. Neuropsychological and cardiological conditions were less frequently reported and registered (around 2–4% in most sectors). The highest level of reported medication use was in the health care sector (60%), followed by government

Table 1.  Employee distribution by sector and gender

Sector                            Male                        Female
                                  n (%)          Mean age     n (%)          Mean age
Health care                       57 603 (17)    41.0         273 657 (83)   40.4
Manufacturing                     128 959 (81)   40.5         29 582 (19)    40.5
Distributive trade                58 353 (69)    38.4         26 520 (31)    37.7
Government                        66 484 (57)    43.7         49 656 (43)    41.2
Services                          66 511 (67)    34.5         32 503 (33)    35.7
Construction                      46 880 (98)    37.4         1044 (2)       36.6
Education                         26 876 (28)    29.7         68 813 (72)    27.7
Transport and storage             24 158 (92)    42.9         2207 (8)       39.2
Accommodation and food service    7562 (45)      36.2         9347 (55)      39.4
Other                             28 195 (67)    41.2         14 098 (33)    39.6
Total                             511 581 (50)   39.2         507 427 (50)   38.2

For 3432 subjects employment sector was unknown.


and each technical issue in a monthly meeting assessing the impact on both data registration and reporting. For each employee in the data warehouse, we retained the following independent variables: year of examination (2010, 2011, 2012 or 2013), seniority in employment, age, gender, body mass index (BMI) and employment sector. We treated health indicators as dependent variables and dichotomized them as follows: 1 if a minimum of one complaint, sickness absence episode or use of medication, respectively, was reported, and 0 if absent. In total, we gathered data on ~300 000 employees each year between 2010 and 2013. Before analysis, we selected data based on inclusion criteria and availability. See Supplementary Figure 1, available as Supplementary data at Occupational Medicine Online, for the different data selection and cleaning steps and the effect on the number of retained employee data sets per year of investigation. As a result of data cleaning, 20% of the total data were not considered. For each health indicator, we investigated differences between sectors in proportions of employees with or without the respective indicator by means of logistic regression, controlling for confounding factors gender, age, seniority, BMI and year of examination. Men were identified with value 1, women with value 0. The sector variable was dummy-coded with the health care sector as reference group. Differences between sectors were investigated by comparing beta values and odds ratios. We first assessed sector differences in health indicators on a more general level (presence/absence of health complaint, sickness absence and therapy). Next, we focused on three different pathologies selected: locomotor (ICD-9: 710–739), cardiological (ICD-9: 390–459) and neurological (ICD-9: 290–319) health problems and medication. SPSS version 19.0.0 was used for the analyses.
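The variable coding described above (grouping ICD-9 codes into pathology ranges, dichotomizing each health indicator, coding men as 1 and women as 0, and dummy-coding sector with health care as the reference group) can be sketched in Python as follows. This is an illustrative sketch only, not the SPSS procedure used in the study: the function names, the `sector_` prefix and the record layout are hypothetical, and the sector labels are taken from Table 1. The ICD-9 ranges and group names are those stated in the text.

```python
import math

# ICD-9 ranges and group labels as stated in the text.
ICD9_GROUPS = {
    "locomotor": (710, 739),
    "cardiological": (390, 459),
    "neurological": (290, 319),
}

# Sector labels from Table 1; health care serves as the reference group.
SECTORS = [
    "health care", "manufacturing", "distributive trade", "government",
    "services", "construction", "education", "transport and storage",
    "accommodation and food service", "other",
]


def icd9_group(code):
    """Return the pathology group of an ICD-9-CM code such as '724.2', or None."""
    try:
        major = int(code.split(".")[0])
    except ValueError:
        return None  # non-numeric codes (e.g. V or E codes) fall outside the groups
    for group, (low, high) in ICD9_GROUPS.items():
        if low <= major <= high:
            return group
    return None


def dichotomize(codes, group=None):
    """1 if at least one code (optionally within one group) was reported, else 0."""
    if group is None:
        return int(len(codes) > 0)
    return int(any(icd9_group(c) == group for c in codes))


def dummy_code(sector, reference="health care"):
    """One 0/1 indicator per non-reference sector; all zeros for the reference."""
    return {f"sector_{s}": int(s == sector) for s in SECTORS if s != reference}


def encode_gender(gender):
    """Men are coded 1 and women 0, as in the text."""
    return 1 if gender == "male" else 0


def odds_ratio(beta):
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(beta)


# A made-up worker record, purely for demonstration.
complaints = ["724.2", "460"]
locomotor_flag = dichotomize(complaints, "locomotor")
construction_dummies = dummy_code("construction")
```

With this coding, a sector's coefficient in the logistic model expresses its log-odds relative to health care, and `odds_ratio` turns that coefficient into the corresponding odds ratio.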


Figure 2.  Proportion of employees per sector with a registered locomotor health complaint, medication use or sickness absence in 2013.

(53%). Ten per cent of health care sector employees were taking medication for pain and inflammation, with 8% in manufacturing and in government doing likewise. The highest rate of medication use for neuropsychological conditions was also found in the health care sector (8%), while workers in the transport and storage and governmental sectors had the highest rate of cardiological medication use (12%). Finally, 59% of employees in the health care and manufacturing sectors reported at least one sickness absence episode in 2013. Sickness absence due to locomotor conditions was highest in manufacturing and health care (11 and 10%, respectively), followed by government (9%) and construction (9%). Sickness absences due to neuropsychological or cardiological conditions were less frequently registered. Remarkably, the ranking of sectors according to the proportion of workers with sickness absence, medication use and health complaints varied according to the parameter studied. For example, the transport and storage sector was ranked fifth or sixth for most indicators, other than cardiological diseases and medication consumption, where it was ranked first and second, respectively (Figures 2–4). The differences between sectors remained significant after controlling for BMI, age, gender and year of examination (Table 2). Including both age and seniority in the logistic regression analysis resulted in unreliable


Figure 1.  Proportion of employees per sector with a registered health complaint, medication use or sickness absence in 2013.


Figure 4.  Proportion of employees per sector with a registered cardiological health complaint, medication use or sickness absence in 2013.

parameter estimates due to a high correlation between the two predictors (r = 0.57). Two model fit indices were compared between models including either age or seniority, together with the other predictors. Models with age showed a better fit in 21 out of 24 cases (Supplementary Table 1, available as Supplementary data at Occupational Medicine Online) and therefore we opted to retain age and discard seniority from further analyses. Only the ranking of sectors based on the extent of medication use was slightly modified due to the characteristics of the workers in the sector. The proportions of employees with health complaints, medication use and sickness absence, respectively, significantly increased with BMI and age (P
