Environmental Science: Processes & Impacts
Published on 02 July 2015.
PAPER
Cite this: Environ. Sci.: Processes Impacts, 2015, 17, 1482
Data quality through a web-based QA/QC system: implementation for atmospheric mercury data from the global mercury observation system

Francesco D'Amore,*ᵃ Mariantonia Bencardino,ᵃ Sergio Cinnirella,ᵃ Francesca Sprovieriᵃ and Nicola Pirroneᵇ

The overall goal of the ongoing Global Mercury Observation System (GMOS) project is to develop a coordinated global monitoring network for mercury, including ground-based, high altitude and sea level stations. In order to ensure data reliability and comparability, a significant effort has been made to implement a centralized system, which is designed to quality assure and quality control atmospheric mercury datasets. This system, GMOS-Data Quality Management (G-DQM), uses a web-based approach with real-time adaptive monitoring procedures aimed at preventing the production of poor-quality data. G-DQM plugs into a cyberinfrastructure and is deployed as a service. Atmospheric mercury datasets, produced during the first three years of the GMOS project, are used as the input to demonstrate the application of the G-DQM and how it identifies a number of key issues concerning data quality. The major issues influencing data quality are presented and discussed for the GMOS stations under study. Atmospheric mercury data collected at the Longobucco (Italy) station are used as a detailed case study.

Received 30th April 2015
Accepted 1st July 2015
DOI: 10.1039/c5em00205b
rsc.li/process-impacts
Environmental impact

Mercury is a persistent pollutant that exists naturally in the environment. However, levels have risen because of human activity and pollution. The UNEP Global Mercury Partnership has the goal of protecting human health and the global environment from the release of mercury and its compounds. In this framework, the Global Mercury Observation System (GMOS) was developed. In order to assure data quality of datasets produced within the GMOS project, a common QA/QC process has been implemented as a centralized system able to ensure, control and report on the quality of mercury data from the GMOS monitoring stations. The system, called GMOS-Data Quality Management (G-DQM), is based on a QA/QC methodology which automatically processes data retrieved from Tekran instruments.
ᵃ CNR-Institute of Atmospheric Pollution Research, Division of Rende, Italy
ᵇ CNR-Institute of Atmospheric Pollution Research, Montelibretti, Rome, Italy

1 Introduction

Mercury is a persistent pollutant that exists naturally in the environment. However, its levels have risen because of human activities and pollution. Currently, the UNEP Global Mercury Partnership has the goal of protecting human health and the global environment from the release of mercury and its compounds by minimizing and, where feasible, ultimately eliminating global and anthropogenic mercury releases to air, water and land. In this framework, among other large atmospheric mercury monitoring networks, the Global Mercury Observation System (GMOS) was implemented. Although it is European funded, the network has a global perspective, with stations spread across many countries. The network was developed by integrating previously established ground-based atmospheric mercury monitoring stations, such as EMEP and AMAP sites (Wängberg et al.;1 Tørseth et al.;2 AMAP3), with new stations, which are spread across the northern and southern hemispheres and located at both high altitude and sea level, as well as in climatically diverse regions (Sprovieri et al.4). Within the network, special attention was paid to the harmonization of measurements in order to ensure full comparability between data from all the monitoring sites. To achieve this, Standard Operating Procedures (SOPs) were developed during the planning and implementation stage of the GMOS network (Munthe et al.5). This was done in accordance with best practice on measurements adopted in well-established regional monitoring networks, and based on the most recent literature (Brown et al.;6 Steffen et al.;7 Gay et al.8). The GMOS network produces data coming in near real-time from a large number of sources. Strict QA/QC procedures are required to avoid the production of poor-quality data and to ensure the correct implementation of the SOPs. Furthermore, a common QA/QC approach for each raw dataset reduces the time between raw data and final data production and publication (Campbell et al.9).
This journal is © The Royal Society of Chemistry 2015
To this end, a centralized system able to ensure, control and report on the quality of mercury data from the GMOS monitoring stations was designed. The system, called GMOS-Data Quality Management (G-DQM), is based on a QA/QC methodology which automatically processes data retrieved from Tekran instruments (Landis et al.;10 Steffen et al.11), from different data providers. It forms part of a dedicated cyberinfrastructure (GMOS-CI) that oversees data acquisition and data sharing among major stakeholders, policy-makers and the public, using an interoperable approach (Cinnirella et al.12). G-DQM uses a service approach to facilitate adaptive network monitoring, which supports both routine and alert notifications to ensure proper instrument maintenance. It is deployed as a web-based application and all QA/QC processes are made available through a common web browser. In order to test this system, an initial evaluation of data quality has been performed on three years of atmospheric mercury data produced within the GMOS network. This work provides a concise description of the features of the G-DQM system and an analysis of the results obtained from its use. The detailed application of the system is demonstrated using the atmospheric mercury dataset collected at the Longobucco (Italy) station.
2 Data quality

Advances in cyberinfrastructure and sensor networks now provide enormous quantities of data, even in near real-time. Dedicated Information Technology (IT) frameworks make it possible to deliver ever larger datasets to the end user. In the coming years, improvements in sensor network technologies will provide researchers with more robust frameworks for data collection and management. Sensor network technologies are entering many fields of modern life, as they offer the opportunity to observe a wealth of environmental variables. Devices of this type can be used to create an Internet of Things (IoT), in which sensors, as well as actuators, blend seamlessly with the environment around us (Ashton13). The problem is therefore no longer how much data we have, but what kind of data we have and, above all, its quality. Sensor networks are still subject to inevitable faults that may cause data loss or poor data quality, and it is imperative to have a system in place to minimise data loss and alert operators to non-standard sensor performance. A first approach to obtaining good quality data from raw datasets may involve post-processing performed individually, and often manually, by each station manager. This approach is unsuitable when data are coming in near real-time from sensor networks. To deal with this scenario, QA/QC algorithms should run within an IT platform (i.e. a cyberinfrastructure) so that process optimization and data handling become more efficient. Individual data quality control does not ensure comparability: different stations spread around the world, within the same framework, require a homogeneous approach in order to define a common data quality standard and data lineage.

Table 1 Flagging criteria for general parameters and all readings

| Flag code | Description | Flagging criteria | Data flagged |
| --- | --- | --- | --- |
| IB0 | Baseline voltage too low | Baseline voltage < 0.01 V | ALL |
| WB1 | Baseline voltage low or high | 0.01 V < baseline voltage < 0.05 V or baseline voltage > 0.25 V | ALL |
| WB2 | Baseline voltage change | abs(baseline voltage_i − baseline voltage_(i−1)) > 0.02 V | ALL |
| WB3 | Baseline deviation high | Baseline deviation > 0.10 V for 5 consecutive readings | ALL |
| IB5 | Baseline deviation too high | Baseline deviation > 0.20 V | ALL |
| IDL | Below detection limit | Hg concentration < 0.1 ng m⁻³ | ALL |
| WM2 | Multiple peaks detected | Status = M2 (multiple peaks) | ALL |
| IMX | Multiple peaks detected | Status > M2 (multiple peaks) | ALL |
| WOL | Overload | Status = OL (overload) | ALL |
| WV5 | Questionable sample volume | 5% < abs((volume_meas − volume_exp)/volume_exp) ≤ 7% | ALL |
| IV7 | Questionable sample volume | abs((volume_meas − volume_exp)/volume_exp) > 7% | ALL |
| WTG | Time gap |  | ALL |

Table 2 Flagging criteria for GEM/TGM readings. A and B refer to gold cartridges used in Tekran

| Flag code | Description | Flagging criteria | Data flagged |
| --- | --- | --- | --- |
| WEH | Hg concentration high | GEM concentration > 4.0 ng m⁻³ | GEM |
| WEL | Hg concentration low | GEM concentration lower than a value that varies according to site-specific conditions (0.2–1 ng m⁻³) | GEM |
| WE5 | Same cartridge difference > 50% | abs((GEM_i − GEM_(i−1))/GEM_i) > 0.5 for the same cartridge | GEM |
| WK1 | A/B cartridge difference within 5–10% | 5% < abs((A − B)/average(A, B)) ≤ 10% | GEM |
| WK2 | A/B cartridge difference > 10% | abs((A − B)/average(A, B)) > 10% | GEM |
| INP | No peak | Status = NP (no peaks) | GEM |
| IC0 | Non-representative GEM values after calibration | Following calibration cycles the first GEM value from each cartridge is not considered representative | GEM |
| ID0 | Non-representative GEM values after desorption | Following the desorption cycle the first GEM value from each cartridge is not considered representative | GEM |

Table 3 Flagging criteria for the desorption cycle. Capital letters are defined in Table 5

| Flag code | Description | Flagging criteria | Data flagged |
| --- | --- | --- | --- |
| WP0 | No PBM | E + F + G = 0 | DES |
| WG0 | No GOM | H + I + J = 0 | DES |
| IP1 | PBM desorption arguable | E < 0.70(E + F + G) or F > 0.20(E + F + G) or G > 0.10(E + F + G) | DES |
| IG1 | GOM desorption arguable | H < 0.70(H + I + J) or I > 0.20(H + I + J) or J < 0.10(H + I + J) | DES |
| IP2 | PBM negative value | E + F + G < 3C | DES |
| IG2 | GOM negative value | H + I + J < 3C | DES |
| IL1 | Load cycle | Load cycle < 1 or 2 or 3 h (GEM cycles < 12 or 24 or 36 before desorption) | DES |
| IID | Incomplete desorption | Desorption cycle is incomplete (< 12 steps) | DES |
| WS0 | Speciation blanks (C) | 1.67 pg m⁻³ < cycle (C) ≤ 10 pg m⁻³ | DES |
| IS1 | Speciation blanks (C) | Cycle (C) > 10 pg m⁻³ | DES |
| *B* | Beginning of desorption | Beginning of each single desorption cycle | DES |
| *E* | End of desorption | End of each single desorption cycle | DES |

2.1 Quality assurance and quality control (QA/QC)
The G-DQM system presented in this paper is related to both Quality Assurance (QA) and Quality Control (QC) on datasets produced within the GMOS network. QA and QC are often presented together even though they are two quite different concepts: QA is related to the process of data collection, while QC is applied to the final product of monitoring. QC is supervised by site operators, who are in charge of clarifying suspicious measurements as well as identifying anomalies and confirming data rejection within their own datasets (Campbell et al.9). In this regard, the proposed system is both process and product oriented. It enables site operators to monitor instrument performance and to promptly take corrective action when problems arise. G-DQM is able to verify whether the monitoring process adheres to standard procedures in a way that minimizes losses and inaccuracies in data production.
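Table 5 Scheme of the desorption cycle by the Tekran 1130/1135

| Tekran event flag | Measurement type | Label |
| --- | --- | --- |
| 1 | Zero air | A |
| 1 | Zero air | B |
| 1 | Zero air | C |
| 2 | Pyrolysis air | D |
| 2 | PBM | E |
| 2 | PBM | F |
| 2 | PBM | G |
| 3 | GOM | H |
| 3 | GOM | I |
| 3 | GOM | J |
| 1 | Zero air | K |
| 1 | Zero air | L |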
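The labels in Table 5 feed the desorption-cycle criteria of Table 3. The following is a minimal sketch of the speciation-blank check (WS0/IS1) and the PBM checks (WP0/IP1), assuming that each step value has already been converted to pg m⁻³ and is passed in a dictionary keyed by the Table 5 label. The function name and the dictionary layout are hypothetical and are not taken from G-DQM itself.

```python
from typing import Dict, List


def desorption_flags(cycle: Dict[str, float]) -> List[str]:
    """Return Table 3 flags for one desorption cycle.

    `cycle` maps Table 5 labels ("C", "E", "F", "G") to step values,
    assumed here to be expressed in pg m^-3.
    """
    flags: List[str] = []

    # Speciation blank: third zero-air step (label C).
    blank = cycle["C"]
    if blank > 10.0:
        flags.append("IS1")      # control limit: invalidates the DES cycle
    elif blank > 1.67:
        flags.append("WS0")      # warning range 1.67-10 pg m^-3

    # PBM is desorbed over steps E, F and G.
    pbm = cycle["E"] + cycle["F"] + cycle["G"]
    if pbm == 0:
        flags.append("WP0")      # no PBM
    elif (cycle["E"] < 0.70 * pbm
          or cycle["F"] > 0.20 * pbm
          or cycle["G"] > 0.10 * pbm):
        flags.append("IP1")      # PBM desorption arguable

    return flags
```

The GOM checks (WG0/IG1) would follow the same pattern with labels H, I and J.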
Even though a level of human intervention and inspection is necessary in QA/QC, the use of automated common checks represents an improvement because it ensures consistency and reduces human bias, thus avoiding misinterpretation and inappropriate data use. Through G-DQM we were able to automate the QA process, making it available on the web via a user-friendly QC step that supports expert supervision. Details and definitions of each component of the system are presented below.
Table 4 Flagging criteria for the calibration cycle. RFA and RFB refer to response factors over the two A and B gold cartridges used in Tekran

| Flag code | Description | Flagging criteria | Data flagged |
| --- | --- | --- | --- |
| WF1 | Calibration interval | 25 h < time between calibrations ≤ 96 h | CAL |
| IF2 | Calibration interval | Time between calibrations > 96 h | CAL |
| WR1 | Detector sensitivity | 4 × 10⁶ units ≤ RespFact < 6 × 10⁶ units or RespFact > 12 × 10⁶ units | CAL |
| IR2 | Detector sensitivity | RespFact < 4 × 10⁶ units | CAL |
| WD1 | Calibration change | 5% < abs((calibration_i − calibration_(i−1))/calibration_i) ≤ 10% | CAL |
| WD2 | Calibration change | abs((calibration_i − calibration_(i−1))/calibration_i) > 10% | CAL |
| WC1 | Calibration trap bias | 5% < abs((RFA − RFB)/average(RFA, RFB)) ≤ 10% | CAL |
| IC2 | Calibration trap bias | abs((RFA − RFB)/average(RFA, RFB)) > 10% | CAL |
| WZ1 | Calibration blanks | Zero > 1500 peak area units | CAL |
| IZ2 | Calibration blanks | Zero > 1% SPAN | CAL |
| IIC | Incomplete calibration | Calibration cycle incomplete | CAL |
2.2 Flagging datasets
As described later in Section 4.1, data to be processed for quality are stored in tables managed in a database. G-DQM controls data by checking each observation (i.e. each row of the table) and returns specific information on data quality. For this purpose, a set of validation flags is used, derived either from instrument manufacturer recommendations or from the GMOS SOPs. Given an input dataset, the output of G-DQM is the same dataset in which each row is flagged with a tag that identifies the measurement as a valid, warning (suspicious) or invalid observation. Each flag refers to specific flagging criteria. The evaluation process consists of comparing data against: (1) Warning limits, established to draw attention to data for possible corrective action; (2) Control limits, which invalidate data when exceeded. Each flagging criterion triggers the corresponding flags using thresholds. G-DQM checks whether rows, or sets of rows, within a dataset comply with these thresholds in order to tag the corresponding observations with flags that indicate valid/warning/invalid data. Two existing software suites aimed at ensuring data quality were taken as references in defining thresholds: the Research Data Management Quality (RDMQ) and the AMNet Quality Control (AMQC) programs, developed by Environment Canada and by the National Atmospheric Deposition Program (NADP), respectively (Steffen et al.7). Flagging criteria used in both tools were analysed and compared before their inclusion into the G-DQM system. Each control parameter has been set to meet the specific needs of the GMOS community, because it is important to take into account different site-specific conditions (e.g. polar sites and high-altitude locations). The flags and related flagging criteria used in G-DQM are summarized in Tables 1–4. The variables used in Table 3 are described in Table 5, where the desorption cycle is reported.
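The distinction between warning limits and control limits can be illustrated with a short sketch. The thresholds below are those listed in Table 1 for baseline voltage and sample volume; the function names, and the convention that a code starting with "I" invalidates a reading while one starting with "W" only warns, are assumptions made for this illustration rather than a description of the G-DQM code.

```python
from typing import List, Optional


def flag_baseline_voltage(volts: float) -> Optional[str]:
    """Table 1 baseline-voltage check for a single reading."""
    if volts < 0.01:
        return "IB0"                      # control limit exceeded
    if 0.01 < volts < 0.05 or volts > 0.25:
        return "WB1"                      # warning limit exceeded
    return None                           # within limits


def flag_sample_volume(measured: float, expected: float) -> Optional[str]:
    """Table 1 sample-volume check (relative deviation from expected)."""
    deviation = abs((measured - expected) / expected)
    if deviation > 0.07:
        return "IV7"                      # control limit: more than 7%
    if deviation > 0.05:
        return "WV5"                      # warning range: 5-7%
    return None


def classify(flags: List[str]) -> str:
    """Collapse the flags of one row into valid / warning / invalid.

    Assumption: the first letter of the code encodes its severity.
    """
    if any(code.startswith("I") for code in flags):
        return "invalid"
    if any(code.startswith("W") for code in flags):
        return "warning"
    return "valid"
```

A row would then be tagged by collecting the non-empty results of each check and passing them to `classify`.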
3 GMOS project

The worldwide scope of the GMOS project provides valuable data for a deeper understanding of atmospheric mercury on a global scale. With respect to data collection and management, this global structure poses a challenge to mercury scientists: traditional approaches to QA/QC are not easily applicable, both because of the size of the datasets coming from different monitoring stations across the globe and because data arrive in near real-time. Moreover, comparability of atmospheric mercury measurements at the global level is imperative for the GMOS infrastructure in order to ensure data that are useful for both the scientific and policy communities. To specifically meet these requirements, a centralized G-DQM system was developed and employed.

3.1 Measurement of mercury
Mercury in air is measured as three operationally defined forms:
(1) Gaseous Elemental Mercury (GEM) or Total Gaseous Mercury (TGM);
(2) Gaseous Oxidized Mercury (GOM);
(3) Particle-bound mercury less than 2.5 μm (PBM).

Gaseous Elemental Mercury (GEM) is the dominant form of atmospheric mercury (Lindberg and Stratton14). It can be oxidized in the atmosphere to form reactive and water-soluble Hg(II) compounds. This oxidised Hg is defined as all forms of mercury sampled using a KCl-coated denuder (GOM) (Landis et al.10), and/or Particle-Bound Mercury (PBM) (Lin and Pehkonen15); both are deposited to ecosystems through wet and dry processes (Amos et al.16). TGM, measured when speciation is not possible, is the sum of GEM and GOM (Lindqvist and Rodhe17). GMOS network sites measure the concentrations of atmospheric mercury fractions using an automated and continuous mercury speciation system: the Tekran Mercury Vapour Analyser Model 2537 coupled with the speciation models 1130 for GOM and 1135 for PBM. This equipment meets the GMOS requirements and is commonly available. The Tekran instrument uses two gold cartridges (A and B) in parallel to allow for continuous measurements with alternating operation modes (sampling versus desorbing/analysing stage) on a predefined time base (e.g., 10 min) (Tekran18). Measurements are obtained through a multi-step procedure as described elsewhere (Lindberg et al.19) using an impactor inlet (2.5 μm cut-off aerodynamic diameter at 10 L min⁻¹), a KCl-coated quartz annular denuder in the 1130 unit, and a quartz regenerable particulate filter (RPF) in the 1135 unit. The operation and principles of the Tekran instrument are described in the study by Landis et al.10

The main operational phases are:
(1) GEM or TGM measurements (GEM/TGM);
(2) Desorption cycle (DES) (see Table 5);
(3) Calibration cycle (CAL).

During the DES phase it is possible to perform speciation measurements and determine both GOM and PBM concentrations. During the CAL cycle the Tekran 2537 CVAFS mercury analyzers are automatically calibrated using internal permeation sources that emit mercury vapour at a constant rate to ensure acceptable Response Factors (RF) over each cartridge (RFA, RFB) (Tekran18). Where it is not possible to perform speciation, only the 2537 module of the Tekran is used, in order to perform TGM measurements and the CAL cycle.
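What counts as an acceptable response factor is spelled out in Table 4. The sketch below applies the detector-sensitivity limits (WR1/IR2) and the trap-bias limits (WC1/IC2) to one calibration. Applying the sensitivity limits to each cartridge separately is an assumption of this example, and the function name is hypothetical.

```python
from typing import List


def calibration_flags(rf_a: float, rf_b: float) -> List[str]:
    """Return Table 4 flags for the response factors of cartridges A and B."""
    flags = set()

    # Detector sensitivity, checked per cartridge (assumption).
    for rf in (rf_a, rf_b):
        if rf < 4e6:
            flags.add("IR2")              # control limit
        elif rf < 6e6 or rf > 12e6:
            flags.add("WR1")              # warning range

    # Trap bias: relative difference between the two response factors.
    mean_rf = (rf_a + rf_b) / 2
    if mean_rf > 0:
        bias = abs(rf_a - rf_b) / mean_rf
        if bias > 0.10:
            flags.add("IC2")              # control limit: more than 10%
        elif bias > 0.05:
            flags.add("WC1")              # warning range: 5-10%

    return sorted(flags)
```

An empty list means that neither criterion was triggered for that calibration cycle.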
3.2 GMOS network
Within the GMOS network, stations are classified as master, if they provide mercury speciation measurements (GEM, GOM and PBM), and secondary, when they provide only TGM concentrations. The ongoing GMOS network consists of 28 monitoring stations, whose institutions are internal GMOS partners, and 11 monitoring stations managed by external partners. Almost all internal GMOS stations provide near real-time raw data that are archived and managed by the GMOS-CI. In order to test compliance with the adopted GMOS SOPs, the G-DQM system was run on 16 different raw datasets: 11 from secondary stations and 5 from master sites. Names, locations, reference institutes, as well as data coverage for 2011–2013 are shown in Fig. 1 and 2, respectively.

Fig. 1 Coverage and consistency, on a monthly basis, of TGM data collected at some of the ongoing GMOS secondary stations, over the period 2011–2013.

Fig. 2 Coverage and consistency, on a monthly basis, of GEM/GOM/PBM data collected at some of the ongoing GMOS master stations, over the period 2011–2013.
4 QA/QC and cyberinfrastructure

From the user's point of view, G-DQM is a web-based application developed by using a Software as a Service (SaaS) approach: the software is developed as a product, but deployed as a service through a web browser. Data to be processed appear to users as managed within a computer cloud, and operators can access them after a login phase. By using this web application, users are able to follow the whole QA/QC process. From an IT point of view, G-DQM is part of the GMOS-CyberInfrastructure (GMOS-CI) cited above, which is a research environment that supports advanced data acquisition, storage, management, integration, mining and visualization, built on an IT infrastructure (D'Amore et al.20).
The core of the GMOS-CI is a Spatial Data Infrastructure (SDI) that integrates modules providing a set of services and features using open source components widely used by the geographic information community (de la Beaujardiere21). Services and processes of the GMOS-CI were designed to provide geographic services, necessary for the integration of datasets into federated systems such as GEOSS (GEO22). The GMOS-CI also ensures that data can be shared with major stakeholders, policy makers and the public. The G-DQM plugs into this cyberinfrastructure: it runs over datasets acquired by the GMOS-CI and makes use of some of the GMOS-CI's features, such as security and user management. The integration of the QA/QC component into the GMOS-CI allows us to deal with issues related to data and process integration, as well as the analysis of large datasets. Managing large amounts of data, arriving even in near real-time, is not straightforward if it is done by individual researchers using their personal computers: the G-DQM workflow, described later in Section 4.2, could in principle work without an ICT platform, but it would then be unable to scale when necessary. Consider that the volume of environmental data coming from mobile applications or smart sensor networks is increasing day by day. Deploying the software as part of a cyberinfrastructure permits us to deal with this scenario in a more feasible way. Moreover, the QA/QC process can be scheduled by the cyberinfrastructure in order to automatically process new data coming from sensors, mainly for real-time acquisition. Furthermore, automatic outcomes, such as warnings or alarms, would notify operators about instrumental malfunctions, thus preventing poor data quality and data loss (Campbell et al.9). All datasets collected by GMOS partners should respect the same SOPs and the same data lineage. It is possible to comply with both requirements by providing QA/QC processes as a common facility through the GMOS-CI.

4.1 Data acquisition
G-DQM is a service that starts working after data are stored in the GMOS databases. The data integration process is handled by a software agent plugged into the GMOS-CI. This component acquires data coming from stations managed by the GMOS partners: it reads data shared by each partner using File Transfer Protocol (FTP), although many other protocols are supported. For stations where no automatic data connection is available, it is possible to upload information manually on the GMOS web portal. In both cases, GMOS-CI stores data in tables managed by a Data Base Management System (DBMS). The data stored represent the standard output of the Tekran, along with information regarding geographical locations, names of data sources and quality fields. When data are stored for the first time, quality fields are empty. Periodically, G-DQM checks for new information available on GMOS databases. If new data are found, the quality process starts in order to tag all the new observations. At the end of this process, the quality fields contain flags about quality concerns. The flags used are those cited in Section 2.2.

4.2 G-DQM workflow: main features and components
As described in Section 4.1, G-DQM essentially takes Tekran raw data as the input while the output is a flagged dataset with information on data validation. After data acquisition, datasets are processed for quality assurance using the workflow reported in Fig. 3. In step one (1), G-DQM runs an automated process that filters the raw data stored in GMOS databases. The system compares the dataset against 43 potential flags corresponding to 43 criteria that specifically refer to the three operation phases cited above: GEM/TGM, DES, and CAL (see Tables 1–4). The flags are grouped in three sets: valid, warning and invalid. Thus, each flag refers to a specific condition, or criterion, that the system checks in order to screen the Tekran data. Each raw observation is flagged depending on the result of each corresponding criterion and returns, as a temporary output, a flagged dataset.
Fig. 3 G-DQM workflow with the main five-step process on which it is based.
The second step (2) consists of instrument reports compiled by site operators during their visits to stations. Field notes, anomalies, routine controls and part changes are reported in the station e-logbook, which is provided as a different service by means of a web application integrated into the GMOS-CI. GMOS SOPs are fully integrated into the e-logbook, which also serves as a reminder for routine maintenance. The third step (3) requires the site operator's approval of the intermediate flagged dataset. Site operators are allowed to clarify data records prior to their full approval. At the end of the above processes the system outputs are fully QAed/QCed. A further process (4) computes GOM and PBM concentrations for those sites that are performing speciation. After step (4), measurements tagged as invalid are discarded and only the valid data are considered available for dissemination purposes. Step (5) thus stores the final valid datasets that will be accessible from the GMOS web portal (http://www.gmos.eu). As an additional service, the G-DQM system is able to provide an alerting system (0) by which it is possible to visualize the near real-time Tekran output parameters. This helps site operators to identify any questionable events and take quick corrective actions in order to prevent the production of poor-quality data.
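The flow from automated screening to the final dataset can be sketched in a few lines. This is an illustrative skeleton only: it covers steps (1), (3) and the filtering before step (5), and leaves out the e-logbook of step (2) and the speciation computation of step (4). The `Observation` class, the function names and the rule that a code starting with "I" marks invalid data are assumptions; the two thresholds come from Tables 1 and 2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    timestamp: str
    gem: float                                   # GEM concentration, ng m^-3
    flags: List[str] = field(default_factory=list)
    status: str = "unchecked"


# Two example criteria; the dictionary layout is hypothetical.
CRITERIA: Dict[str, Callable[[Observation], bool]] = {
    "IDL": lambda o: o.gem < 0.1,    # below detection limit (Table 1)
    "WEH": lambda o: o.gem > 4.0,    # Hg concentration high (Table 2)
}


def automated_screening(obs: Observation) -> None:
    """Step 1: attach every triggered flag and derive a provisional status."""
    obs.flags = [code for code, check in CRITERIA.items() if check(obs)]
    if any(code.startswith("I") for code in obs.flags):
        obs.status = "invalid"
    elif obs.flags:
        obs.status = "warning"
    else:
        obs.status = "valid"


def operator_review(obs: Observation, decisions: Dict[str, str]) -> None:
    """Step 3: the site operator may confirm or overrule the provisional status.

    `decisions` maps a timestamp to the status chosen by the operator.
    """
    obs.status = decisions.get(obs.timestamp, obs.status)


def final_dataset(observations: List[Observation]) -> List[Observation]:
    """Before storage in step 5: keep only observations marked valid."""
    return [o for o in observations if o.status == "valid"]
```

In this sketch a warning that the operator never reviews stays out of the final dataset, which mirrors the requirement that suspicious data be approved before dissemination.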
5 Case study: Longobucco (Italy) dataset

In this section, a case study is shown using data from the Longobucco monitoring station. Longobucco is a GMOS master site whose atmospheric mercury speciation measurements covered the period Oct 2012–Oct 2013. Running the G-DQM we obtained the initial flagged datasets for the three main operation phases (GEM, DES and CAL). At this stage, the resulting flags are only outcomes of the automated QA scripts and they can be used in time-series plots to highlight quality-related issues. In Fig. 4 the time-series refers to data recorded during 2013 at Longobucco. Colour coding is used to distinguish each specific quality-related issue: points change colour according to the main flag assigned by the automatic G-DQM process. In Fig. 4(a) (GEM data), dots in red and yellow refer to a problem with the sampled volume, indicated by the flags IV7 and WV5, respectively. The flag IV7 is used when the measured volume differs by over 7% from the expected value, while the WV5 flag indicates that the volume differs by between 5 and 7% from the expected value. Dots in blue in Fig. 4(a) refer to the warning flag WK2 that, as can be seen, often occurs in the analysed dataset. The WK2 flag indicates that the concentrations measured over the A and B gold cartridges in the Tekran instrument diverge by more than 10% (see Table 2). In Fig. 4(b), PBM data from the desorption cycle are reported. In addition to the volume-related problem (IV7, dots in red) already encountered with the GEM data, for mercury speciation the automated quality screening highlights an issue associated with high zero values during the desorption cycle, tagged with the WS0 flag (dots in pink). Similarly, during the calibration cycle, shown in Fig. 4(c), there are issues notified by flags WZ0 and IZ1 (dots in brown and red, respectively). Both flags are related to a high calibration blank: WZ0 is a warning, while IZ1 invalidates the whole calibration cycle. It is important to consider that flags related to calibration cycles affect all related data.

Fully automated checks have limitations: there is a risk that a real and potentially important phenomenon could be ignored, e.g. when a real but extreme value is censored for falling outside an expected range. To ensure that this does not happen, the G-DQM system includes a mandatory final step, requiring that all data, and especially those flagged as suspicious, be carefully reviewed by the responsible scientist/site operator of each station. In the case of the Longobucco dataset, the station experts re-evaluated all GEM data tagged with the WK2 flag as valid. The reason was that the higher A/B cartridge divergence did not occur continuously over time, thus confirming that there was no issue related to cartridge passivation. Moreover, they identified a strong increase in PBM concentrations, tagged as suspicious (WS0) by the G-DQM automatic screening, as the consequence of a concurrent Saharan dust storm. Control analysis was also carried out by site operators for data related to calibration cycles. The final valid dataset for Longobucco will thus consist of data highlighted in green in Fig. 5, which is the result of the combination of the valid calibration cycles and the manual review performed by the site operators.

Fig. 4 Time-series in which data points change colour according to the main flag assigned by the first automatic G-DQM step process. They refer to: (a) GEM concentrations; (b) PBM concentrations and (c) response factor values, recorded at the Longobucco station in 2013.

Fig. 5 Time-series highlighting the final (valid or invalid) Longobucco GEM dataset after all the G-DQM step processes.
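The A/B cartridge comparison behind the WK1 and WK2 flags (Table 2) reduces to a percent difference relative to the mean of the two readings. The sketch below computes it and also measures how long a flag persists, since the Longobucco operators judged WK2 by whether the divergence was continuous. Function names are hypothetical, and the input to `longest_run` is assumed to be a time-ordered list of flags.

```python
from typing import List, Optional


def ab_percent_difference(a: float, b: float) -> float:
    """Absolute difference between cartridges A and B, in percent of their mean."""
    return abs(a - b) / ((a + b) / 2) * 100


def cartridge_flag(a: float, b: float) -> Optional[str]:
    """Table 2 check on a pair of consecutive A and B concentrations."""
    difference = ab_percent_difference(a, b)
    if difference > 10:
        return "WK2"
    if difference > 5:
        return "WK1"
    return None


def longest_run(flags: List[Optional[str]], code: str) -> int:
    """Length of the longest uninterrupted sequence of `code` in a time-ordered list."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag == code else 0
        best = max(best, current)
    return best
```

Comparing `longest_run(flags, "WK2")` with the number of readings in one day gives a simple way to tell sporadic divergence from a persistent cartridge problem.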
6 Data quality evaluation of the ongoing GMOS stations

As with the Longobucco dataset, using G-DQM it was possible to screen each GMOS monitoring dataset reported in Fig. 1 and 2. The results for data quality, in terms of flag incidence, are presented below: the three Tekran operational phases are considered separately, even though, as expected, at the end of the validation process the calibration results will affect both GEM/TGM and DES data. For the main Tekran parameters, box-and-whisker plots are later reported, highlighting the compliance of the ongoing GMOS Tekran measurements with the adopted SOPs.
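Flag incidence, as used in this section, is the share of observations in a dataset that carry a given flag. A minimal sketch is given below, assuming that each observation has already been reduced to a single main flag and that unflagged observations are labelled "OK"; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def flag_incidence(main_flags: List[str]) -> Dict[str, float]:
    """Percentage of observations carrying each main flag, rounded to 0.1%."""
    total = len(main_flags)
    if total == 0:
        return {}
    counts = Counter(main_flags)
    return {code: round(100 * n / total, 1) for code, n in counts.items()}
```

Applied per station, the resulting dictionaries correspond to one row each of an incidence table.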
6.1 Issues affecting GEM/TGM measurements
For each of the 16 GMOS stations studied, a summary regarding the percentage of both warning and invalidating ags affecting GEM/TGM datasets is shown in Table 6. The results reveal that the larger part of the datasets are affected by various qualityrelated problems, labelled by different numbers of ags. For these datasets a lower percentage completely meets the criteria for GEM/TGM. It is possible to notice that much of the data is tagged with the WK1 and WK2 ags that both refer to the A/B cartridge divergence. The GMOS SOPs specically recommend to calculate the Absolute Percent Difference (APD) between the A and B cartridges and then check that it falls in the range 5–10% (WK1) or if it is higher than 10% (WK2, see Table 2). If data result to be continuously agged with WK2 over one day, values underestimated are tossed. The APD distribution recorded at each of the examined 16 GMOS stations was observed over the period 2011–2013 and is shown using a box and whisker plot (Fig. 6). From this gure, the level of compliance with the current version of the GMOS SOPs can be easily seen: there are 6 stations with more than 50% of their datasets over the higher cut-off value, and numerous stations that fall within the warning range. Only a very small percentage of each dataset is compliant with GMOS recommendations. 6.2
6.2 Issues affecting desorption cycles
Table 7 shows warning and invalid flags for the GMOS master sites under study. The most commonly observed flag was WS0, which refers to high values for the third step of the desorption cycle (label C in Table 5), also considered a speciation blank measurement. A specific warning range has been introduced for this step: values in the range 1.67–10 pg m⁻³ are tagged with the WS0 warning. Speciation blank (C step) values higher than 10 pg m⁻³ lead instead to an invalidation flag (IS1), which invalidates the whole DES cycle. The distribution of values observed at the GMOS master stations is shown as a box-and-whisker plot in Fig. 7, where the compliance (and non-compliance) with the control limits can clearly be seen. Only one site shows more than half of its dataset in the invalid range. The other four sites are mostly compliant with this specific QA criterion.
Fig. 6 Distribution of Absolute Percent Difference (APD) within each examined GMOS dataset. Each box includes the median (midline), 25th and 75th percentiles (box edges), and 5th and 95th percentiles (whiskers).
Table 6 Incidence of flags for GEM/TGM data within each examined GMOS dataset. Secondary and master station codes are reported in the first column
Station  INP    IDL    IV7    IB5    IB0    IMX    WV5    WB3    WB2    WB1    WK1    WK2    WE5    WM2    WTG    OK
S1       0.1%   0.0%   0.0%   0.2%   0.0%   0.0%   0.0%   0.1%   0.1%   20.0%  22.2%  8.4%   0.1%   0.0%   0.1%   48.8%
S2       0.5%   0.0%   0.0%   2.8%   0.0%   0.2%   0.0%   0.5%   1.7%   0.1%   28.4%  11.8%  0.4%   0.1%   0.0%   53.4%
S3       3.3%   0.0%   3.0%   3.4%   0.0%   0.1%   0.0%   4.6%   0.0%   24.4%  21.5%  8.2%   0.0%   0.3%   0.4%   31.0%
S4       0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   1.2%   17.9%  48.9%  0.0%   0.0%   0.2%   31.7%
S5       0.0%   0.0%   2.7%   0.0%   0.0%   0.0%   2.4%   0.0%   0.0%   10.2%  20.2%  23.1%  0.0%   0.0%   0.0%   41.3%
S6       2.0%   0.1%   0.0%   0.7%   0.0%   0.2%   0.0%   1.9%   0.1%   2.2%   20.0%  41.3%  0.4%   1.0%   0.3%   29.9%
S7       0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.2%   7.8%   1.5%   0.0%   0.0%   0.0%   90.5%
S8       3.8%   0.1%   0.1%   0.1%   0.0%   0.0%   0.0%   0.0%   0.5%   9.0%   26.9%  11.4%  0.5%   0.0%   0.0%   47.4%
S9       0.0%   0.0%   0.0%   0.8%   0.0%   1.1%   0.0%   0.0%   9.7%   61.0%  16.4%  10.7%  0.0%   0.0%   0.0%   0.4%
S10      0.0%   0.0%   0.0%   0.5%   0.0%   1.0%   0.0%   3.5%   0.0%   0.2%   19.5%  37.9%  5.6%   1.2%   0.0%   30.5%
S11      2.7%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.1%   0.0%   21.4%  15.2%  5.0%   0.0%   0.1%   0.0%   55.3%
M1       3.8%   0.0%   0.0%   0.0%   3.3%   0.0%   16.7%  0.1%   12.2%  8.6%   33.8%  1.0%   1.7%   0.1%   0.0%   18.7%
M2       0.1%   0.0%   10.0%  1.7%   0.2%   0.0%   0.2%   0.0%   0.0%   25.0%  19.1%  0.6%   0.1%   0.0%   0.1%   42.9%
M3       8.0%   0.0%   18.6%  0.1%   2.7%   0.0%   11.4%  0.0%   1.6%   9.5%   6.4%   0.2%   3.3%   0.5%   0.1%   37.4%
M4       9.4%   0.0%   8.3%   19.2%  0.1%   0.0%   0.6%   0.0%   39.6%  11.1%  8.1%   0.7%   0.0%   0.1%   0.0%   2.7%
M5       0.1%   0.0%   21.2%  1.2%   0.0%   0.1%   0.0%   7.8%   0.0%   10.2%  11.0%  24.6%  0.0%   0.3%   0.0%   23.5%
Table 7 Incidence of flags for the DES cycle within each examined GMOS dataset. Master station codes are reported in the first column
Station  IS1    IP1/IG1  IP2/IG2  WS0    OK
M1       5.5%   16.5%    11.8%    49.4%  16.9%
M2       0.0%   0.2%     0.3%     9.8%   89.7%
M3       0.5%   12.5%    9.5%     33.5%  44.0%
M4       2.0%   0.5%     0.5%     13.7%  83.4%
M5       6.1%   29.0%    1.3%     22.3%  41.2%
Fig. 7 Distribution of speciation blank values (pg m⁻³) within each examined GMOS dataset. Each box includes the median (midline), 25th and 75th percentiles (box edges), and 5th and 95th percentiles (whiskers).
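The speciation blank criterion for the DES cycle can be illustrated with a short Python sketch. It is a minimal example under stated assumptions and not the G-DQM code: the names `speciation_blank_flag` and `des_cycle_is_valid` are hypothetical, and both range limits (1.67 and 10 pg m⁻³) are assumed to belong to the WS0 warning range.

```python
WARNING_LOWER = 1.67  # pg m-3, lower edge of the WS0 warning range
INVALID_ABOVE = 10.0  # pg m-3, blanks above this value receive IS1


def speciation_blank_flag(blank):
    """Return IS1, WS0 or None for a C-step (speciation blank) value."""
    if blank > INVALID_ABOVE:
        return "IS1"
    if blank >= WARNING_LOWER:  # assumption: range limits are inclusive
        return "WS0"
    return None


def des_cycle_is_valid(blank):
    """IS1 invalidates the whole DES cycle, not only the blank step."""
    return speciation_blank_flag(blank) != "IS1"


# Hypothetical blank values in pg m-3 (not GMOS data).
for blank in (0.5, 3.0, 12.0):
    print(blank, speciation_blank_flag(blank), des_cycle_is_valid(blank))
```

A WS0 warning leaves the cycle usable, whereas a single blank above the upper limit removes every step of that cycle from the validated dataset.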
6.3 Issues affecting calibration cycles
For the calibration cycle, the percentage of flags was also calculated (Table 8). The results reveal that this operational phase of the Tekran instrument was often affected by an issue related to the Response Factor (RespFact). For this parameter, the G-DQM system produces a warning flag (WR1) if the RespFact falls in the range 4 × 10⁶–6 × 10⁶ units or exceeds 12 × 10⁶ units. If the RespFact is lower than 4 × 10⁶ units, the system returns an invalid flag (IR2). Fig. 8 shows the distribution of the RespFact values recorded during calibration cycles as a box-and-whisker plot. For three stations, nearly half the data proved to be invalid. Another nine stations had at least half of their RespFact values within the warning range.
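The RespFact rule differs from the previous checks in that the warning flag covers two separate ranges, one just above the invalidation limit and one above the upper limit. A minimal Python sketch of this logic follows. The function name `respfact_flag` and the inclusive treatment of the 4 × 10⁶ and 6 × 10⁶ limits are assumptions made for illustration, not details taken from G-DQM.

```python
def respfact_flag(resp_fact):
    """Return IR2, WR1 or None for a calibration Response Factor."""
    if resp_fact < 4e6:
        return "IR2"  # below the lower limit: invalid
    if resp_fact <= 6e6 or resp_fact > 12e6:
        return "WR1"  # low warning band or above the upper limit
    return None       # between 6e6 and 12e6: no flag


# Hypothetical Response Factor values (not GMOS data).
for value in (3.5e6, 5.0e6, 8.0e6, 13.0e6):
    print(f"{value:.1e} -> {respfact_flag(value)}")
```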
Fig. 8 Distribution of response factor values (units) within each examined GMOS dataset. Each box includes the median (midline), 25th and 75th percentiles (box edges), and 5th and 95th percentiles (whiskers).
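The five statistics drawn in the box-and-whisker plots of Fig. 6–8 can be reproduced with a short pure-Python sketch. The paper does not state which percentile estimator was used, so the linear interpolation between neighbouring ranks shown here is an assumption, and the names `percentile` and `box_summary` are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile p (0-100) of an already sorted list.

    Assumed estimator: linear interpolation between neighbouring ranks.
    """
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty dataset")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    step = sorted_values[upper] - sorted_values[lower]
    return sorted_values[lower] + step * fraction


def box_summary(values):
    """Whiskers (5th, 95th), box edges (25th, 75th) and median of a dataset."""
    data = sorted(values)
    levels = (("p5", 5), ("p25", 25), ("median", 50), ("p75", 75), ("p95", 95))
    return {name: percentile(data, p) for name, p in levels}


# Hypothetical dataset: the integers 0..100, so each percentile equals its level.
summary = box_summary(range(101))
print(summary)
```

Comparing such a summary with the control limits of a given parameter shows at a glance which share of a dataset falls in the warning or invalid range.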
Table 8 Incidence of flags for the CAL cycle within each examined GMOS dataset. Secondary and master station codes are reported in the first column
Station  IR2    IC2    IZ2    IF2    WR1    WC1    WD1    WD2    WZ1    WF1    OK
S1       14.1%  0.0%   1.0%   1.0%   42.4%  1.0%   13.1%  8.6%   1.0%   6.6%   11.1%
S2       4.3%   5.4%   2.3%   0.5%   28.8%  2.9%   4.8%   14.2%  2.3%   0.5%   34.0%
S3       54.8%  1.2%   8.0%   0.2%   8.0%   0.3%   3.6%   4.6%   18.1%  0.8%   0.5%
S4       0.0%   0.0%   1.5%   0.0%   25.3%  3.0%   15.6%  7.4%   7.4%   5.2%   34.6%
S5       10.7%  12.1%  9.3%   0.0%   19.3%  4.3%   14.3%  13.6%  12.9%  0.0%   3.6%
S6       22.3%  18.0%  0.9%   0.9%   18.6%  8.8%   9.1%   16.5%  0.9%   0.9%   3.0%
S7       0.4%   0.4%   0.4%   1.1%   71.2%  0.0%   1.9%   1.9%   0.4%   0.0%   22.3%
S8       9.8%   2.6%   0.8%   0.9%   44.7%  0.0%   7.2%   7.4%   14.3%  2.5%   9.8%
S9       0.0%   0.0%   26.5%  0.3%   28.9%  0.0%   1.3%   1.3%   32.6%  7.4%   1.7%
S10      2.8%   10.3%  0.0%   0.8%   23.8%  12.7%  10.7%  6.7%   1.6%   2.0%   28.6%
S11      68.3%  3.8%   1.3%   1.2%   7.4%   3.9%   7.6%   3.8%   1.4%   0.9%   0.4%
M1       20.9%  20.6%  0.4%   0.8%   22.2%  5.4%   3.8%   2.3%   3.3%   0.1%   20.2%
M2       30.3%  0.0%   0.6%   2.2%   27.0%  0.0%   6.7%   9.6%   16.3%  7.3%   0.0%
M3       3.3%   5.7%   0.5%   0.6%   39.9%  4.3%   4.2%   3.6%   2.3%   0.2%   35.4%
M4       10.4%  2.5%   2.5%   1.4%   25.2%  8.3%   13.3%  10.4%  6.5%   4.3%   15.1%
M5       3.5%   5.0%   6.3%   0.4%   40.1%  5.2%   7.4%   5.8%   11.0%  6.1%   9.1%
7 Conclusions and future directions
The monitoring network established within the Global Mercury Observation System (GMOS) project provides a valuable resource for a deeper understanding of atmospheric mercury concentration and distribution trends on a global scale. In the context of the UNEP Global Mercury Partnership, the results of this on-going project are also expected to support the effective implementation of the Minamata Convention, which is aimed at reducing the harmful impacts of mercury on human and ecosystem health. Although current instruments measuring mercury levels in air may provide useful information to both the policy and scientific communities, they are susceptible to malfunctions that can result in lost or poor-quality data. Some level of instrument failure is inevitable; however, steps can be taken to minimize the risk of loss and to improve the overall quality of the data.
The G-DQM system is a web-based data quality control tool that has been specifically developed to ensure data comparability among atmospheric mercury datasets collected within the GMOS network. Its application to three years of data allowed a very detailed analysis of each Tekran analyser used in the network. This centralized tool gave a fast, general overview of analyser behaviour and a rapid check of data quality. The flags adopted to tag values within datasets allowed us to identify issues that occur frequently and noticeably affect data quality. The analysis performed here by means of G-DQM on the GMOS network should be considered preliminary, since the site operator approval step is necessary to finalize the validation process through a human check. However, the results presented here provide an important first assessment of the mercury data acquired at the on-going GMOS stations and give important feedback for future instrument management and maintenance guidelines that could be taken into account in the further development of mercury-oriented monitoring networks.
G-DQM has been specifically designed to give rapid feedback on the monitoring of atmospheric mercury based on the Tekran instrument, and is now being expanded to include the mercury analyser manufactured by Lumex, following ad-hoc SOPs. Further progress will also include an inter-comparison with existing systems designed to quality assure and quality control mercury datasets. Apart from mercury, the amount of environmental data in general is expected to increase rapidly in the coming years; there is thus an increasing need for automated, platform-based methods to check and correct data to ensure that the datasets provided to various end users are of the highest quality.
Acknowledgements
This work contributes to the EU-FP7 project Global Mercury Observation System (GMOS). We deeply thank the staff at the GMOS stations: for Bariloche (Argentina), M. Diéguez and E. Garcia; for Calhau (Cape Verde), K. Read; for Cape Point (South Africa), M. Lynwill and E. G. Brunke; for Celestun and Sisal (Mexico), F. Sena; for Col Margherita (Italy), W. Cairns; for Ev-K2 (Nepal), I. Ammoscato; for Iskrba (Slovenia), J. Kotnik and M. Horvat; for Kodaikanal (India), R. Ramachandran; for La Seyne-sur-Mer (France), J. Knoery; for Longobucco (Italy), F. Cofone and I. Ammoscato; for Manaus (Brazil), P. Artaxo and F. Morais; for Mt. Ailao, Mt. Changbai and Mt. Walinguan (China), X. Feng, X. Fu and H. Zhang; and for Station Nord (Greenland), C. Nordstroem and H. Skov.