Water Research 75 (2015) 210–223


An integrated logit model for contamination event detection in water distribution systems

Mashor Housh a, Avi Ostfeld b,*

a Department of Natural Resources and Environmental Management, University of Haifa, 3498838, Israel
b Faculty of Civil and Environmental Engineering, Technion – Israel Institute of Technology, Haifa 32000, Israel

article info

Article history:
Received 28 September 2014
Received in revised form 18 January 2015
Accepted 6 February 2015
Available online 28 February 2015

Keywords:
Water distribution systems
Water quality
Water security
Event detection
Logit analysis

abstract

The problem of contamination event detection in water distribution systems has become one of the most challenging research topics in water distribution systems analysis. Current attempts at event detection utilize a variety of approaches, including statistical, heuristic, machine learning, and optimization methods. Several existing event detection systems share a common feature in which alarms are obtained separately for each of the water quality indicators. Unifying those single alarms from different indicators is usually performed by means of simple heuristics. A salient feature of the currently developed approach is the use of a statistically oriented model for discrete choice prediction, estimated using the maximum likelihood method, for integrating the single alarms. The discrete choice model is jointly calibrated with the other components of the event detection system framework on a training data set using genetic algorithms. The fusing of the individual indicator probabilities, which is left out of focus in many existing event detection system models, is confirmed to be a crucial part of the system, and modelling it with a discrete choice model improves its performance. The developed methodology is tested on real water quality data, showing improved performance in decreasing the number of false positive alarms and in its ability to detect events with higher probabilities, compared to previous studies. © 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Problems associated with the management of sensor stations in water distribution systems (WDS) have been widely explored since the events of September 11, 2001. Events such as the poisoning of a water supply in Scotland (Gavriel et al., 1998) and the contamination events in Japan (Yokoyama, 2007) highlight how intentional sabotage remains a major risk to public health (WHO, 2004; Greenfield et al., 2002). Early studies in this field focused on sensor placement, incorporating deterministic and stochastic optimization

* Corresponding author. Tel.: +972 4 8292782; fax: +972 4 8228898. E-mail address: [email protected] (A. Ostfeld).
http://dx.doi.org/10.1016/j.watres.2015.02.016
0043-1354/© 2015 Elsevier Ltd. All rights reserved.

techniques, as well as graph-theory algorithms, for optimizing one or more objectives, such as detection likelihood, expected contaminated water volume, affected population, and capital cost (e.g., Kessler et al., 1998; Ostfeld and Salomons, 2004; Berry et al., 2006; Krause et al., 2008; Preis and Ostfeld, 2008; Xu et al., 2008). In fact, sensor placement is the most explored problem in the field of WDS security, featuring over ninety studies (Hart and Murray, 2010). Nevertheless, most sensor placement models suggested in the literature use the notion of a "perfect sensor", assuming that if a sensor measures any concentration of a contaminant, it will detect it immediately and with


complete certainty. In reality, however, revealing the presence of a contaminant is a complex task. Since the number of possible contaminants is countless, it is not possible to construct a sensor that can detect all possible contaminations. Some attempts were made to develop sensors capable of identifying specific contaminants according to their exclusive properties [e.g., the use of light scattering for the detection of a spectral signature (Adams and Mccarty, 2007)]. However, the vast variety of pollutants makes it impossible to deal with all possibilities. Laboratory analysis utilizing grab sampling may be able to identify a wide variety of pollutants, but it is limited in its ability to detect contamination within a relevant time frame (EPA, 2005a). In light of the above, the direct approach, which attempts to identify specific contaminants, is found to be impractical. To cope with this challenge, a surrogate approach to contamination detection has been suggested (EPA, 2005a; EPA, 2005b; Hall et al., 2007). The surrogate approach suggests that information from conventional water quality sensors (e.g., measurements of electrical conductivity, residual chlorine, pH, etc.) can provide an early indication of possible pollution through the analysis of irregularities in the monitored variables and their interplay. Recent studies adapted the surrogate approach for contamination event detection (EPA, 2010 through CANARY; Perelman et al., 2012; Arad et al., 2013; Oliker and Ostfeld, 2014a, 2014b). Those attempts used a variety of approaches for water quality event detection, including statistical, heuristic, machine learning, and optimization methods for analyzing monitored data to detect anomalous deviations from baselines and indicate possible contamination.
Nevertheless, several of the currently developed event detection system (EDS) models share a common structure for exploring the time series of each water quality parameter, through which: (a) the value of the next time step is predicted; (b) outliers from expected behavior are identified; and (c) probabilities of event occurrences are calculated. As noted, the analysis is performed individually for each water quality parameter. Thus, a fusing process is needed to construct the event probability based on the probability estimates obtained from each single parameter. Currently, this fusing process is performed by means of simple heuristics. In this study, a statistically based fusing process is proposed as an extension of the previously developed models, which use simple heuristic rules for integrating single alarms. The proposed extension employs a discrete choice logit model (Hilbe, 2009) that assembles all water quality data in one framework. The advantages of the proposed methodology are demonstrated through a comparison to Arad et al. (2013) and to CANARY under different possible configurations. An extended literature review of event detection related models is provided below.

2. Literature review

On the experimental and conceptual side of EDSs, Byer and Carlson (2005) performed contaminant event detection experiments in both batch and pilot-scale WDSs to determine the level of detection associated with different water quality parameters. Results showed that changes in "normal"


water quality behavior can trigger event detection and thus provide a mechanism for an early warning system. Hall et al. (2007) compared different commercial instrumentations for water quality event detection, showing that adding online water quality monitoring to a WDS yields a dual benefit, as it improves both security and conventional water quality monitoring. Yang et al. (2008) showed that the residual chlorine loss curve and its geometry can serve as useful tools for identifying the presence of a contaminant in a WDS. Yang et al. (2009) complemented Yang et al. (2008) by testing an event adaptive detection, identification, and warning methodology based on residual chlorine measurements in a pilot-scale pipe flow experiment with eleven chemical and biological contaminants. The residual chlorine measurements were quantitatively related to contaminant-chlorine reactivity through forensic discrimination diagrams. Helbling and VanBriesen (2009) developed a model for predicting the behavior of chlorine within the WDS following a microbial contamination event. The model was used to simulate a series of microbial contamination events in a small community WDS. Schwartz et al. (2014) simulated the behavior of the chemical contaminants chlorpyrifos and parathion (organophosphate pesticides) in a water distribution system for enhancing event detection modeling. Results indicated that injection of these substances can be identified through a rapid decrease in chlorine, as well as a drop in alkalinity concentration and a small decrease in pH. On the algorithmic side of EDSs, Koch and McKenna (2011) used a random space-time point process, Kulldorff's scan test, to statistically identify significant clusters of detections. The methodology was tested through EPANET simulations. Murray et al. (2011) used Bayesian Belief Networks (BBNs) for event detection of E. coli contamination using the surrogate parameters of pH, conductivity, and turbidity. Lee et al. (2012) developed a multi-criteria decision analysis method based on an analytic hierarchy process for optimizing the selection of indicator/contaminant input sets for selecting water quality sensor types in water distribution systems. Raciti et al. (2012) described the utilization of an event detection algorithm on real-time data of a distribution system through data mining and clustering techniques adopted from information infrastructure security. Liu et al. (2013) developed and demonstrated an event detection methodology based on a decomposition scheme for filtering water quality time series into sequences of intrinsic mode functions. Hou et al. (2013a, 2013b, 2013c) developed several models for event detection. Hou et al. (2013a) described an event detection model based on an RBF neural network integrated with wavelet analysis. The suggested algorithm resulted in lower false alarm rates and a higher probability of detection compared to time series increments. Hou et al. (2013b) developed a method for contamination event detection based on three interconnected stages: the model initially predicts future water quality parameters using an autoregressive model; secondly, a probabilistic scheme assigns probabilities to the time series of water quality residuals; and finally, events are fused through the Dempster–Shafer evidence theory. Hou et al. (2013c) presented the implementation of Hou et al. (2013a, 2013b) in a few cities in China through the development of an event-driven water quality early warning



and control system platform. He et al. (2013) introduced an event detection method based on a multi-parameter fusion algorithm combined with a fuzzy C-means clustering algorithm. In recent studies, Lambrou et al. (2014) described the application of algorithms for fusing online multi-quality water sensor measurements at the local level. The tested sensors consisted of several in-pipe electrochemical and optical lightweight sensors suitable for large-scale deployments, thus enabling a sensor network approach. Williamson et al. (2014) described the development and implementation of an event detection platform on a real water distribution system in the Netherlands. The platform was tested through the detection of several water quality events. Oliker and Ostfeld (2014a) developed a weighted Support Vector Machine (SVM) model for event detection. Oliker and Ostfeld (2014b) used a machine learning model for water quality event detection based on a multivariate analysis combined with an unsupervised minimum volume ellipsoid scheme for event classification. Mounce et al. (2014) suggested pattern matching techniques and binary associative neural networks for water quality event detection and for matching hydraulic pressure and flow. Burchard-Levine et al. (2014) developed an early warning system through a link between artificial neural networks and a genetic algorithm for water quality event detection. Liu et al. (2014) proposed an event detection algorithm based on data from laboratory contaminant injection experiments. Results showed that the method could detect a contaminant injection of 0.01 mg/L nine minutes after its introduction into the system. Liu et al. (2015) suggested a non-dominated sorting genetic algorithm for solving the event detection problem based on the lab experiments conducted by Liu et al. (2014). The model improved on Liu et al. (2014), as it was capable of detecting contaminant intrusions at a concentration of 0.008 mg/L after only one minute of injection. Overviews of event detection modeling were conducted by Rosen and Bartrand (2013) and by Zhao et al. (2014), concentrating on defining the gaps between research and reality and pointing out possible new research directions.

3. Methodology

The present methodology, entitled integrated logit detection (ILD), extends the previous event detection model of Arad et al. (2013), entitled dynamic threshold method (DTM), by optimally integrating the detections from all the different water quality parameters into the event detection framework. In what follows, a short description of the DTM methodology is provided; then the extensions incorporated in the ILD method are explored, along with the issues they aim to resolve.

3.1. Dynamic threshold method

The input for the DTM is the error between predicted water quality parameters and their observed values. The predicted parameters are obtained using Artificial Neural Networks (ANNs) trained on uncontaminated conditions to predict uncontaminated water quality behavior; thus, large errors between predicted and observed parameters can indicate possible contamination events. Given errors for six different water quality indicators, Error_i ∀i ∈ I, I = {Chlorine, Electrical Conductivity, pH, Temperature, Total Organic Carbon, Turbidity}, the DTM follows three steps:

1. The DTM starts by classifying the errors as normal or outliers, to distinguish the system's normal operation from contamination events. This classification is performed by defining different error thresholds for every time step, such that data outside the thresholds are considered outliers. The thresholds are dynamic and change as a function of time, to reflect changes in the noise of the data at different times. The threshold dynamics of each indicator are controlled by five different variables, which are calibrated during the training phase of the method using a Genetic Algorithm (GA) (Holland, 1975; Goldberg, 1989). The GA strives to maximize the performance of the outlier classification process, quantified by the sum of the True Positive Rate (TPR) and the True Negative Rate (TNR) of the outlier classification process for each one of the indicators, TPR_i^o and TNR_i^o, respectively.

2. After outlier identification, the DTM proceeds to event identification for each quality indicator (i.e. calculation of the event probability). In the DTM this stage is performed for each of the indicators independently (i.e. single indicator event identification); thus the event declaration is based solely on one quality indicator (e.g. an event based on the measurement of chlorine). For each one of the indicators, a single indicator's event probability is calculated based on the sequence of the classified outliers. Initially, a contamination event is assumed to be rare; with each new observation of outliers, the probability of an event is updated using the sequential Bayes rule (Equation (1)). This rule depends on the optimal TPR_i^o, TNR_i^o obtained in the training phase of the previous step. An indicator's event is declared when the probability exceeds a predefined probability threshold.

P(E_t) = P(E_t | O_t)   if Residual_t is an outlier
P(E_t) = P(E_t | Ō_t)   if Residual_t is a non-outlier

where

P(E_t | O_t) = [TPR · P(E_{t-1})] / [TPR · P(E_{t-1}) + FPR · (1 − P(E_{t-1}))]

P(E_t | Ō_t) = [(1 − TPR) · P(E_{t-1})] / [(1 − TPR) · P(E_{t-1}) + (1 − FPR) · (1 − P(E_{t-1}))]     (1)

where P(E_t) is the probability of an event at time t, O_t and Ō_t denote an outlier and a non-outlier at time t, respectively, TPR and FPR are the true and false positive rates obtained from the training phase, respectively, and P(E_t | O_t), P(E_t | Ō_t) are the conditional probabilities of a contamination event given that the residual is classified as an outlier/normal, respectively.

3. The output from the previous steps is an event/non-event declaration from each one of the water quality indicators per time step; this is because steps 1 and 2 are performed for each indicator independently. Contamination may affect only some of the monitored quality parameters; thus, in the DTM the event declarations from the individual indicators are unified to construct the probability of an event based on all the single event probabilities obtained in step 2. This unified probability is used to declare an event when it exceeds a predefined probability threshold value. The DTM uses a simple approach in which the unified probability is calculated based on the number of events obtained after solving all the single indicator event identification problems.
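The outlier classification (step 1) and the sequential Bayes update of Equation (1) (step 2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rolling-statistics threshold, the TPR/FPR values, and the small probability floor (which keeps the rare-event probability from collapsing to zero during long normal stretches) are all illustrative assumptions.

```python
import numpy as np

def dynamic_outliers(errors, window=50, k=3.0):
    """Step 1 (simplified): flag outliers using dynamic thresholds built
    from a rolling mean +/- k rolling standard deviations.  The actual DTM
    threshold is governed by five GA-calibrated variables per indicator;
    `window` and `k` here are illustrative stand-ins."""
    flags = np.zeros(len(errors), dtype=bool)
    for t in range(len(errors)):
        past = errors[max(0, t - window):t + 1]       # data up to time t only
        mu, sd = past.mean(), past.std()
        flags[t] = abs(errors[t] - mu) > k * sd if sd > 0 else False
    return flags

def bayes_update(p_prev, is_outlier, tpr, fpr):
    """Step 2: sequential Bayes update of the event probability (Eq. (1))."""
    if is_outlier:
        num = tpr * p_prev
        den = tpr * p_prev + fpr * (1.0 - p_prev)
    else:
        num = (1.0 - tpr) * p_prev
        den = (1.0 - tpr) * p_prev + (1.0 - fpr) * (1.0 - p_prev)
    return num / den

# Demo on one indicator with an imposed anomaly.
rng = np.random.default_rng(1)
errors = 0.01 * rng.standard_normal(500)
errors[300:320] += 0.2                                # simulated event
outliers = dynamic_outliers(errors)

p, probs = 0.05, []                                   # events assumed rare a priori
for flag in outliers:
    # The floor of 1e-3 is an assumed implementation detail, not from the paper.
    p = bayes_update(max(p, 1e-3), flag, tpr=0.9, fpr=0.1)
    probs.append(p)
```

A run of consecutive outliers multiplies the event odds by TPR/FPR at every step, so the probability climbs quickly toward 1 during the anomaly and decays back during normal operation.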

3.2. Integrated logit detection

As in the DTM, the input for the ILD method is the set of errors for the different water quality indicators, Error_i ∀i ∈ I. Similar to the DTM, the ILD uses dynamic thresholds which are controlled by five different parameters. However, unlike the DTM, in which these five parameters are calibrated to maximize the performance of the outlier classification process for each one of the parameters (TPR_i^o + TNR_i^o), here we claim that a more proper objective should be defined on the event classification performance obtained from the unified probability in Step 3, i.e. TPR^e + TNR^e; namely, on the overall performance of the contamination detection algorithm during the training phase, rather than on the outlier classification process, which comprises only one step of the algorithm. The definition of this new objective does not come without challenges: note that the suggested objective TPR^e + TNR^e is an integrated performance measure over all indicators, as compared to TPR_i^o + TNR_i^o, which is defined for each indicator i independently. Thus, unlike the DTM, in which six calibration problems are solved independently for each of the six indicators, in the ILD, because the integrated measure depends on all indicators, one single calibration problem is defined in the training phase to find the five controlling variables for each one of the six indicators simultaneously; i.e., the calibration optimization problem is performed with 30 decision variables in the ILD, compared to six optimization problems, each with five decision variables, in the DTM. Additionally, the evaluation of the integrated performance measure in the ILD requires the calculation of the unified event probability. Unlike the training phase of the DTM, which contains only the outlier classification step, the ILD training phase undergoes all three steps: outlier classification, single event identification, and unified event identification.
Another extension of the DTM suggested by the ILD method is related to the way the unified event probability is derived. The unified event probability in the DTM is simply defined by the fraction of single declared events out of the total number of indicators (six). In contrast, the ILD fits a discrete choice prediction model with two choices, event and non-event, considering the single event declarations as "expert opinions" on the true choice. Each of the single experts contributes differently to a utility function u_e, which represents their joint opinion, as follows:

u_e = b_0 + Σ_{i∈I} b_i · E_i     (2)

where b_0, b_i are unknown parameters and E_i is the declared single event, taking a value of 1 for event and 0 for non-event.


The probability of declaring an event in the discrete choice model depends on the joint opinion of the experts and is defined using a logistic function as follows:

P_e = 1 / (1 + e^(−u_e))     (3)

where P_e is the unified probability of an event. The unknown parameters of the above discrete choice prediction model are estimated using a maximum log-likelihood procedure during the training phase. Note that for each evaluation in the GA during the search for the optimal 30 controlling variables, a different E_i will be obtained. Thus, in each evaluation of the GA, an inner optimization problem (a maximum log-likelihood problem) is solved.
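Equations (2) and (3), together with the maximum log-likelihood estimation of the b coefficients, can be sketched as below. The training data are synthetic and the "truth" rule used to label them is an arbitrary stand-in; in the ILD the labels would come from the simulated events in the training set.

```python
import numpy as np
from scipy.optimize import minimize

def unified_probability(beta, E):
    """P_e = 1/(1 + exp(-u_e)), u_e = b_0 + sum_i b_i * E_i (Eqs. (2)-(3))."""
    u = beta[0] + E @ beta[1:]
    return 1.0 / (1.0 + np.exp(-u))

def neg_log_likelihood(beta, E, y):
    """Negative Bernoulli log-likelihood of the logit model."""
    p = unified_probability(beta, E)
    eps = 1e-12                                   # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy training data: 6 single-indicator event declarations per time step
# (hypothetical values) and the corresponding true event labels.
rng = np.random.default_rng(0)
E = rng.integers(0, 2, size=(200, 6)).astype(float)
y = (E.sum(axis=1) >= 3).astype(float)            # stand-in "truth" rule

res = minimize(neg_log_likelihood, x0=np.zeros(7), args=(E, y), method="BFGS")
beta_hat = res.x                                  # maximum likelihood estimates
p_e = unified_probability(beta_hat, E)            # unified event probabilities
```

In the full ILD, this maximum-likelihood fit is the inner optimization solved once per GA evaluation, since each candidate set of threshold variables produces a different matrix of single event declarations E.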

4. Application

Real data collected from a water utility in the US (available from the CANARY database) are used in this study to compare the ILD and the DTM performance. The data were collected over four months at 5-min intervals under normal operating conditions, without contamination events. The data include six water quality parameters: total chlorine (mg/L), electrical conductivity (EC) (µS/cm), pH (–), temperature (°C), total organic carbon (TOC) (ppb), and turbidity (NTU). Software programs are attached as supplementary materials. A contamination event simulation procedure is used to mimic contamination events. These simulated events are imposed on the normal operating conditions of the CANARY database. Here we follow the same event simulation procedure as suggested in Klise and McKenna (2006) and implemented in Perelman et al. (2012) and Arad et al. (2013). The simulated events vary in magnitude, direction, and length for each of the water quality parameters. Obviously, the higher the magnitude and the longer the length of the imposed simulated events, the easier it is for the event detection algorithm to detect them. Three types of contamination events, characterized by magnitude, are tested: (a) random impact events, where the magnitude is random; (b) low impact events; and (c) mixed impact events, where 30% of the events are forced to be low and the remainder are randomly generated. Because the generation of the events is random, we used the same random events as in Arad et al. (2013), so that the comparison between the two methods is valid. The dataset was divided into two subsets. The first subset contains two thirds of the data, for the training phase; the second subset contains the remaining one third, for testing the developed methods and resembling real-time conditions, where the data are not observed when making the event/non-event classification. We maintained the same partition as in Arad et al. (2013) to facilitate the comparison.
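The event superimposition step can be illustrated with a simplified stand-in for the Klise and McKenna (2006) procedure; all parameter ranges below (event length, magnitude, the step-shaped event profile) are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def impose_events(baseline, n_events, rng, min_len=12, max_len=60,
                  min_mag=0.5, max_mag=2.0):
    """Superimpose simulated contamination events on a normal-operation
    time series.  Each event gets a random start, length, magnitude
    (in units of the baseline standard deviation), and direction (+/-)."""
    data = baseline.copy()
    labels = np.zeros(len(baseline), dtype=bool)   # true event state per step
    sigma = baseline.std()
    for _ in range(n_events):
        length = rng.integers(min_len, max_len + 1)
        start = rng.integers(0, len(baseline) - length)
        magnitude = rng.uniform(min_mag, max_mag) * sigma
        direction = rng.choice([-1.0, 1.0])
        data[start:start + length] += direction * magnitude
        labels[start:start + length] = True
    return data, labels

rng = np.random.default_rng(42)
baseline = 0.8 + 0.05 * rng.standard_normal(2000)  # e.g. chlorine [mg/L]
data, labels = impose_events(baseline, n_events=10, rng=rng)
```

The `labels` array is what the TPR^e/TNR^e objective is evaluated against during training.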
Both methods use the given errors for the six different water quality indicators, Error_i ∀i ∈ I, as input. These errors are obtained from six ANN models, one for each water quality parameter, trained on normal data without contamination events. For each water quality parameter, its corresponding ANN model takes as input variables all the other water quality parameters, in addition to the value of the target parameter from the previous time step:



x̂_i^t = ANN_i( x_j^t ∀j ∈ Ī ; x_i^{t−1} )   ∀t     (4)

where x̂_i^t is the predicted quality parameter, x_i^t is the measured water quality parameter, and Ī is a set defined as Ī = I − {i}. Since the ANNs are trained on normal condition data, large errors between predicted and measured values can indicate possible events. The prediction errors are defined as:

Error_i^t = x_i^t − x̂_i^t   ∀i ∈ I, ∀t     (5)

4.1. Illustrative comparison example

In what follows is a step-by-step comparison between the DTM and the ILD. Fig. 1 shows a snapshot of the algorithm during the last GA evaluation, comparing the training phases of the two methods. The DTM training phase is much simpler, as shown in Fig. 1.a. Since the DTM objective function is defined on the outlier level (i.e. TPR_i^o + TNR_i^o), only the outlier identification stage is performed. An optimization problem is defined for each water quality parameter; thus the GA finds the best five variables that define the dynamic threshold functions for each one of the water quality parameters. For each evaluation of the GA, new dynamic thresholds are defined; the GA attempts to find the best combination of the five variables that maximizes the outlier identification, where an outlier is declared whenever the measured data violate the dynamic thresholds. Fig. 1.a.1 shows the best dynamic threshold found for chlorine (CL) at the end of the optimization process, focusing on a time period that contains three events, which will be used for demonstration purposes throughout this subsection. Within the optimization process the GA tried different dynamic thresholds, i.e. produced different versions of the graphs in Fig. 1.a.1, but converged to the best one shown in the figure. The results in Fig. 1.a.1 show that the lower the error in the model, the tighter the dynamic thresholds. This is attributed to the fact that the dynamic thresholds are constructed to account for different noise levels in the measurements. As such, when the noise is high (high standard deviation), higher values are required for a measurement to be considered an outlier [e.g. 19,000–19,500 (minutes)], whereas when the standard deviation of the errors is low, the dynamic thresholds are tighter and only little variation from normal data values is required for the data to be considered an outlier [e.g. 19,500–20,000 (minutes)].

The training phase of the ILD is more complex. The optimization process attempts to find the best dynamic threshold control variables for all water quality parameters simultaneously. This is because the GA objective function is defined on the event level, as opposed to the outlier level in the DTM. Specifically, it is not enough to identify outliers in the ILD; outliers should be further analyzed to determine whether they should be declared as an event or a non-event. Declaration of an event is not based only on a single water quality parameter, but is an integrated process that utilizes the information gained from all water quality parameters. In each evaluation of the GA, new dynamic thresholds are defined for all the water quality parameters at once; the GA

attempts to find the best combination of the five variables for all six indicators (a total of 30 decision variables) that maximizes the performance of the event identification. For each evaluation of the 30 decision variables: (a) dynamic thresholds are defined, outliers are identified, and TPR_i^o, TNR_i^o are calculated (Fig. 1.b.1); (b) once TPR_i^o, TNR_i^o are known, they are used in the sequential Bayes equation (Equation (1)) to calculate the probability of an event based on each one of the water quality parameters (Fig. 1.b.2); (c) the probabilities of the single parameters are then used as explanatory variables to fit a Logit model that attempts to predict the integrated event probability using a maximum likelihood procedure; (d) after the integrated probability is calculated using the fitted Logit model, it is compared against a varying probability threshold, for which the event probability during normal conditions is expected to be low (e.g. 0.4) while during event conditions it is expected to be high (e.g. 0.9) (Fig. 1.b.3); (e) the violation of the probability threshold by the integrated probability is used to classify the data into events and normal conditions, as shown in Fig. 1.b.3, and TPR^e, TNR^e are calculated. All steps (a)-(e) are performed for every evaluation of the optimization process, as depicted in Fig. 1.b, which shows a snapshot of the last evaluation during the optimization process.

After the training phase, the event detection system works in an online phase in which the indicators' data are repeatedly evaluated in real time for each new observation (Fig. 2); thus the detection at time t can rely only on the data up to time t. The DTM uses the optimal 30 decision variables (obtained from six optimization problems) to classify the data into outlier/non-outlier observations; this outlier classification results in the optimal TPR_i^o, TNR_i^o, which are then used to apply the sequential Bayes rule (Fig. 2.a.2) for each one of the water quality parameters. The single parameter probability is then compared to a constant probability threshold to determine the classification of the data as event/non-event for each one of the water quality indicators (Fig. 2.a.3). After classifying events and non-events for each of the indicators, the results are used to derive a unified event probability. The unified event probability is calculated based on the number of events obtained from the single indicator event classification, by simply dividing the number of events obtained by the total number of quality indicators (i.e. six). The integrated probability is then compared against a constant probability threshold to determine the event/non-event conditions, as shown in Fig. 2.a.4. Unlike the DTM, the ILD uses the optimal 30 decision variables obtained from a single optimization problem, in addition to the optimal Logit coefficients obtained in the training phase (Fig. 1.b). Similar to the DTM, the ILD uses the optimal values of the 30 decision variables to classify the data as outliers/non-outliers, and the optimal TPR_i^o, TNR_i^o to apply the sequential Bayes rule for each one of the water quality parameters to obtain the single parameter probabilities. The optimal Logit model with the optimal coefficients is then used to find the integrated event probability, using the six single parameters' probabilities as explanatory variables. The integrated probability is then compared to a constant threshold


Fig. 1 – Comparison of the DTM and the ILD stages during the training phase.




Fig. 2 – Comparison of the DTM and the ILD stages during the test phase.

(Fig. 2.b.4) to determine the classification between event and non-event. Note that unlike the training phase, where the threshold values vary between low values in normal conditions and high values in event conditions, in the online phase the

probability threshold is constant. This is because in the online phase there is no information about when the events occur, as opposed to the training phase, where the history of event occurrences is used to train the event detection system.
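The two fusion rules contrasted above can be compared in a few lines. The b coefficients and the single-indicator values below are hypothetical, chosen only to illustrate how the logit fusion can weight informative indicators differently, while the DTM fraction treats all indicators equally.

```python
import numpy as np

def dtm_unified_probability(single_events):
    """DTM fusion: fraction of single-indicator event declarations.
    Can only take the seven values 0, 1/6, ..., 1."""
    return np.mean(single_events)

def ild_unified_probability(single_probs, beta):
    """ILD fusion: logistic model over the six single-indicator
    probabilities (illustrative coefficients, not calibrated values)."""
    u = beta[0] + np.dot(beta[1:], single_probs)
    return 1.0 / (1.0 + np.exp(-u))

single_events = np.array([1, 1, 0, 1, 0, 0])            # declarations per indicator
single_probs = np.array([0.95, 0.90, 0.40, 0.85, 0.30, 0.20])
beta = np.array([-4.0, 2.0, 2.0, 1.0, 2.0, 1.0, 1.0])   # hypothetical coefficients

p_dtm = dtm_unified_probability(single_events)
p_ild = ild_unified_probability(single_probs, beta)
```

With three of six indicators declaring an event, the DTM can only output 0.5, whereas the weighted logit fusion can exceed a high threshold (here about 0.91) when the indicators that fire are the informative ones.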


4.2. Application 1

This section compares the performance of the ILD with that of the DTM, using the same simulated random events of the three types, (a) random, (b) low, and (c) mixed random-low, as generated in Arad et al. (2013). The performance of the methods is compared based on the testing (online) phase, after training both methods using the same training/testing partition. Both methods used the same ANN model structure as described in Equation (4), which uses a feed-forward backpropagation network with one hidden layer (twenty neurons) and one output layer. The networks were trained with a tan-sigmoid transfer function and a linear transfer function in the hidden and output layers, respectively. The Neural Network Toolbox of Matlab was used to construct, train, and test the ANNs. Given that both EDSs use the same ANNs, differences in the results correspond to the event detection algorithm itself and not to the estimation model used. For the training phase, both methods used GA optimization to find the optimal control variables; the DTM solves six optimization problems with five decision variables each, while the ILD solves one optimization problem with 5 × 6 = 30 decision variables. In the DTM, the optimization toolbox of Matlab was used with a population size of 30, analyzed across 50 generations for each of the six optimization problems. In the ILD, because the optimization problem is larger (it has more decision variables), we increased the population size while maintaining the same ratio to the number of decision variables of the problem, i.e. a population size of 180.
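The GA calibration loop can be sketched as below. This is a minimal real-coded GA, not the Matlab optimization toolbox routine used in the paper, and the fitness function is a toy stand-in: in the ILD it would run all three steps (outlier classification, Bayes updating, logit fusion) and return TPR^e + TNR^e for the candidate's 30 threshold variables.

```python
import numpy as np

def genetic_algorithm(fitness, n_vars, pop_size, generations, rng,
                      bounds=(0.0, 1.0), mut_sigma=0.05):
    """Minimal real-coded GA: tournament selection of two parents,
    uniform crossover, Gaussian mutation, and elitism."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_vars))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        new_pop = [pop[np.argmax(fit)].copy()]           # keep the best (elitism)
        while len(new_pop) < pop_size:
            i, j = rng.integers(0, pop_size, 2)           # tournament of two
            a = pop[i] if fit[i] >= fit[j] else pop[j]
            i, j = rng.integers(0, pop_size, 2)
            b = pop[i] if fit[i] >= fit[j] else pop[j]
            mask = rng.random(n_vars) < 0.5               # uniform crossover
            child = np.where(mask, a, b)
            child = child + mut_sigma * rng.standard_normal(n_vars)
            new_pop.append(np.clip(child, lo, hi))
        pop = np.array(new_pop)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)], fit.max()

# Toy fitness: recover a hidden 30-dimensional target (stand-in for
# maximizing TPR_e + TNR_e over the 30 threshold control variables).
rng = np.random.default_rng(7)
target = rng.uniform(0, 1, 30)
toy_fitness = lambda x: -np.sum((x - target) ** 2)
best, best_fit = genetic_algorithm(toy_fitness, n_vars=30,
                                   pop_size=180, generations=50, rng=rng)
```

The population size of 180 and the 50 generations mirror the settings reported for the ILD calibration.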


Fig. 3 illustrates the ILD performance for low simulated events. The figure shows the event probability from the single indicators and the integrated event probability. The probability threshold for the events is set to 0.9. Each of the first six subfigures represents one water quality parameter and the probability of a contamination event based solely on that parameter. The performance of the single indicator alarms (i.e. violations of the probability threshold) varies; still, the number of false alarms is high for each individual indicator. The integrated probability alarm unifies the probabilities from the six individual alarms using the optimal Logit model to build an integrated event probability, which is then classified into event and normal conditions based on a 0.9 probability threshold. This information fusion process improves the performance of the method and decreases the number of false alarms (false positives). Fig. 3 shows that the final event detection results from the ILD revealed all ten artificial contamination events while having only one false alarm, at time step 2700. The values of the optimal control variables obtained in the offline phase are presented in Figure S2 in the supplementary files. Fig. 4 compares the integrated event probability for the ILD and the DTM for low simulated events. In Arad et al. (2013) two integrated probability thresholds were tested: 3/6, which corresponds to three or more single indicator alarms being raised (i.e., Decision Rule 1), and 5/6, which corresponds to five or more single indicator alarms being raised (i.e., Decision Rule 2). Here we test an additional probability threshold of 0.9 (i.e., Decision Rule 3). Comparing the event probability from the

Fig. 3 – Event detection from individual indicators and the unified Logit probability.



Fig. 4 – Event probability from: (a) the DTM; (b) the ILD method.

two methods in Fig. 4 shows that the ILD is capable of detecting events with high probability, as opposed to the DTM, which has relatively small probabilities during the true events. That is, one can allow high probability thresholds in the ILD, which in turn can reduce the chance of false alarms. It should be noted that a high event probability threshold is advantageous, as it gives the decision maker confidence in the raised alarms' reliability. Fig. 5 summarizes the integrated event detection based on the three probability thresholds for both methods. The first row depicts the simulated events imposed (i.e. the true state), while each of the six following rows represents the state based on the two methods and the three different thresholds (i.e. the three

decision rules: DR1, DR2, DR3). The ILD identifies all ten events under all three decision rules, with a decreasing number of false alarms as the probability threshold increases. For example, in DR3 one false alarm is obtained, in box "a". The DTM identifies all ten contamination events when the probability threshold is low, in DR1, at the expense of nine false alarms, as shown in box "c". For DR3 there are no false alarms in the DTM, but only two out of the ten contamination events are identified (box "b"). By contrast, the ILD detects all events at the expense of one false alarm. As noted previously, the ILD is capable of detecting events with high probability compared to the DTM; this is why the decision rules in Arad et al. (2013) are based on low thresholds

Fig. 5 – Event detection from the DTM and the ILD with three different decision rules.



such as 0.5. Nevertheless, the best performance of the ILD under DR3, of ten true positives and one false positive, dominates the best performance of the DTM, obtained in DR1, of ten true positives and nine false positives.

Fig. 6 depicts the Receiver Operating Characteristic (ROC) curves of both methods. The ROC curve represents the tradeoff between the false positive rate and the true positive rate for every possible probability threshold. The results demonstrate that the ILD outperforms the DTM, since it is closer to the perfect system at point (0, 1) and it produces a higher true positive rate for the same false positive rate. Note that the DTM will always result in a piecewise linear ROC curve, as seen in Fig. 6. This is because it uses the results of the event classification from the single water quality indicators to build the unified probability; thus, the unified probability of an event can take only seven different values, ranging from no indicator declaring an event up to all six indicators declaring an event (i.e. possible probability values from zero to one in steps of one sixth).

Table 1 compares the results for simulated events of different types. For the random-type events, the results are reported as the average performance over the same ten random runs used in Arad et al. (2013). The values in the table represent true positives and false positives of detected contamination events in each of the three decision rules defined previously, with the best performance of each method determined by true positive priority. The results show that the ILD outperforms the DTM for all event types. For example, with simulated events of the mixed type, both methods are capable of detecting all events; the DTM accomplishes this with seven false alarms, while the ILD does it with only two. The ILD is capable of detecting all ten events in all settings, while its number of false alarms decreases with increasing probability threshold.
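The piecewise linear ROC behaviour of the DTM noted above can be illustrated with a short sketch. The vote-fusion rule (unified probability = fraction of single-indicator alarms raised) means the score takes only the seven values 0/6, 1/6, ..., 6/6; the data below are illustrative only, not the paper's results.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """True/false positive rates of the alarm rule `score > threshold`."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pts = []
    for t in thresholds:
        alarm = scores > t
        tpr = (alarm & labels).sum() / labels.sum()      # true positive rate
        fpr = (alarm & ~labels).sum() / (~labels).sum()  # false positive rate
        pts.append((float(fpr), float(tpr)))
    return pts

# DTM-style unified probability: number of single-indicator alarms out of six.
votes = np.array([0, 1, 6, 5, 2, 0, 6, 3]) / 6.0
truth = [0, 0, 1, 1, 0, 0, 1, 0]
curve = roc_points(votes, truth, thresholds=np.linspace(0.0, 1.0, 101))
```

Sweeping 101 thresholds still yields only a handful of distinct (FPR, TPR) points, which is why the DTM's ROC curve is piecewise linear, whereas the ILD's continuous integrated probability traces a smooth curve.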
As noted before, the best performance of the ILD is obtained with the high probability threshold of 0.9, while the best performance of the DTM is obtained with the low probability threshold of 0.5. This observation holds for all event types, as shown in Table 1.
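The Logit fusion step behind these integrated probabilities can be sketched as follows. The coefficient values here are hypothetical placeholders; in the paper they are calibrated jointly with the other EDS control variables by the GA.

```python
import math

def integrated_event_probability(p, beta0, beta):
    """Binary Logit fusion: a linear utility over the six single-indicator
    event probabilities, mapped through the logistic function."""
    v = beta0 + sum(b * pi for b, pi in zip(beta, p))
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical coefficients (in the paper, estimated by maximum likelihood
# within the GA calibration).
beta0, beta = -6.0, [2.0, 1.5, 1.8, 2.2, 1.2, 1.6]
p_single = [0.95, 0.80, 0.99, 0.90, 0.70, 0.85]  # six indicator probabilities
p_event = integrated_event_probability(p_single, beta0, beta)
alarm = p_event > 0.9  # Decision Rule 3 threshold used in the paper
```

Unlike the DTM's vote counting, each indicator contributes to the utility with its own weight, so informative indicators can dominate noisy ones.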

Fig. 6 – ROC curves comparing the performance of the two methods on low-type events.

Table 1 – Results comparison for different event types.

Event type   Method   Alarm type   Probability threshold
                                   0.5    0.83   0.9
Random       ILD      TP           10     10     10
                      FP            6      3      2
             DTM      TP           10      9      6
                      FP            5      0      0
Mixed        ILD      TP           10     10     10
                      FP           10      3      2
             DTM      TP           10      8      5
                      FP            7      0      0
Low          ILD      TP           10     10     10
                      FP           14      4      1
             DTM      TP           10      4      2
                      FP            9      0      0

Note: TP = true positive; FP = false positive; DTM = dynamic threshold method; ILD = integrated logit detection.

4.3. Application 2

The performance of the ILD approach is also compared to that of CANARY. CANARY ships with seven datasets from different monitoring stations. In this comparison we use the dataset of station A, in which the time interval between measurements is five minutes (EPA, 2010). It is unknown whether this dataset contains real events, thus it could not be used to calibrate the EDSs (both CANARY and the ILD). Therefore, any anomalies in the raw data are assumed to be caused by "normal" background changes, i.e. unrelated to contamination events. Following the assumption that the dataset does not contain real events, we superimpose simulated events as described by EPA (2010, pp. 34–35). The events are assumed to follow a Gaussian shape with an amplitude of 0.5 times the standard deviation of the signal. The duration of an event is chosen as 36 time steps (three hours for the selected station); the first event begins at time step 1501 and subsequent events are added every 1200 time steps (100 h), as described in EPA (2010, pp. 34–35). As such, the event duration considered in this test case is shorter than in Arad et al. (2013), where an eight-hour event duration was considered; it is expected that shorter events are harder to detect. Fig. 7 shows an example of simulated events imposed on the chlorine signal. Disregarding the information given above regarding the shape of the simulated events, it is difficult to tell which of the two signals in Fig. 7a and b includes the artificial noise (it is clear that the signals are different, but which of the two is "normal" and which is not?). Fig. 7c overlays the two signals to highlight the simulated events. Noteworthy is the last event in Fig. 7c, which is superimposed on a "normal" background spike. We compare the performance of CANARY under different algorithms and different configuration parameters (i.e. parameters that control the behavior of the chosen CANARY algorithms).
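The event superposition described above (Gaussian shape, amplitude of 0.5 times the signal's standard deviation, 36-step duration, first event at step 1501, one event every 1200 steps) can be sketched as follows. The synthetic base signal and the exact bump width inside the 36-step window are illustrative assumptions, not taken from EPA (2010); the sketch uses 0-based indexing, so step 1501 of the paper is index 1500.

```python
import numpy as np

def superimpose_events(signal, start=1500, spacing=1200, duration=36, rel_amp=0.5):
    """Add Gaussian-shaped simulated events to a water quality signal."""
    out = signal.astype(float).copy()
    amp = rel_amp * signal.std()          # amplitude = 0.5 * signal std
    t = np.arange(duration)
    # Gaussian bump centred on the event window (width is an assumption).
    bump = amp * np.exp(-0.5 * ((t - duration / 2) / (duration / 6)) ** 2)
    for s in range(start, len(signal) - duration, spacing):
        out[s:s + duration] += bump
    return out

rng = np.random.default_rng(1)
clean = 0.8 + 0.05 * rng.standard_normal(10_000)  # synthetic chlorine-like signal
noisy = superimpose_events(clean)
```

Because the amplitude is tied to the signal's own variability, the events are deliberately subtle, which is why Fig. 7a and b are hard to tell apart by eye.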
A recent report by the US EPA (EPA, 2014) identified four control parameters that are particularly important for users to select carefully. These four critical configuration parameters are: (a) history window – the number of historical data points used to calculate the variability of the signal; (b) outlier threshold – the maximum standard score above which an



Fig. 7 – Simulated events: (a) data without simulated events; (b) data with simulated events; (c) comparison of (a) and (b).

observation is declared an outlier; (c) Binomial Event Discriminator (BED) window – the number of data points over which outliers are examined for declaring events; and (d) event threshold – the probability that must be exceeded to consider an anomaly an event. CANARY does not include an auto-calibration module for selecting the optimal parameters to achieve the best performance; instead, users must manually specify values in the input file. Nevertheless, EPA (2014) provides guidelines for configuring the parameters of the Multivariate Nearest Neighbor (MVNN) and Linear Prediction Coefficient Filter (LPCF) algorithms within CANARY. These guidelines suggest both a Rule-of-Thumb (RoT) set of parameters and a Simple Optimization Protocol (SOP), which outlines a test of 48 parameter combinations per algorithm (a total of 96 combinations for testing both the MVNN and the LPCF). These 96 combinations are to be checked manually to analyze the performance of CANARY. We compared the ILD to CANARY with the two algorithms, for both the RoT set of parameters and the optimal set of parameters obtained from the SOP, as reported in EPA (2014). However, because the ILD utilizes an automatic calibration module, and because interactions between the different parameters are not fully captured in the SOP, we saw fit to include another comparison in which CANARY undergoes an exhaustive parameter search by an automatic procedure. For this purpose we developed a GA-CANARY link to optimize the performance of CANARY. GA-CANARY consists of a script that can programmatically change the configuration of CANARY's parameters, read its output and calculate the

confusion matrix measures. A schematic representation of GA-CANARY is depicted in Fig. 8. In GA-CANARY, we used the same objective function as in the offline phase of the ILD, namely maximizing the sum of the TP and TN counts. The per-decision-variable computation budget is maintained as in the ILD to facilitate a fair comparison; therefore, a population size of 25 and 50 generations are used to find the optimal four parameters of CANARY. GA-CANARY is developed in Matlab, as is the case for CANARY itself. As such, the code developed here (all codes are attached) could be added to CANARY as an add-in, so that CANARY users can use it for automatic tuning of the four critical parameters discussed in EPA (2014).
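The GA-CANARY fitness evaluation can be sketched as follows. The `run_canary` callable stands in for writing the four parameters (HW, OT, BED, ET) to CANARY's configuration file, executing CANARY, and reading back its alarm series; that interface is hypothetical here (the paper implements it as a Matlab script), and the stand-in below is a toy thresholding rule, not CANARY itself.

```python
import numpy as np

def confusion_counts(predicted, actual):
    """Confusion matrix counts for binary event/no-event states."""
    p, a = np.asarray(predicted, bool), np.asarray(actual, bool)
    tp = int((p & a).sum())
    tn = int((~p & ~a).sum())
    fp = int((p & ~a).sum())
    fn = int((~p & a).sum())
    return tp, tn, fp, fn

def ga_canary_objective(params, run_canary, actual_events):
    """Fitness maximized by GA-CANARY: the sum of TP and TN,
    the same objective used in the offline phase of the ILD."""
    predicted = run_canary(params)
    tp, tn, _, _ = confusion_counts(predicted, actual_events)
    return tp + tn

# Toy stand-in for a CANARY run: alarm whenever a fixed score exceeds ET.
scores = np.array([0.2, 0.95, 0.4, 0.99, 0.1, 0.97])
truth = np.array([0, 1, 0, 1, 0, 1])
fitness = ga_canary_objective({"ET": 0.9}, lambda p: scores > p["ET"], truth)
```

A GA then treats this fitness as the objective over the four-dimensional parameter space, replacing the manual 96-combination sweep of the SOP.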

Fig. 8 – GA-CANARY design scheme.



The three parameter configurations of CANARY obtained from: (a) the rule-of-thumb; (b) the simple optimization protocol; and (c) the optimal GA-CANARY, are given in Table 2 for both the LPCF and MVNN algorithms. These three configurations are tested against one month of data from station A (from 08/07/2007 to 08/08/2007). For GA-CANARY and the ILD method, only one month was used for calibration (from 08/06/2007 to 08/07/2007). We limited the calibration to one month of the dataset to cope with the high computation time required by CANARY to process long periods (an average of 2.5 s per day). We believe that the high computation time arises because CANARY is programmed to work in real-time situations, in which it has to process each data point individually, thus limiting the use of efficient vectorized operations within Matlab (Birkbeck et al., 2007).

The TP and FP performance of the different parameter configurations of CANARY compared to the ILD is summarized in Table 2. The last column in the table reports the number of true events detected out of the seven simulated events within the tested month (from 08/07/2007 to 08/08/2007). The results show that the MVNN performed better than the LPCF under both the RoT and SOP conditions, as it simultaneously increases the TP and decreases the FP. In GA-CANARY, the MVNN obtained a much lower FP, but at the cost of decreasing the number of TP events; as such, it is subjective to determine which is better, the LPCF or the MVNN. Comparing GA-CANARY against the RoT and SOP shows that GA-CANARY outperforms the RoT with the LPCF algorithm selected, but with all other options a tradeoff exists between the TP and the FP, so the outcome depends on the user's valuation of the TP–FP balance. GA-CANARY improves on the RoT under the LPCF by increasing the history window and reducing the event threshold; in this way it could both reduce the FP from 19 to 15 and increase the TP from 3 to 4.
Given this analysis of the performance of CANARY, and the attempt to improve its performance by coupling it with a GA automatic calibration procedure, we see that the ILD outperforms CANARY under all tested settings, since it detects all seven events without any false alarms.

5. Conclusions

An event detection system (EDS) is developed and demonstrated on real water quality data with simulated artificial events. The suggested EDS utilizes dynamic thresholds for outlier classification and the Bayesian sequential probability

to construct event probabilities from each single water quality parameter, as suggested in Arad et al. (2013). Three noticeable features distinguish the current EDS from preceding EDS methods:

1. While the training stage of previously developed methods attempts to maximize the outlier classification performance, this model seeks maximum performance of the event classification process. The results demonstrated herein show that this objective is beneficial over earlier methodologies for calibrating the EDS control variables.

2. Unlike other methods, the EDS control variables are calibrated herein simultaneously for all water quality parameters. This calibration demonstrates better performance, yet it is more computationally intensive. However, since the calibration process is conducted offline, there are no tight restrictions on the required computational resources and time; hence, the increased computational complexity does not impose a substantial limitation, given that computational resources are available.

3. Unlike several previously developed models, which fuse the individual alarms from the water quality parameters by means of simple heuristics, this EDS utilizes a Logit model, estimated through the maximum likelihood method, for unifying the event probabilities from the water quality parameters. Each water quality parameter is considered individually with its own impact on the contamination event probability. A linear utility function, which uses the individual water quality probabilities as input variables, informs the discrete choice model on the selection between event and non-event occurrences. This discrete choice model is jointly calibrated with the other components of the EDS framework through a genetic algorithm.

The enhancements above led to a significant improvement over the previous method in Arad et al. (2013).
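The discrete choice model described in point 3 can be written out explicitly. The notation below is ours, reconstructed from the description (a linear utility over the six single-indicator probabilities, with maximum likelihood estimation), not copied from the paper's equations:

```latex
% Linear utility over the single-indicator event probabilities p_{i,t}:
V_t = \beta_0 + \sum_{i=1}^{6} \beta_i \, p_{i,t}

% Binary Logit probability of an event at time t:
P(\text{event at } t) = \frac{e^{V_t}}{1 + e^{V_t}}

% Log-likelihood maximized over \beta for observed event labels y_t \in \{0,1\}:
\ln L(\beta) = \sum_{t} \left[ y_t \, V_t - \ln\!\left(1 + e^{V_t}\right) \right]
```

In the ILD, this maximization is embedded in the GA calibration, so the fusion weights are fitted jointly with the rest of the EDS control variables.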
Not only does the developed method achieve better event detection, it is also capable of detecting events with a higher probability threshold, namely giving high event probabilities (e.g. 0.9) in the case of a true event. Table 1 showed that the best performance of the DTM is obtained when the probability threshold is set to the relatively small value of 0.5, whereas the ILD works best with the high threshold probability of 0.9. This is an

Table 2 – Comparison between CANARY variants and the ILD.

Method         Algorithm   HW    OT     BED   ET     FP   TP
CANARY RoT*    LPCF        432   1.4    10    0.99   19    3
CANARY RoT*    MVNN        432   1.4    10    0.99   12    4
CANARY SOP**   LPCF        432   1      10    0.945  26    6
CANARY SOP**   MVNN        432   1      10    0.828  20    6
GA-CANARY      LPCF        497   1.394  10    0.875  15    4
GA-CANARY      MVNN        488   1.456  39    0.955   4    1
ILD            –           –     –      –     –       0    7

Note: * EPA (2014, p. 55); ** EPA (2014, p. 67). HW = History Window; OT = Outlier Threshold; BED = Binomial Event Discriminator window; ET = Event Threshold; FP = False Positive; TP = True Positive.



important property of the ILD method, as it allows for high probability thresholds, which in turn reduce the likelihood of false alarms and thus give the decision maker more confidence in the raised alarms' reliability.

The developed method is also compared to CANARY under different parameter configurations. The results show that the ILD outperforms CANARY for the tested simulated events. We have also introduced a GA-CANARY link for CANARY auto-calibration; this tool could be used as an add-in by CANARY users to find the optimal parameter configuration for their system. In this study, the calibration objective for GA-CANARY was set to maximize the sum of TP and TN (as in the ILD). Nevertheless, there is no guarantee that this objective function yields the best possible performance for CANARY; therefore, more research is needed to investigate GA-CANARY's performance under different calibration objectives.

The developed method is associated with data collected from a single monitoring station, without any consideration of joint information from multiple sensors. Nevertheless, we believe it is now simpler to develop a multiple-sensor EDS on the basis of the ILD than on other methods, because the event detection of each sensor is lumped into one event probability, as opposed to multiple single-indicator event probabilities in other methods. Further research is warranted to extend the ILD for integrating multiple-sensor information into event detection methodologies. Full program codes and metadata for implementing the offline and online phases of the method, as well as the code for GA-CANARY, are provided in the supplementary material.

Acknowledgments

The Technion part of this study was supported by the Technion Funds for Security Research, by the joint Israeli Office of the Chief Scientist (OCS) of the Ministry of Industry, Trade and Labor (MOITAL), and by the German Federal Ministry of Education and Research (BMBF), under project number GR 2443.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.watres.2015.02.016.

References

Adams, J.A., Mccarty, D., 2007. Real-time on-line monitoring of drinking water for waterborne pathogen contamination warning. Int. J. High Speed Electron. Syst. 17 (4), 643–659.
Arad, J., Housh, M., Perelman, L., Ostfeld, A., 2013. A dynamic thresholds scheme for contaminant event detection in water distribution systems. Water Res. 47 (5), 1899–1908.
Berry, J.W., Hart, W.E., Phillips, C.A., Uber, J.G., Watson, J.P., 2006. Sensor placement in municipal water networks with temporal integer programming models. J. Water Resour. Plan. Manag. 132 (4), 218–224.
Birkbeck, N., Levesque, J., Amaral, J.N., 2007. A dimension abstraction approach to vectorization in Matlab. In: CGO '07 International Symposium on Code Generation and Optimization, 11–14 March 2007, pp. 115–130. http://dx.doi.org/10.1109/CGO.2007.1.
Burchard-Levine, A., Liu, S., Vince, F., Li, M., Ostfeld, A., 2014. A hybrid evolutionary data driven model for river water quality early warning. J. Environ. Manag. 143, 8–16.
Byer, D., Carlson, K.H., 2005. Real-time detection of intentional chemical contamination in the distribution system. J. Am. Water Works Assoc. 97 (7), 130–141.
EPA, 2005a. WaterSentinel: Online Water Quality Monitoring as an Indicator of Drinking Water Contamination. Available online at: http://www.epa.gov/watersecurity/pubs/watersentinel_wq_monitoring.pdf (accessed 15.01.15).
EPA, 2005b. WaterSentinel: System Architecture. Available online at: http://www.epa.gov/watersecurity/pubs/watersentinel_system_architecture.pdf (accessed 15.01.15).
EPA, 2010. Water Quality Event Detection Systems for Drinking Water Contamination Warning Systems: Development, Testing, and Application of CANARY. U.S. Environmental Protection Agency, Washington, DC. EPA/600/R-10/036. http://www2.epa.gov/homeland-security-research/models-tools-and-applications-homeland-security-research (accessed 15.01.15).
EPA, 2014. Configuring Online Monitoring Event Detection Systems. U.S. Environmental Protection Agency, Washington, DC. EPA/600/R-14/254. http://www2.epa.gov/homeland-security-research/models-tools-and-applications-homeland-security-research (accessed 15.01.15).
Gavriel, A.A., Landre, J.P., Lamb, A.J., 1998. Incidence of mesophilic Aeromonas within a public drinking water supply in north-east Scotland. J. Appl. Microbiol. 84 (3), 383–392.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, New York.
Greenfield, R.A., Brown, B.R., Hutchins, J.B., Iandolo, J.J., Jackson, R., Slater, L.N., Bronze, M.S., 2002. Microbiological, biological, and chemical weapons of warfare and terrorism. Am. J. Med. Sci. 323 (6), 326–340.
Hall, J., Zaffiro, A., Marx, R.B., Kefauver, P., Krishman, R.E., Haught, R., Herrmann, J.G., 2007. On-line water quality parameters as indicators of distribution system. J. Am. Water Works Assoc. 99 (1), 66–77.
Hart, W.E., Murray, R., 2010. Review of sensor placement strategies for contamination warning systems in drinking water distribution systems. J. Water Resour. Plan. Manag. 136 (6), 611–619.
He, H.-M., Hou, D.-B., Zhao, H.-F., Huang, P.-J., Zhang, G.-X., 2013. A multi parameters fusion algorithm for detecting anomalous water quality. J. Zhejiang Univ. 47 (4), 735–740.
Helbling, D.E., VanBriesen, J.M., 2009. Modeling residual chlorine response to a microbial contamination event in drinking water distribution systems. J. Environ. Eng. 135 (10), 918–927.
Hilbe, M., 2009. Logistic Regression Models. Chapman & Hall/CRC Press, ISBN 978-1-4200-7575-5.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor.
Hou, D.-B., Chen, Y., Zhao, H.-F., Huang, P.-J., Zhang, G.-X., 2013a. Water quality anomaly detection method based on RBF neural network and wavelet analysis. Transducer Microsyst. Technol. 32 (2), 138–141.
Hou, D.-B., He, H.-M., Huang, P.-J., Zhang, G.-X., Loaiciga, H., 2013b. Detection of water-quality contamination events based on multi-sensor fusion using an extended Dempster-Shafer method. Meas. Sci. Technol. 24 (055801), 18.
Hou, D.-B., Song, X.-X., Zhang, G.-X., Zhang, H.-J., Loaiciga, H., 2013c. An early warning and control system for urban, drinking water quality protection: China's experience. Environ. Sci. Pollut. Res. Int. 20 (7), 4496–4508.
Kessler, A., Ostfeld, A., Sinai, G., 1998. Detecting accidental contaminations in municipal water networks. J. Water Resour. Plan. Manag. 132 (4), 192–198.
Klise, K.A., McKenna, S.A., 2006. Multivariate application for detecting anomalous water quality. In: Proceedings of the 8th Annual Water Distribution Systems.
Koch, M.W., McKenna, S.A., 2011. Distributed sensor fusion in water quality event detection. J. Water Resour. Plan. Manag. 137 (1), 10–19.
Krause, A., Leskovec, J., Guestrin, C., VanBriesen, J., Faloutsos, C., 2008. Efficient sensor placement optimization for securing large water distribution networks. J. Water Resour. Plan. Manag. 134 (6), 516–526.
Lambrou, T.P., Anastasiou, C.C., Panayiotou, C.G., Polycarpou, M.M., 2014. A low-cost sensor network for real-time monitoring and contamination detection in drinking water distribution systems. IEEE Sens. J. 14 (8), 2765–2772.
Lee, A., Francisque, A., Najjaran, H., Rodriguez, M.J., Hoorfar, M., Imran, S.A., Sadiq, R., 2012. Online monitoring of drinking water quality in a distribution network: a selection procedure for suitable water quality parameters and sensor devices. Int. J. Syst. Assur. Eng. Manag. 3 (4), 323–337.
Liu, Y., Hou, D., Huang, P., Zhang, G., 2013. Multi-scale water quality contamination events detection based on sensitive time scales reconstruction. In: Proceedings of the 2013 International Conference on Wavelet Analysis and Pattern Recognition, Tianjin, 14–17 July 2013, pp. 235–240.
Liu, S., Che, H., Smith, K., Chen, L., 2014. Contamination event detection using multiple types of conventional water quality sensors in source water. Environ. Sci. Process. Impacts 16 (8), 2028–2038.
Liu, S., Che, H., Smith, K., Chen, C., 2015. A method of detecting contamination events using multiple conventional water quality sensors. Environ. Monit. Assess. 187, 4189.
Mounce, S.R., Mounce, R.B., Jackson, T., Austin, J., Boxall, J.B., 2014. Pattern matching and associative artificial neural networks for water distribution system time series data analysis. J. Hydroinf. 16 (3), 617–632.
Murray, S., Ghazali, M., McBean, E.A., 2011. Real-time water quality monitoring: assessment of multi-sensor data using Bayesian belief networks. J. Water Resour. Plan. Manag. 138 (1), 63–70.
Oliker, N., Ostfeld, A., 2014a. A coupled classification-evolutionary optimization model for contamination event detection in water distribution systems. Water Res. 51 (15), 234–245.
Oliker, N., Ostfeld, A., 2014b. Minimum volume ellipsoid classification model for contamination event detection in water distribution systems. J. Environ. Model. Softw. 57, 1–12.
Ostfeld, A., Salomons, E., 2004. Optimal layout of early warning detection stations for water distribution systems security. J. Water Resour. Plan. Manag. 130 (5), 377–385.
Perelman, L., Arad, J., Housh, M., Ostfeld, A., 2012. Event detection in water distribution systems from multivariate water quality time series. Environ. Sci. Technol. 46 (15), 8212–8219. http://dx.doi.org/10.1021/es3014024.
Preis, A., Ostfeld, A., 2008. Multiobjective contaminant sensor network design for water distribution systems. J. Water Resour. Plan. Manag. 134 (4), 366–377.
Raciti, M., Cucurull, J., Nadjm-Tehrani, S., 2012. Anomaly detection in water management systems. In: Lopez, J., Setola, R., Wolthusen, S.D. (Eds.), Critical Infrastructure Protection. Springer, pp. 98–119.
Rosen, J.S., Bartrand, T., 2013. Using online water quality data to detect events in a distribution system. J. Am. Water Works Assoc. 105 (7), 22–26.
Schwartz, R., Lahav, O., Ostfeld, A., 2014. Integrated hydraulic and organophosphate pesticide injection simulations for enhancing event detection in water distribution systems. Water Res. 63, 271–284.
WHO, 2004. Public Health Response to Biological and Chemical Weapons: WHO Guidance, second ed. World Health Organization, Geneva, Switzerland. Available online at: http://www.who.int/csr/delibepidemics/biochemguide/en/ (accessed 15.01.15).
Williamson, F., van den Broeke, J., Koster, T., Koerkamp, M.K., Verhoef, J.W., Hoogterp, J., Trietsch, E., de Graaf, B.R., 2014. Online water quality monitoring in the distribution network. Water Pract. Technol. 9 (4), 575–585.
Xu, J., Fischbeck, P., Small, M.J., VanBriesen, J., Casman, E., 2008. Identifying sets of key nodes for placing sensors in dynamic water distribution networks. J. Water Resour. Plan. Manag. 136 (2), 378–385.
Yang, Y.J., Goodrich, J.A., Clark, R.M., Li, Y.S., 2008. Modeling and testing of reactive contaminant transport in drinking water pipes: chlorine response and implications for online contaminant detection. Water Res. 42 (6), 1397–1412.
Yang, Y.J., Haught, R.C., Goodrich, J.A., Li, Y.S., 2009. Real-time contaminant detection and classification in a drinking water pipe using conventional water quality sensors: techniques and experimental results. J. Environ. Manag. 90 (8), 2494–2506.
Yokoyama, K., 2007. Our recent experience with sarin poisoning in Japan and pesticide users with references to some selected chemicals. Neurotoxicology 28 (2), 364–373.
Zhao, H., Hou, D., Huang, P., Zhang, G., 2014. Water quality event detection in drinking water network. Water Air Soil Pollut. 225, 2183.
