
Environmental Technology, 2013, Vol. 34, No. 9, 1131–1140, http://dx.doi.org/10.1080/09593330.2012.737863

Comparing feed-forward versus neural gas as estimators: application to coke wastewater treatment

Iván Machón-Gonzáleza∗, Hilario López-Garcíaa, Jesús Rodríguez-Iglesiasb, Elena Marañón-Maisonb, Leonor Castrillón-Peláezb, Yolanda Fernández-Navab


a Department of Electrical Engineering, Computing and Systems Electronics, Polytechnic School of Engineering, University of Oviedo, Gijon Campus, 33203 Gijon, Spain; b Department of Chemical Engineering and Environmental Technology, Polytechnic School of Engineering, University of Oviedo, Gijon Campus, 33203 Gijon, Spain

(Received 2 July 2012; final version received 3 October 2012)

Numerous papers related to the estimation of wastewater parameters have used artificial neural networks. Although successful results have been reported, different problems have arisen, such as overtraining, local minima and model instability. In this paper, two types of neural networks, feed-forward and neural gas, are trained to obtain a model that estimates the values of nitrates in the effluent stream of a three-step activated sludge system (two oxic and one anoxic). Placing the denitrification (anoxic) step at the head of the process can force denitrifying bacteria to use internal organic carbon. However, methanol has to be added to achieve high denitrification efficiencies in some industrial wastewaters. The aim of this paper is to compare the two networks in addition to suggesting a methodology to validate the models. The influence of the neighbourhood radius is important in the neural gas approach and must be selected correctly. Neural gas performs well due to its cooperation–competition procedure, with no problems of stability or overfitting arising in the experimental results. The neural gas model is also interesting for use as a direct plant model because of its robustness and deterministic behaviour.

Keywords: neural network; neural gas; feed-forward; activated sludge; coke wastewater; nitrate

1. Introduction
Coke wastewater contains substantial amounts of certain toxic compounds, such as thiocyanate, cyanide and ammonium salts, and organic compounds, such as phenols, aromatic nitrogenated compounds and polycyclic aromatic compounds. The treatment of industrial coke wastewater was studied in a three-step activated sludge system at the laboratory scale [1]. The first step was anoxic for the removal of nitrates, followed by an oxic step during which biodegradation of phenols and thiocyanates took place, and by a second oxic step to oxidize ammonium nitrogen to nitrate. The final effluent, containing nitrates, was recirculated to the head of the plant for denitrification. Although the main purpose of placing the denitrification step at the head of the process was to avoid adding an external carbon source, the denitrifying microorganisms hardly used the organic compounds present in coke wastewater and methanol had to be added to achieve high denitrification efficiencies [1]. On the other hand, when denitrification is placed at the head of the process, the volume of the effluent that is not recirculated cannot undergo denitrification and the concentration of nitrates in the final effluent tends to increase.

∗Corresponding author. Email: [email protected]

© 2013 Taylor & Francis

In the present paper, a feed-forward (FF) neural network (NN) and a neural gas (NG) network are trained to calculate the concentration of NO3–N for each methanol dosage under certain conditions. The aim of the study is to compare the two networks in addition to suggesting a methodology to validate the models. The influence of the training parameters and the experimental results are discussed.
NNs are able to infer relationships from input data so as to form a model without any preconceived knowledge of rules about the physical system. They derive general relationships from individual examples and can carry out different tasks such as classification, prediction, clustering and pattern recognition. The benefits of NNs come to the fore with complex non-linear problems defined by noisy data, where rules are difficult to formulate and precise numerical accuracy is unnecessary, e.g. in industrial process control.
Mechanistic models to simulate wastewater treatment plants (WWTPs) are often formulated as a set of ordinary differential equations (ODEs). However, according to [2], a major drawback of these models is the considerable CPU time required to simulate the behaviour of the plant. Artificial neural networks (ANNs) reduce simulation time even when the time needed to generate training data and for training the ANN is included [2],


thus enabling a rapid evaluation of WWTP performance over a broad range of plant operating conditions. In some cases, the ANN model can even be used as part of a real-time process control system [3]. Lengthy simulation times can also be partially alleviated by simplification of the mechanistic models [4], e.g. replacing partial differential equations by ODEs, removing some streams from the original plant structure or replacing the complex models by empirical models.
An extremely important advantage of NNs is their capacity to generalize. They can react correctly to new data that are only approximately similar to the original training data. Generalization is necessary because real-world data are composed of noisy, distorted and often incomplete signals. The lack of a formal background for the optimization of NN architectures is a very important drawback [5]. The success of applying NNs relies on correct design optimization of the network parameters [6]. The design of a NN model also has another disadvantage: the risk of overfitting during the training process. The solution is to reduce the effective complexity of the model [7].
In particular, ANNs have been widely applied for prediction purposes in WWTPs and environmental engineering over the past two decades [8–10], although the methods for implementing NN models are yet to be well defined. This fact is pointed out in [11], where the findings of numerous papers related to the estimation of water parameters in river systems are reviewed. Moreover, multi-layer perceptrons (MLPs) are still the most popular model architecture, and FF networks were used almost exclusively in the 1990s, as reported in [12]. However, this type of ANN usually presents problems of local minima during learning, and the training methodology, e.g. the choice of network architecture, is not yet well established.
The NG algorithm [13] is an unsupervised learning algorithm based on cooperation–competition computation. Both cooperation and competition between the map units allow the algorithm to avoid local minima problems [14].

Figure 1. Schematic representation of the biological treatment plant.

An energy cost function is available to formulate the algorithm. NG has no output space, a fact that has limited its application to data visualization.
One of the critical aspects of NNs is their black-box nature. A set of rules that explains the dynamic behaviour of the NN model can be hard to obtain. The hidden weights of supervised FF NNs can be considered as unobservable variables, whereas in unsupervised prototype-based algorithms each weight can be considered as the centre of a Voronoi region of the input space. In addition, NNs do not describe causal relationships among variables, only correlations. Several techniques for automatic rule extraction are reviewed in [15]. Nonetheless, it is possible to translate the NN model into a system of mathematical equations, and classical statistical techniques do not provide causal information either. An important drawback is the demand for proper data acquisition, as the final application may fail if the data are insufficient. NNs are dependent on the quality and amount of data available. Gathering a representative process data set is a requirement: it must be possible to collect a sufficient sample of representative data, otherwise it may be difficult or impossible to apply the model correctly.

2. Biological treatment plant
A schematic representation of the biological plant employed is shown in Figure 1. In order to optimize working conditions, different recirculation ratios of the final effluent to the first step were employed (ranging between one and three) and different methanol dosages were added to the wastewater (varying between 0 and 1.2 L/m3, as shown in Figure 2). Both factors are very important to control the proper operation of a process that includes nitrogen removal [16]. Figure 2 shows the variation of NO3–N in the influent and effluent components and the efficiency obtained, as well as the operational working conditions employed in the three-step biological treatment.

Figure 2. Dynamic trend of the key process variables: NO3–N (mg/L) in Effluents 1–3, methanol dosage (L/m3), recirculation ratio and hydraulic residence time (h) versus time (days).

A complete description of the influent and effluent components, the analytical methods employed and the biological plant can be found in [1]. The best performance of the plant was obtained using a methanol dosage of 1.2 L/m3 and a recirculation ratio equal to 3. In the final effluent, very low nitrate concentrations were obtained, ranging between 20 and 38 mg NO3–N/L.


3. Data discussion and selection of input variables

As stated in the introduction, NNs can provide solutions without any preconceived knowledge of the problem. This is, however, quite an optimistic view. In practice, expert knowledge is required to obtain a good model, especially with regard to variable selection and data preprocessing. The procedure starts with data selection, analysis and handling, which involves different aspects of programming, statistics and signal processing. The first and usually longest step in developing a NN model is the creation of a data set. Moreover, the selection of variables and data preprocessing should be done appropriately in order to achieve a good NN model. In general, the training set must include representative data that enable the NN model to recognize new examples in the finished application, but only if they resemble the training patterns.
Statistical techniques, such as Pearson's correlation coefficient, are useful for selecting the most important process variables to train the model. The strength of the correlation between variables can help with data inspection. If one input and one output are highly correlated, that input variable is considered for inclusion as a training variable. Likewise, two strongly correlated inputs might suggest that only one is needed. It is important not to confuse correlation with causation. When two variables are correlated, there may or may not be a causal connection, and this connection may moreover be indirect.

Correlation can only be interpreted in terms of causation if the variables under investigation provide a logical basis for such an interpretation. Additional methods allow the correlation between two variables to be assessed, e.g. graphically by means of a scatter diagram. This analytical approach to model input selection has the disadvantage of only measuring linear dependence between variables in the global data set. The lack of formal non-linear techniques for assessing the relative relevance of independent variables is widely noted in the literature [17,18]. Most studies employ linear selection techniques, so nothing prevents different sets of variables from being more relevant in non-linear terms. However, several authors make use of alternative methods. A rigorous procedure would be to carry out a non-linear clustering of the input space, e.g. with a self-organizing map [19,20], and then to choose the representative inputs from each cluster applying the same analysis as above. In this paper, however, the inputs were selected on the basis of available expert knowledge so as to develop the study with all of the necessary assumptions.
Although the main purpose of placing the denitrification step at the head of the process was to avoid adding an external carbon source, the denitrifying microorganisms did not use the organic compounds present in coke wastewater [1]. However, placing the denitrification step at the head of the process achieves a dilution of the wastewater that improved thiocyanate degradation in the aerobic reactor and, therefore, this configuration was chosen as the optimum. Major removals of thiocyanate, phenols and chemical oxygen demand (COD) were achieved in the second (aerobic) reactor, whereas little or no biodegradation occurred in the first (anoxic) reactor. For these reasons, methanol was added as an external carbon source. As a consequence, the concentrations of thiocyanate, phenols and COD were not considered when applying the NN.


The methanol dosage and the recirculation ratio were increased in order to obtain a lower nitrate concentration in the final effluent. The best performance of the plant was obtained in the last stage, with very low nitrate concentrations in the final effluent (ranging between 20 and 38 mg NO3–N/L). These were lower than the minimum theoretical values in relation to the recirculation ratio employed (40–50 mg NO3–N/L). This may be due to ammonia desorption/stripping phenomena during the process. Furthermore, uncontrolled denitrification is also likely to occur in the settling tanks as a result of the presence of biomass, low oxygen concentrations and long hydraulic retention time (HRT) [21]. The nitrate removal efficiencies observed were accordingly higher than the maximum theoretical values. Although in many cases developing an activated sludge model based on ODEs is not very complex, the use of ANN techniques is considered here for the aforementioned reasons.
Taking all of the previous considerations into account, the training data set variables were the methanol dosage and the concentration of nitrates in the final effluent of the treatment plant (Effluent 3), i.e. the previous output value, y_{k−1}, is used to estimate the current output value, y_k. The training data belong to the working zone in which the recirculation ratio is equal to 3, which lasts 66 days. The sampling time is approximately 3 h 10 min.
Data preprocessing may be necessary by means of mathematical transformations aimed at capturing cycles, trends or any other relationship. Common techniques include calculating sums, differences, differentials, inverses, powers, roots, logarithms, averages, moving averages and Fourier transforms. Any signal-processing or feature-extraction technique can be used. Similarly, one NN may prepare data for another by, for instance, clustering the data before classification. In the present case, the original data vectors, v, were normalized according to

v_{nj} = (v_j − μ_j) / σ_j,   (1)

to obtain a normalized set of data vectors, v_n, where v_j, μ_j and σ_j are the original data, the mean value and the standard deviation of variable j, respectively. This transformation leads to a normalized distribution with zero mean and unit variance. All of the variables are thus treated by the NN with the same importance.
The model validation was carried out by means of a cross-validation procedure using the k-fold method, where the data are randomly split into k disjoint sets or folds of the same size. Then k training and validation iterations are carried out, using a different fold of the data for validation and the remaining k − 1 folds for training at each iteration. Finally, the training and cross-validation errors are averaged over the k iterations. Considering 10 folds (k = 10) is commonly accepted in data mining.
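As a concrete illustration of these preprocessing steps, the following MATLAB sketch screens the candidate inputs with Pearson's correlation coefficient, applies the normalization of Equation (1) and builds k-fold indices. It is a minimal sketch written for this text, not the authors' original script; the function name and the assumed layout of X and y (one sample per row) are illustrative.

    % Minimal preprocessing sketch (not the authors' original script):
    % Pearson screening, z-score normalization (Equation (1)) and k-fold splitting.
    % X is assumed to hold one sample per row, e.g. [methanol dosage, y_{k-1}]; y holds y_k.
    function [Xn, folds] = preprocess_and_split(X, y, k)
        % Pearson correlation of each candidate input with the output,
        % used only as a linear screening aid for variable selection.
        for j = 1:size(X, 2)
            R = corrcoef(X(:, j), y);
            fprintf('input %d vs output: r = %.3f\n', j, R(1, 2));
        end

        % Equation (1): zero mean and unit variance for every variable.
        mu    = mean(X, 1);
        sigma = std(X, 0, 1);
        Xn    = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sigma, size(X, 1), 1);

        % k disjoint folds of (approximately) equal size, assigned at random.
        n     = size(X, 1);
        folds = zeros(n, 1);
        folds(randperm(n)) = mod(0:n-1, k)' + 1;   % fold label 1..k per sample
    end

With k = 10, each sample is used exactly once for validation, and the training and cross-validation MSE values discussed below are averaged over the ten iterations.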

The MATLAB programming language was used to implement the NG algorithm in a script written by the authors. The FF model was obtained using the toolbox available in MATLAB.

4. FF NN approach
The architecture of the NN is composed of a single hidden layer with a hyperbolic tangent activation function and an output layer with a single neuron and a linear activation function. The hyperbolic tangent function of the hidden layer enables the network to learn non-linear relationships. The linear activation function of the output layer enables the network output to take any value. This network topology can be used as a general approximator for any function that has a finite number of discontinuities, provided that the hidden layer has a sufficient number of neurons and a non-linear activation function [22,23]. However, other studies [24,25] have reported that many cases are difficult to solve with a single hidden layer and that it is recommendable to use several layers to approximate complex functions with fewer neurons.
The next step consists of training the NN using the Levenberg–Marquardt algorithm [26,27]. This algorithm uses second-order gradient information without computing the Hessian matrix. If the cost function can be formulated as a squared sum, the Hessian matrix is approximated as H = J^T · J, where J is the Jacobian matrix. The updating rule of the weights w_i is

Δw_i = −(J^T · J + μ · I)^{−1} · J^T · e,   (2)

where I is the identity matrix, e is the estimation error and the parameter μ increases when the cost function rises (so that the update of the weights is small, quite similar to the typical gradient descent algorithm) and decreases otherwise. The aim is to train the algorithm following the Newton method, as this is faster and more accurate than the gradient descent procedure. The Levenberg–Marquardt algorithm is usually the fastest training rule for NNs of small to medium size. Its implementation in MATLAB is very efficient, as the solution of the matrix equation is a built-in function. The main drawback is its high memory cost, because the Jacobian matrix dimensions are N × W, where N is the number of training instances and W is the total number of weights and biases in the network.
The mean squared error (MSE) is useful to determine the number of hidden neurons. A low number of neurons does not provide sufficient parameters to train the NN correctly. An excessive number of neurons leads to overtraining problems and a higher computational cost. The higher the number of weights, the lower the training error. In many cases, however, although the testing error decreases at the beginning, it rises subsequently due to overtraining. Therefore, an optimum number of weights must be found.
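The following MATLAB sketch makes the damped update of Equation (2) explicit for this single-hidden-layer architecture. It is an illustration only and not the MATLAB toolbox routine actually used in the study: the Jacobian is obtained here by finite differences for clarity, and the function names (lm_step, residuals) and the packing of the parameter vector are assumptions.

    % Illustrative sketch of one Levenberg-Marquardt step (Equation (2)) for a
    % network with one tanh hidden layer and one linear output neuron. This is
    % not the MATLAB toolbox routine used in the study; names are illustrative.
    function [w, mu] = lm_step(w, mu, X, y, nh)
        % w  : all weights and biases packed in one column vector of length nh*d + 2*nh + 1
        % X  : N x d matrix of (normalized) inputs, y : N x 1 targets, nh : hidden neurons
        e0 = residuals(w, X, y, nh);             % current residuals, N x 1
        J  = zeros(numel(e0), numel(w));         % numerical Jacobian, N x W
        h  = 1e-6;
        for i = 1:numel(w)
            wp = w; wp(i) = wp(i) + h;
            J(:, i) = (residuals(wp, X, y, nh) - e0) / h;
        end
        dw = -((J' * J + mu * eye(numel(w))) \ (J' * e0));   % Equation (2)
        if sum(residuals(w + dw, X, y, nh).^2) < sum(e0.^2)
            w = w + dw; mu = mu / 10;            % accept: behave more like Newton
        else
            mu = mu * 10;                        % reject: behave more like gradient descent
        end
    end

    function e = residuals(w, X, y, nh)
        % Unpack w into layer parameters and return estimation errors e = yhat - y.
        d  = size(X, 2);
        W1 = reshape(w(1:nh*d), nh, d);          % hidden-layer weights
        b1 = w(nh*d+1 : nh*d+nh);                % hidden-layer biases
        W2 = w(nh*d+nh+1 : nh*d+2*nh)';          % output weights (row vector)
        b2 = w(end);                             % output bias
        H  = tanh(X * W1' + repmat(b1', size(X, 1), 1));
        e  = (H * W2' + b2) - y;
    end

Repeatedly calling lm_step with randomly initialized weights gives the basic training loop; the toolbox implementation additionally computes the Jacobian analytically and stops training when the error on a validation subset stops decreasing.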

Figure 3. MSE as a function of the number of hidden units and training epochs in the FF NN approach: mean value (left) and standard deviation (right) of the MSE over 10 assays for the training and cross-validation data sets, for 20, 25, 30, 35 and 40 epochs, versus the number of neurons in the hidden layer.

Moreover, an excessive number of epochs can cause overtraining. Thus, it is necessary to carry out a validation procedure. The number of neurons in the hidden layer is increased gradually, performing several tests (10 assays were carried out in this study) in the same way as in [28] and varying the maximum number of epochs. The training and cross-validation errors are shown in Figure 3 as the average and variance of these assays. The number of neurons in the hidden layer was chosen equal to five to avoid overtraining problems; robustness takes precedence over the accuracy of the network. A single neuron forms the output layer, as described previously.

5. NG approach
NG is an unsupervised prototype-based method [13] in which the prototype vectors are the weights of the network and carry out a partition of the input data space. The training algorithm is based on an energy cost function. This cost function, E_NG, is formulated as

E_NG = Σ_{ij} h_σ(v_j, w_i) · d(v_j, w_i),   (3)

according to the Euclidean metric. The notation used for the squared Euclidean distance is d(v_j, w_i) = ||v_j − w_i||^2 = (v_j − w_i)^2. A neighbourhood function

h_σ(v, w_i) = exp(−k(v, w_i)/σ(t)),   (4)

based on the rank function

k(v, w_i) ∈ {0, . . . , m − 1},   (5)

is needed to implement the algorithm. In this case, there are no topological restrictions as in the SOM algorithm [29]. The rank function k(v, w_i) represents the rank distance between prototype w_i and data vector v: the minimum distance takes the value 0 and the rank for the maximum distance is equal to m − 1, where m is the number of map units or prototypes and σ(t) is the neighbourhood radius. The neighbourhood radius σ(t) was chosen to decrease exponentially according to

σ(t) = σ_{t0} · (σ_{tmax}/σ_{t0})^{t/tmax},   (6)

where t is the epoch step, tmax is the maximum number of epochs and σ_{t0} was chosen as half the number of map units (σ_{t0} = m/2), as in [30].


The decrease goes from an initial positive value, σ_{t0}, to a smaller final positive value, σ_{tmax}. In addition, σ_{tmax} = 0.0001 was used in order to minimize the quantization error at the end of the training.
The learning rule for the sequential version of the algorithm is obtained by gradient descent on E_NG, i.e. by derivation of the energy cost function (3) with respect to the prototype vectors, w_i:

Δw_i = −α(t) · ∂E_NG/∂w_i.   (7)

It is expressed as follows using the squared Euclidean metric, where α(t) is the learning rate:

Δw_i = α(t) · h_σ(v, w_i) · (v − w_i).   (8)

5.1. Supervised learning
Supervised learning with NG is possible by means of local linear mapping over each Voronoi region defined by prototype vector w_i. A constant y_i and a vector a_i with the same dimension as w_i are assigned to each neuron i. The goal is to approximate the function y = f(v) with ŷ(v). The training thus becomes supervised and the data set contains input–output pairs of data vector v and variable y as the objective function. The estimation is carried out using

ŷ(v) = y_{i∗} + a_{i∗} · (v − w_{i∗}),

where i∗ is the neuron i whose w_i is closest to the data vector v, i.e. the best matching unit. Prototypes w_i are trained in the same way as in (8). An energy cost function is formulated by averaging the MSE over each Voronoi region defined by w_i according to

E_NGsup = Σ_{ij} h_σ(v_j, w_i) · (y_j − y_i − a_i · (v_j − w_i))^2.   (9)

The learning rules for y_i and a_i are shown in

Δy_i = α(t) · h_σ(v, w_i) · (y − y_i),   (10)

and

Δa_i = α(t) · h_σ(v, w_i) · (y − y_i − a_i · (v − w_i)) · (v − w_i),   (11)

respectively, and are obtained by gradient descent of this energy function (9) in a similar way as described previously for (7).
The main training parameters are the number of prototypes or neurons, the number of epochs and the initial neighbourhood radius. Their values must be adjusted in the knowledge that the higher the number of neurons and epochs, the lower the quantization error, although the higher the computational cost. Furthermore, the initial neighbourhood radius must be high enough to guarantee cooperative training between the neurons in order to avoid local minima, but it must not be so high that it prevents competition from taking place between the neurons; otherwise, the quantization error would be high. The idea is to begin with a high degree of cooperation while competition appears smoothly along the training epochs. This can be visualized in the typical quantization error versus epochs graph, in which the quantization error decreases slowly at the beginning of the training and finally decays faster, when competition takes place.
Figure 4 shows the quantization error and the MSE of estimation versus the number of prototypes with σ_{t0} = m/2 and tmax = 50. The results are averaged over 10 algorithm runs. Obviously, the higher the number of neurons, the lower the quantization error, although the MSE of estimation almost ceases to decrease for a number of units higher than 50 in the cross-validation data set. Subsequently, a NG network of 50 prototypes was chosen to be trained several times varying the maximum number of epochs, tmax. The results of the quantization and estimation errors versus tmax are shown in Figure 5. The quantization error in the cross-validation data set seems to increase above 60 epochs, and the estimation can be considered satisfactory above this number of epochs.
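To make the cooperation–competition scheme of Equations (4)–(11) concrete, the following MATLAB sketch implements one training epoch of the supervised NG estimator. It is a minimal illustration written for this text, not the authors' script; the function names (ng_epoch, ng_estimate) and the organization of the parameters into the matrices W, Y and A are assumptions.

    % Minimal sketch of sequential supervised neural gas training (Equations (4)-(11)).
    % Not the authors' original MATLAB script; names and structure are illustrative.
    % W (m x d) prototypes, Y (m x 1) local constants, A (m x d) local slopes.
    function [W, Y, A] = ng_epoch(W, Y, A, V, yv, sigma, alpha)
        m = size(W, 1);
        for n = 1:size(V, 1)                        % present the samples sequentially
            v  = V(n, :);  y = yv(n);
            d2 = sum((W - repmat(v, m, 1)).^2, 2);  % squared Euclidean distances
            [~, order] = sort(d2);                  % nearest prototype first
            k = zeros(m, 1);  k(order) = 0:m-1;     % rank function, Equation (5)
            h = exp(-k / sigma);                    % neighbourhood, Equation (4)
            D = repmat(v, m, 1) - W;                % v - w_i for every unit
            e = y - Y - sum(A .* D, 2);             % local linear estimation errors
            W = W + alpha * repmat(h, 1, size(W, 2)) .* D;          % Equation (8)
            Y = Y + alpha * h .* e;                                  % Equation (10)
            A = A + alpha * repmat(h .* e, 1, size(W, 2)) .* D;      % Equation (11)
        end
    end

    function yhat = ng_estimate(W, Y, A, v)
        % Estimation with the local linear model of the best matching unit.
        [~, istar] = min(sum((W - repmat(v, size(W, 1), 1)).^2, 2));
        yhat = Y(istar) + A(istar, :) * (v - W(istar, :))';
    end

A complete run loops over t = 1, …, tmax, recomputing σ(t) from Equation (6) before each call to ng_epoch; with m = 50 prototypes, tmax = 60 and σ_{t0} = m/2, this corresponds to the configuration validated below.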

Figure 4. Mean quantization error and MSE of estimation versus the number of prototypes in the NG approach (training and cross-validation data sets).

Figure 5. Mean quantization error and MSE of estimation versus the maximum number of epochs in the NG approach (training and cross-validation data sets).

Figure 6. Mean quantization error and MSE of estimation versus the λ parameter of the neighbourhood radius (σ_{t0} = m/λ) in the NG approach (training and cross-validation data sets).

The model selection thus yields a NG net with 50 prototypes to be trained over 50–60 epochs. Different training runs were carried out varying σ_{t0} in order to check the correct selection of the neighbourhood radius, considering σ_{t0} = m/λ with λ = {1, 2, . . . , 8}. The influence of this parameter on the estimation can be seen in Figure 6; λ values above 3 make the training overly competitive, increasing the MSE. The quantization error of the cross-validation set increases for the same reason, as the units end up close to the training data but far from the validation data set. Something similar occurs in Figure 5 with tmax higher than 60. Therefore, the validated NG model is composed of 50 prototypes and trained for 60 epochs (tmax = 60) using an initial neighbourhood radius equal to 25 (σ_{t0} = m/2).

6. Results
The values of NO3–N (mg/L) in Effluent 3 are plotted in Figure 8 against the methanol dosages after simulation with both trained NN models, increasing the methanol dosages after reaching the steady state obtained with a methanol dosage of 0.95 L/m3.

Although the data set is quite limited, this steady state can be considered appropriate. The plant operating conditions remain constant, maintaining the nitrate concentration at a constant level, as can be seen in Figure 7. This equilibrium stage lasts from the 57th to the 64th day, as shown on the left-hand side of Figure 7. Both models estimate this steady-state value at around 50 mg/L. The NG model fits the data more closely than the FF model, which approximates the plant dynamics in a smoother way. In principle, such an accurate fit would suggest problems of generalization, and a typical FF net could yield overfitting, but these undesirable effects did not appear in the NG results. Thus, the final values of nitrate concentration in the effluent from the treatment plant are estimated for each methanol dosage and correspond to a recirculation ratio R = 3.
The final FF net was trained using a subset for model validation in order to stop the learning if the validation error does not decrease after five epochs.

Figure 7. Function approximator of nitrates and estimation interpolation of the models: NO3–N (mg/L) in Effluent 3 estimated by the NG and FF models, compared with the experimental data and the methanol dosage (L/m3), versus time (days).

Figure 8. Dynamic trend of nitrates obtained by simulation of both models varying the methanol dosages.

Although a low number of neurons in the hidden layer of the FF model was chosen so that robustness would take precedence over accuracy, problems of stability arose in numerous trials of the FF model simulation. These problems are related to divergences in the FF model response. However, the NG model does not present stability problems and is much more deterministic. The interpolation capabilities and the dynamic response of the models are shown on the right-hand side of Figure 7. The dynamics seem to be automatically adjusted by the neural model as a first-order system in which the static gain and the time constant are different for each methanol dosage, thus providing the model with non-linear capabilities.
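The simulated trends in Figures 7 and 8 are obtained by feeding each estimate back as the next y_{k−1} input of the model. A possible sketch of this recursive (direct plant model) simulation, reusing the illustrative ng_estimate function from Section 5.1, is shown below; the variable names, the starting value y0 and the assumption that the methanol series is already normalized are illustrative.

    % Sketch of the recursive simulation used to obtain the dynamic trends:
    % the estimated nitrate value is fed back as the y_{k-1} input of the next step.
    % W, Y, A and ng_estimate refer to the illustrative sketch in Section 5.1;
    % methanol(k) is the (normalized) dosage applied at sample k, y0 the initial nitrate value.
    y_sim    = zeros(numel(methanol), 1);
    y_sim(1) = y0;                               % measured nitrate at the starting sample
    for k = 2:numel(methanol)
        v        = [methanol(k), y_sim(k-1)];    % normalized input vector [dosage, y_{k-1}]
        y_sim(k) = ng_estimate(W, Y, A, v);      % one-step-ahead estimate fed back
    end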

A key parameter in the formulation of the NG model is the neighbourhood radius. As stated in the previous section, correct selection is needed here because the parameter σ influences the cooperation–competition procedure of the training, affecting the quantization error and the MSE of estimation. Selecting a neighbourhood radius greater than the optimal value results in slow dynamics of the NG model according to the experimental results of the simulation, whereas the smaller the value, the greater the MSE of estimation, as can be seen on the right-hand side of Figure 6. An optimum number of weights must be selected in a FF network, as the higher the number of neurons in the hidden layer, the greater the risk of overfitting. However, increasing the number of prototypes in the NG network does not produce this undesirable effect, although the computational cost is higher.


7. Conclusions
Although developing an activated sludge model based on ODEs is recommendable and not very complex, the use of ANN techniques is considered in this case due to ammonia desorption/stripping phenomena and uncontrolled denitrification in the settling tanks. As a consequence, the nitrate removal efficiencies observed were higher than the maximum theoretical values. The aim of this paper was to formulate a model able to predict the concentration of NO3–N depending on the methanol dosage. A compromise between the minimization of the effluent compound and the consumption of methanol can thus be obtained in order to achieve savings in the external carbon source.
A study of the data from the WWTP and a linear correlation analysis between the process variables were carried out. Most of the results are obvious and expected. The set of training variables seems to be highly determined by the data analysis and the collaboration of expert staff. Two different types of ANN were employed to construct the model and these were subsequently compared. The results suggest a number of ideas to obtain a validated model while pointing out certain drawbacks:
• Both models seem to yield good results. However, FF usually presents problems of local minima and the FF training methodology is not yet well defined, as is well documented in the literature. Problems of stability arose in the simulation procedure. NG performs well due to its cooperation–competition procedure, with no problems of stability or overfitting arising in the experimental results.
• The greater the number of neurons in a FF network, the greater the risk of overfitting. Increasing the number of map units in a NG network will not produce this undesirable effect, although the computational cost is higher. Furthermore, the neighbourhood radius is a key training parameter of the NG approach and must be correctly selected.
• The NG model is interesting for use as a direct model in a predictive control scheme because of its robust behaviour. Using a batch procedure may be advisable to obtain the direct plant model more quickly [31].

Acknowledgements
Our warmest thanks are expressed to the European Coal and Steel Community (ECSC) for their financial support of the research projects ‘Advanced Process Control for Biological Water Treatment Plants in Steelworks’ and ‘Implementation of sensor based on-line control of pickling lines (SensorControlPilot)’ via agreement numbers ECSC-7210-PR-235 and RFSP-CT2007-00046, respectively.


References
[1] E. Marañón, I. Vázquez, J. Rodríguez, L. Castrillón, and Y. Fernández, Coke wastewater treatment by a three-step activated sludge system, Water Air Soil Pollution 192 (2008), pp. 155–164.
[2] B. Ráduly, K.V. Gernaey, A.G. Capodaglio, P.S. Mikkelsen, and M. Henze, Artificial neural networks for rapid WWTP performance evaluation: Methodology and case study, Environ. Modell. Software 22 (2007), pp. 1208–1216.
[3] A. Gamal El-Din and D.W. Smith, Modeling a full-scale primary sedimentation tank using artificial neural networks, Environ. Technol. 23 (2002), pp. 479–496.
[4] P.A. Vanrolleghem, L. Benedetti, and J. Meirlaen, Modelling and real-time control of the integrated urban wastewater system, Environ. Modell. Software 20 (2005), pp. 427–442.
[5] A. Vellido, P.J.G. Lisboa, and J. Vaughan, Neural networks in business: a survey of applications (1992–1998), Expert Syst. Appl. 17 (1999), pp. 51–70.
[6] K. Fanning, K.O. Cogger, and R. Srivastava, Detection of management fraud: A neural network approach, Intell. Syst. Accounting Finance Managem. 4 (1995), pp. 113–126.
[7] K. Feldman and J. Kingdon, Neural networks and some applications to finance, Appl. Math. Finance 2 (1995), pp. 17–42.
[8] E.A. Perpetuo, D.N. Silva, I.R. Avanzi, L.H. Gracioso, M.P.G. Baltazar, and C.A.O. Nascimento, Phenol biodegradation by a microbial consortium: application of artificial neural network (ANN) modelling, Environ. Technol. 33 (2012), pp. 1739–1745.
[9] X. Li, M.H. Nour, D.W. Smith, and E.E. Prepas, Neural networks modelling of nitrogen export: model development and application to unmonitored boreal forest watersheds, Environ. Technol. 31 (2010), pp. 495–510.
[10] R.D. Tyagi and Y.G. Du, Kinetic model for the effects of heavy metals on activated sludge process using neural networks, Environ. Technol. 13 (1992), pp. 883–890.
[11] H.R. Maier, A. Jain, G.C. Dandy, and K.P. Sudheer, Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions, Environ. Modell. Software 25 (2010), pp. 891–909.
[12] H.R. Maier and G.C. Dandy, Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications, Environ. Modell. Software 15 (2000), pp. 101–124.
[13] T.M. Martinetz, S.G. Berkovich, and K.J. Schulten, Neural-gas network for vector quantization and its application to time-series prediction, IEEE Trans. Neural Networks 4 (1993), pp. 558–569.
[14] M.E. Tipping and C.M. Bishop, Mixtures of probabilistic principal component analyzers, Neural Computat. 11 (1999), pp. 443–482.
[15] R. Andrews and J. Diederich, Rules and networks, Proceedings of the Rule Extraction From Trained Artificial Neural Networks Workshop (AISB'96), Queensland University of Technology, 1996.
[16] J.A. Baeza, D. Gabriel, and J. Lafuente, Improving the nitrogen removal efficiency of an A2/O based WWTP by using an on-line Knowledge Based Expert System, Water Res. 36 (2002), pp. 2109–2123.
[17] K.Y. Tam and M.Y. Kiang, Managerial applications of the neural networks: The case of bank failure predictions, Management Sci. 38 (1992), pp. 926–947.
[18] H. Jo, I. Han, and H. Lee, Bankruptcy prediction using case-based reasoning, neural network and discriminant analysis, Expert Syst. Appl. 13 (1997), pp. 97–108.


[19] H. López and I. Machón, Self-organizing map and clustering for wastewater treatment monitoring, Engng Appl. Artif. Intell. 17 (2004), pp. 215–225.
[20] I. Machón and H. López, End-point detection of the aerobic phase in a biological reactor using SOM and clustering algorithms, Engng Appl. Artif. Intell. 19 (2006), pp. 19–28.
[21] J.L. Campos, M. Sánchez, A. Mosquera-Corral, R. Méndez, and J.L. Lema, Coupled BAS and anoxic USB system to remove urea and formaldehyde from wastewater, Water Res. 37 (2003), pp. 3445–3451.
[22] G. Cybenko, Approximation by superpositions of a sigmoidal function, Maths Control Signals Syst. 2 (1989), pp. 303–314.
[23] E.J. Hartman, J.D. Keeler, and J.M. Kowalski, Layered neural networks with Gaussian hidden units as universal approximations, Neural Computat. 2 (1990), pp. 210–215.
[24] I. Flood and N. Kartam, Neural networks in civil engineering. I: Principles and understanding, J. Comput. Civil Eng. 8 (1994), pp. 131–148.
[25] B. Cheng and D.M. Titterington, Neural networks: a review from a statistical perspective, Statist. Sci. 9 (1994), pp. 2–54.
[26] K. Levenberg, A method for the solution of certain problems in least squares, Q. Appl. Maths 2 (1944), pp. 164–168.
[27] D. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, SIAM J. Appl. Maths 11 (1963), pp. 431–441.
[28] I. Machón, H. López, J. Rodríguez-Iglesias, E. Marañón, and I. Vázquez, Simulation of a coke wastewater nitrification process using a feed-forward neuronal net, Environ. Modell. Software 22 (2007), pp. 1382–1387.
[29] T. Kohonen, Self-organizing maps, 3rd extended edn, Springer, Berlin, 2001.
[30] B. Arnonkijpanich, A. Hasenfuss, and B. Hammer, Local matrix adaptation in topographic neural maps, Neurocomputing 74 (2011), pp. 522–539.
[31] M. Cottrell, B. Hammer, A. Hasenfuss, and T. Villmann, Batch and median neural gas, Neural Networks 19 (2006), pp. 762–771.
