
Brief Papers

Adaptive Computation Algorithm for RBF Neural Network

Hong-Gui Han, Member, IEEE, and Jun-Fei Qiao, Member, IEEE

Abstract— A novel learning algorithm is proposed for nonlinear modelling and identification using radial basis function (RBF) neural networks. The proposed method simplifies neural network training through the use of an adaptive computation algorithm (ACA), whose convergence is analyzed by the Lyapunov criterion. The proposed algorithm offers two important advantages. First, the model performance can be significantly improved through the ACA, and the modelling error is uniformly ultimately bounded. Second, the ACA reduces the computational cost and accelerates training. The proposed method is then employed to model a classical nonlinear system with a limit cycle and to identify a nonlinear dynamic system; computational complexity analysis and simulation results demonstrate its effectiveness.

Index Terms— Adaptive computation algorithm, modelling, nonlinear systems, radial basis function neural networks.

Manuscript received May 20, 2011; revised October 26, 2011; accepted October 29, 2011. Date of publication December 19, 2011; date of current version February 8, 2012. This work was supported in part by the National 863 Scheme Foundation of China under Grant 2009AA04Z155 and Grant 2007AA04Z160, the National Science Foundation of China under Grant 61034008 and Grant 60873043, the Ph.D. Program Foundation from the Ministry of Chinese Education under Grant 200800050004, the Beijing Municipal Natural Science Foundation under Grant 4092010, and the Funding Project for Academic Human Resources Development under Grant PHR(IHLB)201006103. The authors are with the College of Electronic and Control Engineering, Beijing University of Technology, Beijing 100124, China (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2011.2178559

I. INTRODUCTION

The modelling of nonlinear dynamical systems has received considerable attention in recent years, since it is an indispensable step toward controller design for nonlinear systems [1]. In many practical situations, however, it is infeasible to obtain an accurate mathematical model of the system due to the lack of knowledge of some parameters. Thus, the controller should take appropriate action to counteract the presence of uncertainties. For the controller to perform well, the main objective of dynamical system identification is to construct a model, based on the experimental observations, that reproduces the dynamics of the underlying system as faithfully as possible.

Due to their simple topological structure and universal approximation ability, radial basis function (RBF) neural networks have been widely used in nonlinear system modelling

and control [2]–[7]. For an RBF network, the adjustable parameters are the centers, widths, and output weights. In modelling dynamic systems with an RBF neural network, parameter optimization is an important issue [8]. To obtain suitable parameters, many online (e.g., [9]) and off-line algorithms for training RBF neural networks are available. Among the off-line algorithms, some researchers adopt a two-stage parameter optimization procedure, i.e., unsupervised learning of the centers and widths of the hidden neurons, followed by separate supervised learning of the linear output weights [10], [11]. Others utilize supervised learning methods to optimize all the network parameters [12]–[14]. Among these algorithms, the gradient-based backpropagation (BP) training algorithms [15] and the recursive least squares (RLS) training algorithms [16] are perhaps the most popular.

It is well known that the BP training algorithms may converge slowly in practice, and the search for the global minimum of a cost function may become trapped at local minima during gradient descent [17]. Also, if a network is subject to large bounded input disturbances, the global minimum may not be found. Therefore, fast error convergence and strong robustness of a neural network trained with the BP algorithms cannot be guaranteed. Compared with the BP algorithms, the RLS algorithms converge faster. However, the RLS algorithms involve more complicated mathematical operations and require more computational resources than the BP algorithms [18].

To avoid the aforementioned problems, some fast and advanced training algorithms have been developed. Kaynak et al. [19] proposed a sliding mode control-based BP training algorithm. This adaptive learning scheme has been used to train neural networks with good convergence and robustness. Nevertheless, though some estimates have appeared, a complete and accurate analysis of its computational complexity is lacking. Jiang et al. [20] proposed a variable-length sliding window blockwise least squares (VLSWBLS) algorithm that can outperform RLS with forgetting factors. The VLSWBLS has both good tracking ability for abrupt parameter changes and high accuracy of the parameter estimates at steady state. However, the computational burden of the VLSWBLS is still heavy. Recently, Li and Peng [21] proposed a fast recursive algorithm (FRA) for nonlinear dynamic system identification using linear-in-the-parameters models, and later developed a continuous forward algorithm (CFA) for both network construction and parameter optimization [22]. Both the FRA and the CFA lead to significantly improved modelling performance and require considerably less memory storage and computational effort than conventional optimization methods. However, little has been done to examine their convergence.


Aiming at developing a fast, accurate, and convergent parameter optimization algorithm, a new algorithm, called the adaptive computation algorithm (ACA), is presented in this brief. The proposed method uses a novel computation that reduces memory usage and computational complexity. Moreover, the convergence of the ACA is guaranteed by the Lyapunov criterion: the modelling error is uniformly ultimately bounded (UUB) and the output weight converges to the “optimal” weight. Theoretical analysis and simulation experiments show that the model performance can be significantly improved through the ACA.

The outline of this brief is as follows. Section II introduces the RBF neural networks and describes nonlinear system modelling using the RBF networks. Section III presents the ACA and analyzes its convergence using the Lyapunov criterion. Section IV presents the computational complexity analysis. The experimental results on the performance of the proposed algorithm versus other similar algorithms are given in Section V. Finally, Section VI concludes this brief.

II. PROBLEM FORMULATION

The nonlinear dynamical systems considered in this brief are described by the following differential equation (multi-input and single-output) [4]:

$\dot{y}(t) = f(y(t), x(t)), \quad y(t_0) = y_0$   (1)

where y(t) and x(t) are the output and input of the dynamical system at time t, respectively. The function f(·, ·) is assumed to be unknown. To clearly and easily discuss the convergence, we express (1) in the following form:

$\dot{y}(t) = -y(t) + g(y(t), x(t)), \quad y(t_0) = y_0$   (2)

and

$g(y(t), x(t)) = \dot{y}(t) + y(t) = y(t) + f(y(t), x(t)).$   (3)

Fig. 1. Principle of the ACA process.
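To make the reformulation in (2) and (3) concrete, the following short sketch simulates a plant of the form (1) with a forward-Euler step and evaluates the equivalent right-hand side g from (3). It is only a minimal sketch: the specific plant f, the step size, and the excitation signal are illustrative assumptions, not taken from this brief.

```python
import numpy as np

# Illustrative plant f(y, x); any smooth nonlinear map could be used here (assumed).
def f(y, x):
    return -0.5 * y**3 + np.sin(x)

def g(y, x):
    # Equivalent right-hand side from (3): g(y, x) = y + f(y, x).
    return y + f(y, x)

# Forward-Euler simulation of (2): y_dot = -y + g(y, x), y(t0) = y0.
dt, T, y = 0.01, 10.0, 0.0            # step size and initial condition (assumed)
ts = np.arange(0.0, T, dt)
xs = np.sin(2 * np.pi * 0.2 * ts)     # assumed excitation signal
ys = np.empty_like(ts)
for i, t in enumerate(ts):
    ys[i] = y
    y = y + dt * (-y + g(y, xs[i]))   # identical to y_dot = f(y, x) by construction
```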

A single-output extended RBF neural network with K hidden-layer neurons can be described by [23]

$\dot{\hat{y}}(t) = -\hat{y}(t) + \hat{g}(y(t), x(t))$   (4)

where $\hat{y}(t)$ denotes the output of the extended RBF neural network, and $\hat{g}$ is

$\hat{g}(y(t), x(t)) = \sum_{k=1}^{K} w_k \theta_k(y(t), x(t))$   (5)

where $u = (y(t), x(t))^T$ is the input of the network, $u \in \Re^{M \times 1}$, M is the number of input variables, $W = [w_1, w_2, \ldots, w_K]$ contains the connecting weights between the hidden neurons and the output layer, and $\theta_k(u)$ is the output value of the kth hidden neuron

$\theta_k(u) = e^{-\|u - \mu_k\|^2 / \sigma_k}$   (6)

where $\mu_k$ denotes the center vector of the kth hidden neuron, $\|u - \mu_k\|$ is the Euclidean distance between u and $\mu_k$, and $\sigma_k$ is the radius or width of the kth hidden neuron. $\theta(y(t), x(t)) = [\theta_1, \theta_2, \ldots, \theta_K]^T \in \Re^{K \times 1}$ is the hidden neurons' output matrix, and K is the number of hidden neurons.
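As a concrete illustration of the hidden-layer computation in (5) and (6), the following sketch evaluates a Gaussian RBF layer and the network output for one input sample. It is a minimal NumPy implementation assuming given centers, widths, and output weights; the function name, array shapes, and example values are illustrative, not from this brief.

```python
import numpy as np

def rbf_forward(u, centers, widths, weights):
    """Hidden-layer outputs (6) and network output g_hat (5).

    u       : input vector of length M, u = (y(t), x(t))
    centers : K x M matrix whose rows are the center vectors mu_k
    widths  : length-K vector of widths sigma_k
    weights : length-K output weight vector W
    """
    # theta_k(u) = exp(-||u - mu_k||^2 / sigma_k), one value per hidden neuron
    dist_sq = np.sum((centers - u) ** 2, axis=1)
    theta = np.exp(-dist_sq / widths)
    # g_hat = sum_k w_k * theta_k(u)
    g_hat = weights @ theta
    return theta, g_hat

# Example usage with assumed dimensions (M = 2 inputs, K = 5 hidden neurons)
rng = np.random.default_rng(0)
u = np.array([0.3, -0.1])
theta, g_hat = rbf_forward(u, rng.normal(size=(5, 2)), np.ones(5), rng.normal(size=5))
```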

Then, network training is required to optimize μ, σ, and W so as to minimize the sum squared error (SSE)

$E(t) = (\hat{y}(t) - y(t))^T (\hat{y}(t) - y(t))$   (7)

as well as to drive the modelling error e(t) of the RBF neural network to zero

$e(t) = \hat{y}(t) - y(t)$   (8)

where $\hat{y}(t)$ is the output of the neural network and y(t) is the system output for the current input sample x at time t.

III. ACA

The principle of the proposed ACA is depicted in Fig. 1. The training algorithm for μ and σ is similar to the CFA in [22] and is omitted here. The output y(t) of the dynamical system is produced by the input x(t). The RBF network with multiple inputs and one output is used to model the unknown function, or equivalently, the unknown dynamical system (1). The function g(t) can be rewritten as [24]

$\hat{g}(t) = W\theta(y(t), x(t))$   (9)

where $W = [w_1, w_2, \ldots, w_K] \in \Re^{1 \times K}$ is a weight matrix and θ is the hidden neurons' output matrix. An RBF neural network has a simple network structure in terms of the direction of information flow. Since the performance of an RBF neural network depends heavily on the adjustment of its parameters, research has focused on fast and effective methods for training the parameters of three-layered RBF neural networks. Suppose there exists W* such that

$g(y(t), x(t)) = W^*\theta(y(t), x(t))$   (10)

where each element of W* is a constant. The adaptation parameter error is defined as $\Phi = W - W^*$. Before the new ACA is introduced, a parameter matrix $\hat{\theta}$ is given such that $\hat{\theta}\theta^T = I$. Having established the parameter matrix $\hat{\theta}$, the new ACA can now be derived. The training rule for the weight W is

$\dot{W}^T(t) = \eta\,\theta(y(t), x(t))\,e(x(t)) - \lambda\,\hat{\theta}(y(t), x(t))\,e(x(t))$   (11)

where η > 0 is the learning rate for the connecting weights and λ > 0 is the penalty coefficient. The proposed algorithm for RBF neural network training is summarized in Table I.
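A discrete-time reading of the training rule (11) updates the output weights from the current hidden-layer outputs and the modelling error. The sketch below is one possible interpretation under an explicit-Euler discretization with step size dt; in particular, the construction of theta_hat as theta divided by its squared norm is an assumption made for illustration only and is not a construction given in this brief.

```python
import numpy as np

def aca_weight_update(W, theta, error, eta=0.05, lam=0.01, dt=0.01):
    """One explicit-Euler step of the weight rule (11) (illustrative sketch).

    W     : length-K output weight vector
    theta : length-K hidden-layer output vector theta(y(t), x(t))
    error : scalar modelling error e(t) from (8)
    """
    # Assumed normalization: theta_hat . theta = 1 (not specified in the brief).
    theta_hat = theta / (theta @ theta + 1e-12)
    # Right-hand side of (11): eta*theta*e - lambda*theta_hat*e
    W_dot = eta * theta * error - lam * theta_hat * error
    return W + dt * W_dot
```

With λ = 0, this step reduces to an LMS-like update, which is the case discussed in Remark 1 below.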


TABLE I
DETAILS OF TRAINING PROCESS

For all samples
    % Forward computation
    for all hidden neurons
        calculate outputs of hidden neurons;    % (6)
    end
    calculate RBF neural network output;        % (4)
    calculate error;                            % (8)
    % Backward computation
    for all weights of the hidden neurons
        adapt the weights;                      % (11)
    end
end
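Read as a program, Table I is a per-sample loop that performs the forward computation of (6) and (4), evaluates the error (8), and then adapts the output weights with (11). A compact sketch of that loop, reusing the rbf_forward and aca_weight_update helpers from the earlier snippets (both of which are illustrative, not from this brief), might look as follows.

```python
import numpy as np

def train_aca(samples, targets, centers, widths, eta=0.05, lam=0.01, dt=0.01, seed=0):
    """Per-sample training loop corresponding to Table I (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    K = centers.shape[0]
    W = rng.normal(scale=0.1, size=K)   # initial output weights (assumed)
    y_hat = 0.0                          # network output state, as in (4)
    for u, y in zip(samples, targets):
        # Forward computation: hidden outputs (6) and g_hat (5)
        theta, g_hat = rbf_forward(u, centers, widths, W)
        y_hat = y_hat + dt * (-y_hat + g_hat)   # Euler step of (4)
        error = y_hat - y                        # modelling error (8)
        # Backward computation: weight adaptation (11)
        W = aca_weight_update(W, theta, error, eta, lam, dt)
    return W
```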

Remark 1: In the ACA described above, if λ = 0, the weight update rule (11) reduces to the generalized least mean square (LMS) algorithm [15]. In fact, the smaller η is, the more accurate the LMS performance and the longer the memory span over which the LMS algorithm remembers past data; however, the convergence rate of the algorithm is then slow. Our motivation has been to achieve faster convergence and a longer memory span without sacrificing the simplicity of the LMS algorithm.

Note that the modelling error is defined by (8). Combining (2) and (4), the modelling error dynamics are obtained as

$\dot{e}(t) = \dot{\hat{y}}(t) - \dot{y}(t)$
$\qquad = -\hat{y}(t) + \hat{g}(y(t), x(t)) + y(t) - g(y(t), x(t))$
$\qquad = -e(t) + \hat{g}(y(t), x(t)) - g(y(t), x(t))$   (12)

where $e(t_0) = e_0$. In our main theoretical result, the following assumptions are made.

A1) The training samples u are bounded sequences of independent identically distributed random vectors.
A2) The weight matrix W is bounded.

Lemma 1: Suppose that assumption A2) is valid; then $\|W^* - W\| < \alpha$.

Proof: Each element of W* is a constant. According to assumption A2), there exists a positive real number α such that $\|W^* - W\| < \alpha$. The proof is simple and omitted.

Theorem 1: Given the modelling error dynamics defined by (12), for which A1) and A2) are satisfied, if $\|e(t_0)\| < e_p$ ($e_p > 0$ is a given real number), then there exists a positive real number $e_l$ such that $\|e(t)\| < e_l$, i.e., the approximation error e(t) is UUB.

Proof: Consider the Lyapunov function candidate $V(e) = (1/2)e^2$. The time derivative of V evaluated along the solutions of (12) is

$\dot{V}(e) = e\dot{e} = e\bigl(-e + (\hat{g}(y(t), x(t)) - g(y(t), x(t)))\bigr)$
$\qquad = -e^2 + e\bigl(W^*\theta(y(t), x(t)) - W\theta(y(t), x(t))\bigr)$
$\qquad \le -e^2 + \|e\|\,\bigl\|W^*\theta(y(t), x(t)) - W\theta(y(t), x(t))\bigr\|$
$\qquad = -e^2 + \|e\|\,\bigl\|(W^* - W)\theta(y(t), x(t))\bigr\|.$   (13)

From (6), we have

$\dot{V}(e) \le -\|e\|^2 + \|e\|\,\|W^* - W\| \le -\|e\|(\|e\| - \alpha)$   (14)

which implies the following [25, p. 211].
1) If $\|e(t)\| > \alpha$, then $\dot{V}(e) < 0$, and therefore $\|e(t)\| < e_p$.
2) If $\|e(t)\| = \alpha$, then $\dot{V}(e) = 0$, and therefore $\|e(t)\| = e_p$.
3) If $\|e(t)\| < \alpha$, then $\dot{V}(e) > 0$, and the solution may reach the sphere $\|e(t)\| = \alpha$ or $\|e(t)\| > \alpha$ at some time. However, once $\|e(t)\| \ge \alpha$, $\dot{V}(e)$ is non-positive, and therefore $\|e(t)\| \le e_p$.

Therefore, the solutions of the differential equation (12) governing the approximation error dynamics are uniformly bounded.

Theorem 2: Let W* be the “optimal” constant weight matrix such that $g(y(t), x(t)) = W^*\theta(y(t), x(t))$, and let the RBF weight adaptation strategy be defined by (11). Then the origin of the augmented (e, Φ)-space is stable and e(t) → 0 as t → ∞.

Proof: Because $\Phi \in \Re^{1 \times K}$, its Euclidean norm satisfies

$\|\Phi\|^2 = \sum_{i=1}^{1}\sum_{j=1}^{K} \vartheta_{ij}^2 = \mathrm{trace}\bigl(\Phi^T\Phi\bigr)$   (15)

and we have

$\frac{d}{dt}\,\mathrm{trace}\bigl(\Phi^T\Phi\bigr) = 2\sum_{i=1}^{1}\sum_{j=1}^{K} \vartheta_{ij}\dot{\vartheta}_{ij} = 2\,\mathrm{trace}\bigl(\dot{\Phi}^T\Phi\bigr)$   (16)

where $\dot{\Phi} = d\Phi/dt = \dot{W}$. To prove the convergence of the proposed algorithm, we consider the following Lyapunov function candidate:

$V(e, \Phi) = \frac{1}{2}\left(e^2 + \frac{1}{\eta}\,\mathrm{trace}\bigl(\Phi^T\Phi\bigr)\right).$   (17)

The error dynamics equation (12) can be represented as

$\dot{e}(t) = -e(t) + g(y(t), x(t)) - \hat{g}(y(t), x(t))$
$\qquad = -e(t) + W^*\theta(y(t), x(t)) - W\theta(y(t), x(t))$
$\qquad = -e(t) + (W^* - W)\theta(y(t), x(t))$
$\qquad = -e(t) - \Phi\theta(y(t), x(t)).$   (18)

Then, the time derivative of V is evaluated as follows:

$\dot{V}(e, \Phi) = e\dot{e} + \frac{1}{\eta}\,\mathrm{trace}\bigl(\dot{\Phi}^T\Phi\bigr)$
$\qquad = e\bigl(-e - \Phi\theta(y(t), x(t))\bigr) + \frac{1}{\eta}\,\mathrm{trace}\bigl(\dot{W}^T\Phi\bigr)$
$\qquad = -e^2 - e\Phi\theta(y(t), x(t)) + \frac{1}{\eta}\,\mathrm{trace}\bigl(\dot{W}^T\Phi\bigr).$   (19)

From the definition and the property of the trace operator, the following equation can be obtained:

$e\Phi\theta(y(t), x(t)) = \mathrm{trace}\bigl(e\Phi\theta(y(t), x(t))\bigr) = \mathrm{trace}\bigl(\theta(y(t), x(t))\,e\,\Phi\bigr).$   (20)


IV. COMPUTATIONAL COMPLEXITY ANALYSIS

TABLE II
COMPUTATIONAL COST OF DIFFERENT ALGORITHMS FOR OUTPUT WEIGHT W

Algorithm     Computation Cost
ACA           M × K × N
MGS           M × K × N × N
CFA           M × K × N × N
VLSWBLS       M × K × N × L²

L is the length of the sliding window, 1 < L.

It can be seen that, due to their re-calculating feature, the complexities of the CFA and the MGS are O(N²). Meanwhile, the recursive nature of the VLSWBLS reduces the complexity considerably, from O(N²) for the CFA to O(N) per estimate update. However, the computational burden of the VLSWBLS remains heavier than that of the ACA.

Remark 3: In practice, since L is far greater than 1 (i.e., L ≫ 1), the ACA leads to a considerable reduction of the computation time; the computational effort of the ACA is therefore significantly reduced.
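To put the entries of Table II side by side, the short calculation below evaluates the four cost expressions for one illustrative problem size; the particular values of M, K, N, and L are arbitrary and chosen only for the comparison.

```python
# Illustrative comparison of the cost expressions in Table II.
M, K, N, L = 3, 10, 1000, 50   # inputs, hidden neurons, samples, window length (assumed)

costs = {
    "ACA":     M * K * N,
    "MGS":     M * K * N * N,
    "CFA":     M * K * N * N,
    "VLSWBLS": M * K * N * L**2,
}
for name, cost in costs.items():
    print(f"{name:8s} {cost:>12d}")   # ACA's count is smaller by a factor of N or L^2
```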
