Concurrent Subspace Width Optimization Method for RBF Neural Network Modeling

Wen Yao, Xiaoqian Chen, Yong Zhao, and Michel van Tooren

Abstract— Radial basis function neural networks (RBFNNs) are widely used in nonlinear function approximation. One of the challenges in RBFNN modeling is determining how to effectively optimize width parameters to improve approximation accuracy. To solve this problem, a width optimization method, concurrent subspace width optimization (CSWO), is proposed based on a decomposition and coordination strategy. This method decomposes the large-scale width optimization problem into several subspace optimization (SSO) problems, each of which has a single optimization variable and smaller training and validation data sets so as to greatly simplify optimization complexity. These SSOs can be solved concurrently, thus computational time can be effectively reduced. With top-level system coordination, the optimization of SSOs can converge to a consistent optimum, which is equivalent to the optimum of the original width optimization problem. The proposed method is tested with four mathematical examples and one practical engineering approximation problem. The results demonstrate the efficiency and robustness of CSWO in optimizing width parameters over the traditional width optimization methods.

Index Terms— Concurrent subspace width optimization, coordination, decomposition, radial basis function neural network.

I. INTRODUCTION

FOR A COMPLEX SYSTEM (e.g., a spacecraft), it is generally computationally prohibitive to implement system design optimization, as the system simulation or disciplinary analysis is computationally very expensive and time-consuming. An effective way to solve this problem is to utilize approximation models to replace high-fidelity models so as to make a trade-off between accuracy and efficiency [1], [2]. The radial basis function neural network (RBFNN) model is one of the most suitable methods for approximating highly nonlinear functions [3], especially for cases with samples obtained by deterministic computer experiments, which are identical each time the simulations are repeated, so that RBFNN interpolation is feasible and applicable.

Manuscript received March 21, 2011; revised October 26, 2011; accepted October 26, 2011. Date of publication January 3, 2012; date of current version February 8, 2012. This work was supported in part by the National Natural Science Foundation of China under Grant 50975280 and Grant 61004094, the Program for New Century Excellent Talents in University of Ministry of Education of China under Grant NCET-08-0149, the Fund of Innovation by Graduate School of National University of Defense Technology under Grant B090102, and the Hunan Provincial Innovation Foundation for Postgraduates. W. Yao, X. Chen, and Y. Zhao are with the College of Aerospace and Materials Engineering, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]; [email protected]; [email protected]). M. van Tooren is with the Faculty of Aerospace Engineering, Delft University of Technology, Delft 2629HS, The Netherlands (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2011.2178560

In the classical RBFNN interpolation model [4], [5], the RBF centers are defined the same as the training sample points. With a given training data set, the adjustable options to enhance the RBFNN performance include the RBF forms and the associated width parameters and weights. Theoretical investigation and practical results suggest that the choice of the basis function is not crucial to the RBFNN performance [3]. Besides, with fixed basis function forms and width parameters, the weights can be analytically calculated under the interpolation conditions. Hence, the width parameters are of most importance in determining the approximation accuracy. As summarized in [2], present research on width design and optimization mainly includes three categories.

1) The widths of all the RBFs are set to the same value, which can be estimated with heuristic methods, for example, estimated according to the average of the Euclidean distances between training data points [3], [6], defined to ensure the regression matrix to be strictly diagonally dominant [7], or obtained with optimization [8], [9].

2) The widths are treated differently and designed with heuristic methods. Widely used approaches include estimating the width of each center according to the distances between this center and its neighbor points [10]–[12], or directly computing each width to ensure that the regression matrix for weight estimation is diagonally dominant [13]. Another popular method is to divide the training set into several clusters, and assign a width value to each cluster based on the spatial distribution of the centers in this cluster [14], such as the standard deviation [15], [16]. But in [2], it is demonstrated that not only the spatial distribution features, but also the nonlinearity characteristics of the accurate model have a great effect on the width estimation. Thus, it is proposed to estimate the width of each center based on the local spatial distribution of the training data around it and the second derivative of the target model at this center point [2].

3) The widths of all the RBFs are treated as independent variables and designed respectively with supervised learning or optimization methods, for example, online learning with the extended Kalman filter and offline learning with the Levenberg–Marquardt estimator [17], gradient descent learning procedures [18], [19], an RBFNN adaptation procedure based on the evolution of units and fuzzy rules [20], the minimum description length method



[21], the continuous forward algorithm [22], genetic algorithms (GA) [23]–[27], a differential evolution approach [28], a particle swarm optimization (PSO)-based hybrid learning algorithm [29], and so on.

The first category is the easiest to implement with a single width parameter, but this simplification can greatly limit the accuracy of the RBFNN. The second category uses heuristic methods to estimate the widths by exploiting the spatial distribution features of the training set and the nonlinearity features of the target model. However, the estimation can only be considered a better width scheme than a blindly given scheme without any a priori knowledge, since more knowledge about the target model is exploited. The third category is the most complicated but also the most promising way to obtain the optimal widths and build the most accurate RBFNN model. Nevertheless, the optimization result is greatly affected by the optimization solver, which may be sensitive to the initial baseline and fail to converge to the global optimum, especially when the number of optimization variables is large and the objective function is highly nonlinear. GA and PSO have good global searching capabilities, but require intensive computation, especially for a complex network with a high-dimensional search space.

To improve optimization efficiency, the heuristically estimated width scheme can be used as the baseline for optimization [2]. A more effective way is to reduce the number of optimization variables by grouping the training samples into several clusters and setting the widths of the centers in the same cluster to the same value. However, this multivariable optimization may still be computationally expensive if a large number of training points are involved, as it entails large-scale matrix manipulation, which may easily become ill-conditioned. Hence, there is motivation to study an effective and viable method to solve the complex width optimization problem.

In this paper, a width optimization method, concurrent subspace width optimization (CSWO), is proposed based on a decomposition and coordination strategy. The large-scale optimization problem is decomposed into several subproblems, so-called subspace optimizations (SSOs), each of which has a single optimization variable and smaller training and validation data sets to manipulate. Furthermore, these SSOs can be solved concurrently to further reduce computational time. With a system coordination strategy, the optimization of the SSOs can converge to a consistent optimum, which is equivalent to the optimum of the original width optimization problem.

The rest of this paper is organized as follows. In Section II, the RBFNN interpolation model is introduced briefly, and the mathematical formulation of width optimization is stated. It is noteworthy that the width optimization of the Gaussian basis function is studied in this paper for illustration; the proposed CSWO method can be readily applied to other basis functions as well. In Section III, the CSWO method is described, the decomposition and coordination algorithms are expounded, and the equivalence between the optimums of CSWO and the original width optimization problem is proved. In Section IV, four mathematical tests and one practical engineering test are used to verify the proposed method, and the results are discussed in detail. This paper is closed with

some concluding remarks, as well as highlights for future research.

II. RBFNN INTERPOLATION MODEL AND THE WIDTH OPTIMIZATION PROBLEM

A. RBFNN Interpolation Model

In a classical RBFNN interpolation model [4], [5], the network has one hidden layer. The output of each hidden neuron with a Gaussian basis function is defined as

$$\phi_k(\mathbf{x}, \mathbf{c}_k) = \exp\left(\frac{-r(\mathbf{x}, \mathbf{c}_k)^2}{\sigma_k^2}\right), \quad 1 \le k \le N_T$$
$$r(\mathbf{x}, \mathbf{c}_k) = \|\mathbf{x} - \mathbf{c}_k\| = \left(\sum_{i=1}^{n}(x_i - c_{ki})^2\right)^{1/2} \tag{1}$$

where $\mathbf{x} \in R^n$ is the input vector with the $i$th element denoted as $x_i$, $\mathbf{c}_k \in R^n$ are the RBF centers, which are also known as the neuron centers in the hidden layer, and $N_T$ is the number of centers. $\phi_k(\cdot)$ is a Gaussian basis function from $R^+$ to $R$ with respect to the radial distance $r$ between $\mathbf{x}$ and $\mathbf{c}_k$, and $\sigma_k$ is the width parameter associated with $\phi_k(\cdot)$. The output of the network is the linear sum of the outputs of the hidden neurons. So, the function $f(\mathbf{x}): R^n \to R$ is approximated as

$$\hat{f}(\mathbf{x}) = \sum_{k=1}^{N_T} w_k \phi_k(\mathbf{x}, \mathbf{c}_k) \tag{2}$$

where $w_k$ are the output layer weights. Herein, only the scalar output case is considered, which can be easily extended to the multi-output case if the output elements are independent. In this interpolation model, the RBF centers are the same as the training points, and the RBFNN output at each center is the same as its known function value, i.e.,

$$\hat{f}(\mathbf{c}_k) = f(\mathbf{c}_k), \quad 1 \le k \le N_T. \tag{3}$$

Given the training set $T = \{(\mathbf{c}_k, y_k) : y_k = f(\mathbf{c}_k)\}_{k=1}^{N_T}$, (2) and (3) for all the training samples can be concisely expressed in matrix form as

$$\mathbf{\Phi}\mathbf{w} = \mathbf{F}_T$$
$$\mathbf{\Phi} = \begin{bmatrix} \phi_1(\mathbf{c}_1,\mathbf{c}_1) & \phi_2(\mathbf{c}_1,\mathbf{c}_2) & \cdots & \phi_{N_T}(\mathbf{c}_1,\mathbf{c}_{N_T}) \\ \phi_1(\mathbf{c}_2,\mathbf{c}_1) & \phi_2(\mathbf{c}_2,\mathbf{c}_2) & \cdots & \phi_{N_T}(\mathbf{c}_2,\mathbf{c}_{N_T}) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1(\mathbf{c}_{N_T},\mathbf{c}_1) & \phi_2(\mathbf{c}_{N_T},\mathbf{c}_2) & \cdots & \phi_{N_T}(\mathbf{c}_{N_T},\mathbf{c}_{N_T}) \end{bmatrix}$$
$$\mathbf{w} = [w_1\ w_2\ \cdots\ w_{N_T}]^T, \quad \mathbf{F}_T = [f(\mathbf{c}_1)\ f(\mathbf{c}_2)\ \cdots\ f(\mathbf{c}_{N_T})]^T. \tag{4}$$

Given the validation set $V = \{(\mathbf{x}_q, y_q) : y_q = f(\mathbf{x}_q)\}_{q=1}^{N_V}$, the RBFNN model is trained with the criterion to minimize the regularized sum-of-squares error defined as

$$\mathrm{Obj} = (1-\lambda)\mathbf{e}^T\mathbf{e} + \lambda\mathbf{w}^T\mathbf{w} \tag{5}$$

where $\lambda$ is the regularization parameter, $\mathbf{w}$ is the weight vector defined in (4), and $\mathbf{e}$ is the error vector defined as

$$\mathbf{e} = \mathbf{F}_V - \hat{\mathbf{F}}_V, \quad \mathbf{F}_V = [f(\mathbf{x}_1)\ f(\mathbf{x}_2)\ \cdots\ f(\mathbf{x}_{N_V})]^T, \quad \hat{\mathbf{F}}_V = [\hat{f}(\mathbf{x}_1)\ \hat{f}(\mathbf{x}_2)\ \cdots\ \hat{f}(\mathbf{x}_{N_V})]^T. \tag{6}$$
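To make (1)–(6) concrete, the following minimal Python sketch (not the authors' code; the helper names and the toy target are illustrative assumptions) builds the Gaussian RBF matrix, solves the interpolation system (4) for the weights, and evaluates the regularized objective (5) on a validation set.

```python
import numpy as np

def rbf_matrix(X, centers, widths):
    """Gaussian RBF matrix: phi_k(x) = exp(-||x - c_k||^2 / sigma_k^2), cf. (1)."""
    # squared Euclidean distances between every input row and every center row
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / widths[None, :] ** 2)

def train_rbfnn(centers, y_train, widths):
    """Interpolation conditions (3)-(4): Phi w = F_T with centers equal to training points."""
    Phi = rbf_matrix(centers, centers, widths)
    return np.linalg.solve(Phi, y_train)          # or a Cholesky/LU factorization

def objective(w, centers, widths, X_val, y_val, lam=0.05):
    """Regularized sum-of-squares error (5): (1 - lam) e^T e + lam w^T w."""
    e = y_val - rbf_matrix(X_val, centers, widths) @ w
    return (1.0 - lam) * e @ e + lam * w @ w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x: np.sin(10 * x[:, 0]) * x[:, 0]   # toy 1-D target, not from the paper
    C = rng.uniform(0.0, 1.0, (30, 1))             # training points = RBF centers
    Xv = rng.uniform(0.0, 1.0, (51, 1))            # validation points
    sigma = 0.1 * np.ones(len(C))                  # one trial width for every center
    w = train_rbfnn(C, f(C), sigma)
    print("Obj =", objective(w, C, sigma, Xv, f(Xv)))
```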


The second term of Obj in (5) is used to penalize large weights so as to avoid over-fitting, and the regularization parameter $\lambda$ can be adapted to improve the smoothness and accuracy of the approximation model [9], [30]. In this paper, $\lambda$ is fixed as a constant during approximation modeling for simplicity. With given width settings, (4) becomes a linear equation and the weight vector $\mathbf{w}$ can be computed directly either with the inversion of $\mathbf{\Phi}$ or with other matrix operations, for example, lower-upper triangular (LU) or Cholesky decomposition. Hence, the adjustable parameters in the RBFNN are the widths $\sigma_k$, and RBFNN training is essentially a width optimization problem.

B. Width Optimization Problem

For complex highly nonlinear approximation problems, a large number of training samples are needed to provide sufficient information to approximate the target model. If the width of each center is treated as an independent variable, the number of optimization variables is the same as the number of RBFNN centers, which results in great optimization complexity. To simplify the optimization problem, the general method in the literature is to reduce the number of optimization variables, for example, setting all the widths to the same value and transforming (9) into a single-variable optimization problem. However, using a single width to describe the target model is far from enough to meet the accuracy requirement, especially when the training samples are not uniformly distributed or the nonlinearity of the model changes greatly in different regions [2]. Considering that neighboring centers located in the same region generally share similar spatial distribution characteristics and model nonlinearity features, it is more feasible to group the centers into several clusters and assign the same width value to the centers within the same cluster. Thus, the number of optimization variables of (9) is reduced to the cluster number, and the approximation accuracy and modeling efficiency can be balanced. Assume that the training data set is grouped into $N_S$ nonoverlapping clusters, and each cluster is denoted as

$$T_{Si} = \{\mathbf{c}_q\},\quad T = \{T_{Si}\},\quad i = 1,2,\ldots,N_S,\quad q \in I_{T_{Si}} \tag{7}$$

where $I_{T_{Si}}$ is the index set of cluster $T_{Si}$ with $N_{T_{Si}}$ elements, and $I_{T_{Si}}^j$ is the $j$th element of $I_{T_{Si}}$, representing the index number of the $j$th center in $T_{Si}$. The index numbers in $I_{T_{Si}}$ are consistent with those of the centers numbered in the training set $T$. For example, if $T_{Si} = \{\mathbf{c}_1, \mathbf{c}_5, \mathbf{c}_7\}$, then $I_{T_{Si}} = \{1, 5, 7\}$ and $I_{T_{Si}}^2 = 5$. The index sets for all the clusters compose the clustering scheme denoted as $I_{TS} = \{I_{T_{Si}}\}_{i=1}^{N_S}$. In this paper, the criterion to cluster the training data set is to minimize the sum of the distances between the training points and their corresponding cluster centroid over the $N_S$ clusters, and the k-means clustering method is employed [31]. Herein, the centroid of each cluster is defined such that each of its coordinates is the average of the coordinates of the same dimension of the points in this cluster. With $I_{TS}$, the centers in the same cluster $i$ are assigned the same width $\sigma_{Si}$ as

$$\forall k \in I_{T_{Si}}: \quad \sigma_k = \sigma_{Si}, \quad i = 1,2,\ldots,N_S. \tag{8}$$
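A minimal sketch of the clustering step (7)–(8), assuming scikit-learn's k-means implementation for the clustering criterion described above; the helper names are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_training_set(centers, n_clusters, seed=0):
    """Group the N_T training centers into N_S clusters, cf. (7)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(centers)
    # index sets I_TSi: positions of the centers belonging to cluster i
    index_sets = [np.where(km.labels_ == i)[0] for i in range(n_clusters)]
    return index_sets, km.cluster_centers_          # cluster centroids M_Si

def expand_widths(sigma_S, index_sets, n_centers):
    """Assign the cluster width sigma_Si to every center in cluster i, cf. (8)."""
    sigma = np.empty(n_centers)
    for sig_i, idx in zip(sigma_S, index_sets):
        sigma[idx] = sig_i
    return sigma
```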


Hence, the optimization variables only include the $N_S$ width parameters denoted as $\boldsymbol{\sigma}_S = \{\sigma_{Si}\}_{i=1}^{N_S}$, which can effectively reduce the optimization complexity. With the given training data set $T$, the validation data set $V$, and the clustering scheme $I_{TS}$, the RBFNN modeling is actually a constrained width optimization problem stated as

$$\begin{aligned}
\text{find} \quad & \boldsymbol{\sigma}_S = [\sigma_{S1}\ \sigma_{S2}\ \ldots\ \sigma_{SN_S}] \\
\min \quad & \mathrm{Obj} = (1-\lambda)\mathbf{e}^T\mathbf{e} + \lambda\mathbf{w}^T\mathbf{w} \\
\text{s.t.} \quad & \boldsymbol{\sigma}_S \in \Omega_S \\
& \boldsymbol{\sigma} = [\sigma_1\ \sigma_2\ \ldots\ \sigma_{N_T}],\quad \forall k \in I_{T_{Si}}: \sigma_k = \sigma_{Si},\ i = 1,2,\ldots,N_S \\
& \mathbf{\Phi}(\boldsymbol{\sigma})\mathbf{w} = \mathbf{F}_T, \quad \mathbf{e} = \mathbf{F}_V - \hat{\mathbf{F}}_V(\boldsymbol{\sigma}, \mathbf{w})
\end{aligned} \tag{9}$$

where $\Omega_S$ is the design space of $\boldsymbol{\sigma}_S$. For the Gaussian basis function, Obj is an even function with respect to $\boldsymbol{\sigma}_S$, so only positive values are considered to reduce the optimization search region. The upper limits of $\boldsymbol{\sigma}_S$ are defined such that the matrix $\mathbf{\Phi}$ is not severely ill-conditioned, so that a stable weight vector can be obtained by solving (4). At each search point during optimization, a set of width values is given, based on which the weight parameters are calculated and the RBFNN model is built to check the optimization objective with respect to the regularized approximation accuracy. With a fixed number of optimization variables, the computational cost is mainly attributed to the calculation of the weight vector by solving (4) at each search point, especially when the size of $\mathbf{\Phi}$ is large. For an optimization problem with $N_S$ variables and $N_T$ centers, the computational complexity [32] can be estimated as

$$C = O(N_T^3) \times O(g(N_S)) \tag{10}$$

where $O(N_T^3)$ is the approximate complexity of calculating the weight vector with (4) (either with $\mathbf{\Phi}^{-1}$ or Cholesky/LU decomposition), and $O(g(N_S))$ is the approximate number of objective function evaluations needed in the optimization to reach the optimum, which depends on the specific optimization solver and the optimization problem features (variable number, convex or concave, etc.). Generally, for the same optimization problem with the same solver, the objective evaluation number $g(\cdot)$ increases dramatically as the variable number increases. From (10) it can be clearly seen that the computational cost can be greatly reduced with smaller $N_T$ and $N_S$. Generally, $N_T$ and $N_S$ are fixed to build the RBFNN model with the required accuracy. Therefore, it is motivated to reduce the computational cost by decomposing the large-scale optimization problem into several subproblems, each of which has fewer optimization variables and smaller matrices to manipulate. With a certain coordination strategy, the optimization of the subproblems can converge to a consistent optimum, which is equivalent to the optimum of the original width optimization problem (9).

III. CSWO METHOD

To improve the computational efficiency in solving the width optimization problem stated in (9), a CSWO method is proposed based on a decomposition and coordination strategy. The large-scale optimization problem is decomposed into

several SSO problems, which can be solved concurrently under guidance from system coordination so as to converge to a consistent optimum. In contrast to CSWO, the original width optimization is termed system width optimization (SWO), as all the width parameters are optimized together within one optimization formulation.

A. Decomposition

Assume the original width optimization problem is decomposed into $N_S$ SSOs. The training set is also grouped into $N_S$ nonoverlapping clusters, and each cluster is denoted as (7). For the center $\mathbf{c}_j$ of the cluster $T_{Si}$, the network output at this center can be divided into two parts: the weighted sum of outputs from the hidden neurons with centers in the cluster $T_{Si}$, and the weighted sum of outputs from the hidden neurons with centers outside the cluster $T_{Si}$, which can be stated as

$$\hat{f}(\mathbf{c}_j) = f(\mathbf{c}_j) = \sum_{k \in I_{T_{Si}}} w_k \phi(\mathbf{c}_j, \mathbf{c}_k) + \sum_{k \notin I_{T_{Si}}} w_k \phi(\mathbf{c}_j, \mathbf{c}_k). \tag{11}$$

The first term on the right-hand side of (11) can be viewed as the total effect of the centers in $T_{Si}$ on $\mathbf{c}_j$, and the second term can be viewed as the total effect of the centers outside $T_{Si}$ on $\mathbf{c}_j$. For all the centers of $T_{Si}$, (11) can be written in matrix form as

$$\mathbf{\Phi}_{T_{Si}}\mathbf{w}_{T_{Si}} + \mathbf{\Phi}_{T\bar{S}_i}\mathbf{w}_{T\bar{S}_i} = \mathbf{F}_{T_{Si}} \tag{12}$$

$$\mathbf{\Phi}_{T_{Si}} = (\phi_{jk})_{N_{T_{Si}} \times N_{T_{Si}}},\quad j,k = 1,2,\ldots,N_{T_{Si}},\quad
\phi_{jk} = \phi\big(\mathbf{c}_{I_{T_{Si}}^j}, \mathbf{c}_{I_{T_{Si}}^k}\big) = \exp\!\left(\frac{-\big\|\mathbf{c}_{I_{T_{Si}}^j} - \mathbf{c}_{I_{T_{Si}}^k}\big\|^2}{\sigma_{Si}^2}\right) \tag{13}$$

$$\mathbf{\Phi}_{T\bar{S}_i} = \big[\mathbf{\Phi}_{T\bar{S}_{i1}}\ \mathbf{\Phi}_{T\bar{S}_{i2}}\ \cdots\ \mathbf{\Phi}_{T\bar{S}_{i(i-1)}}\ \mathbf{\Phi}_{T\bar{S}_{i(i+1)}}\ \cdots\ \mathbf{\Phi}_{T\bar{S}_{iN_S}}\big]_{N_{T_{Si}} \times (N_T - N_{T_{Si}})}$$
$$\mathbf{\Phi}_{T\bar{S}_{ij}} = (\phi_{kl})_{N_{T_{Si}} \times N_{T_{Sj}}},\quad k = 1,\ldots,N_{T_{Si}},\ l = 1,\ldots,N_{T_{Sj}},\quad
\phi_{kl} = \phi\big(\mathbf{c}_{I_{T_{Si}}^k}, \mathbf{c}_{I_{T_{Sj}}^l}\big) = \exp\!\left(\frac{-\big\|\mathbf{c}_{I_{T_{Si}}^k} - \mathbf{c}_{I_{T_{Sj}}^l}\big\|^2}{\sigma_{Sj}^2}\right) \tag{14}$$

$$\mathbf{w}_{T_{Si}} = \big[w_{I_{T_{Si}}^1}\ w_{I_{T_{Si}}^2}\ \cdots\ w_{I_{T_{Si}}^{N_{T_{Si}}}}\big]^T,\quad
\mathbf{w}_{T\bar{S}_i} = \big[\mathbf{w}_{T_{S1}}^T\ \mathbf{w}_{T_{S2}}^T\ \cdots\ \mathbf{w}_{T_{S(i-1)}}^T\ \mathbf{w}_{T_{S(i+1)}}^T\ \cdots\ \mathbf{w}_{T_{SN_S}}^T\big]^T,$$
$$\mathbf{w}_{T_{Sj}} = \big[w_{I_{T_{Sj}}^1}\ w_{I_{T_{Sj}}^2}\ \cdots\ w_{I_{T_{Sj}}^{N_{T_{Sj}}}}\big]^T,\quad j = 1,\ldots,N_S,\ j \ne i \tag{15}$$

$$\mathbf{F}_{T_{Si}} = \big[f\big(\mathbf{c}_{I_{T_{Si}}^1}\big)\ f\big(\mathbf{c}_{I_{T_{Si}}^2}\big)\ \cdots\ f\big(\mathbf{c}_{I_{T_{Si}}^{N_{T_{Si}}}}\big)\big]^T. \tag{16}$$

As in the clustering of the training set, the validation set is also grouped into $N_S$ nonoverlapping clusters

$$V_{Si} = \{\mathbf{x}_q\},\quad V = \{V_{Si}\},\quad i = 1,2,\ldots,N_S,\quad q \in I_{V_{Si}}. \tag{17}$$

The criterion for validation set clustering is to group the validation point $\mathbf{x}_q$ into the cluster $V_{Si}$ if its distance to the centroid $\mathbf{M}_{Si}$ of the training cluster $T_{Si}$ is the shortest among its distances to the centroids of all the training clusters, which can be stated as

$$V_{Si} = \Big\{\mathbf{x}_q \,\Big|\, \mathbf{x}_q \in V,\ \|\mathbf{x}_q - \mathbf{M}_{Si}\| = \min_{j=1,\ldots,N_S}\|\mathbf{x}_q - \mathbf{M}_{Sj}\|\Big\}. \tag{18}$$

For the validation point $\mathbf{x}_q$ of the cluster $V_{Si}$, the network output at this point can also be divided into two parts: the weighted sum of outputs from the hidden neurons with centers in the cluster $T_{Si}$ and the weighted sum of outputs from the hidden neurons with centers outside $T_{Si}$, as

$$\hat{f}(\mathbf{x}_q) = \sum_{k \in I_{T_{Si}}} w_k \phi(\mathbf{x}_q, \mathbf{c}_k) + \sum_{k \notin I_{T_{Si}}} w_k \phi(\mathbf{x}_q, \mathbf{c}_k). \tag{19}$$

For all the points of the cluster $V_{Si}$, (19) can be written in matrix form as

$$\mathbf{\Phi}_{V_{Si}}\mathbf{w}_{T_{Si}} + \mathbf{\Phi}_{V\bar{S}_i}\mathbf{w}_{T\bar{S}_i} = \hat{\mathbf{F}}_{V_{Si}} = \mathbf{F}_{V_{Si}} - \mathbf{e}_{V_{Si}} \tag{20}$$

$$\mathbf{\Phi}_{V_{Si}} = (\phi_{jk})_{N_{V_{Si}} \times N_{T_{Si}}},\quad j = 1,\ldots,N_{V_{Si}},\ k = 1,\ldots,N_{T_{Si}},\quad
\phi_{jk} = \phi\big(\mathbf{x}_{I_{V_{Si}}^j}, \mathbf{c}_{I_{T_{Si}}^k}\big) = \exp\!\left(\frac{-\big\|\mathbf{x}_{I_{V_{Si}}^j} - \mathbf{c}_{I_{T_{Si}}^k}\big\|^2}{\sigma_{Si}^2}\right) \tag{21}$$

$$\mathbf{\Phi}_{V\bar{S}_i} = \big[\mathbf{\Phi}_{V\bar{S}_{i1}}\ \mathbf{\Phi}_{V\bar{S}_{i2}}\ \cdots\ \mathbf{\Phi}_{V\bar{S}_{i(i-1)}}\ \mathbf{\Phi}_{V\bar{S}_{i(i+1)}}\ \cdots\ \mathbf{\Phi}_{V\bar{S}_{iN_S}}\big]_{N_{V_{Si}} \times (N_T - N_{T_{Si}})}$$
$$\mathbf{\Phi}_{V\bar{S}_{ij}} = (\phi_{kl})_{N_{V_{Si}} \times N_{T_{Sj}}},\quad k = 1,\ldots,N_{V_{Si}},\ l = 1,\ldots,N_{T_{Sj}},\quad
\phi_{kl} = \phi\big(\mathbf{x}_{I_{V_{Si}}^k}, \mathbf{c}_{I_{T_{Sj}}^l}\big) = \exp\!\left(\frac{-\big\|\mathbf{x}_{I_{V_{Si}}^k} - \mathbf{c}_{I_{T_{Sj}}^l}\big\|^2}{\sigma_{Sj}^2}\right) \tag{22}$$

$$\mathbf{e}_{V_{Si}} = \mathbf{F}_{V_{Si}} - \hat{\mathbf{F}}_{V_{Si}},\quad
\hat{\mathbf{F}}_{V_{Si}} = \big[\hat{f}\big(\mathbf{x}_{I_{V_{Si}}^1}\big)\ \hat{f}\big(\mathbf{x}_{I_{V_{Si}}^2}\big)\ \cdots\ \hat{f}\big(\mathbf{x}_{I_{V_{Si}}^{N_{V_{Si}}}}\big)\big]^T,\quad
\mathbf{F}_{V_{Si}} = \big[f\big(\mathbf{x}_{I_{V_{Si}}^1}\big)\ f\big(\mathbf{x}_{I_{V_{Si}}^2}\big)\ \cdots\ f\big(\mathbf{x}_{I_{V_{Si}}^{N_{V_{Si}}}}\big)\big]^T. \tag{23}$$
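In code, the decomposition (11)–(23) amounts to splitting the columns of the full training and validation RBF matrices into the within-cluster blocks and the "outside" blocks, and splitting the weight vector accordingly. A rough sketch under the assumption that the full matrices have already been built (the helper name and argument layout are hypothetical, not the authors' code):

```python
import numpy as np

def partition_blocks(Phi_T, Phi_V, w, F_T, idx_i, val_idx_i):
    """Split the full training/validation RBF matrices into the SSO_i blocks of (12) and (20).

    Phi_T: (N_T, N_T) full training matrix, Phi_V: (N_V, N_T) full validation matrix,
    w: (N_T,) current system weights, F_T: (N_T,) training targets,
    idx_i: indices I_TSi of the centers in cluster i, val_idx_i: indices I_VSi of its validation points.
    """
    mask = np.zeros(Phi_T.shape[1], dtype=bool)
    mask[idx_i] = True
    Phi_TSi  = Phi_T[np.ix_(idx_i, idx_i)]          # within-cluster block, (12)-(13)
    Phi_TSbi = Phi_T[idx_i][:, ~mask]               # columns of the outside centers, (14)
    Phi_VSi  = Phi_V[np.ix_(val_idx_i, idx_i)]      # validation block, (20)-(21)
    Phi_VSbi = Phi_V[val_idx_i][:, ~mask]           # (22)
    w_TSi, w_TSbi = w[idx_i], w[~mask]              # (15)
    F_TSi = F_T[idx_i]                              # (16)
    return Phi_TSi, Phi_TSbi, Phi_VSi, Phi_VSbi, w_TSi, w_TSbi, F_TSi
```

Note that the column ordering of the outside blocks only has to match the ordering of the corresponding weight entries for the products in (12) and (20) to be consistent.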

Given the fixed matrices $\mathbf{\Phi}_{T\bar{S}_i}$ and $\mathbf{\Phi}_{V\bar{S}_i}$ and the weight vector $\mathbf{w}_{T\bar{S}_i}$, as well as the training set $T_{Si}$ and the validation set $V_{Si}$, the local width optimization problem of SSO$_i$ is

$$\begin{aligned}
\text{find} \quad & \sigma_{Si} \\
\min \quad & \mathrm{Obj}_{Si}(\sigma_{Si}) = (1-\lambda)\mathbf{e}_{V_{Si}}^T\mathbf{e}_{V_{Si}} + \lambda\mathbf{w}_{T_{Si}}^T\mathbf{w}_{T_{Si}} \\
\text{s.t.} \quad & \sigma_{Si} \in \Omega_{Si} \\
& \mathbf{\Phi}_{T_{Si}}(\sigma_{Si})\mathbf{w}_{T_{Si}} = \mathbf{F}_{T_{Si}\_Si} \\
& \mathbf{e}_{V_{Si}} = \mathbf{F}_{V_{Si}\_Si} - \hat{\mathbf{F}}_{V_{Si}\_Si}(\sigma_{Si}, \mathbf{w}_{T_{Si}})
\end{aligned} \tag{24}$$

where

$$\begin{aligned}
\mathbf{F}_{T_{Si}\_Si} &= \mathbf{F}_{T_{Si}} - \mathbf{\Phi}_{T\bar{S}_i}\mathbf{w}_{T\bar{S}_i} \\
\mathbf{F}_{V_{Si}\_Si} &= \mathbf{F}_{V_{Si}} - \mathbf{\Phi}_{V\bar{S}_i}\mathbf{w}_{T\bar{S}_i} \\
\hat{\mathbf{F}}_{V_{Si}\_Si} &= \mathbf{\Phi}_{V_{Si}}\mathbf{w}_{T_{Si}} = \big[\hat{f}_{Si}\big(\mathbf{x}_{I_{V_{Si}}^1}\big)\ \hat{f}_{Si}\big(\mathbf{x}_{I_{V_{Si}}^2}\big)\ \cdots\ \hat{f}_{Si}\big(\mathbf{x}_{I_{V_{Si}}^{N_{V_{Si}}}}\big)\big]^T \\
\hat{f}_{Si}(\mathbf{x}) &= \sum_{k=1}^{N_{T_{Si}}} w_{I_{T_{Si}}^k} \exp\!\left(\frac{-\big\|\mathbf{x} - \mathbf{c}_{I_{T_{Si}}^k}\big\|^2}{\sigma_{Si}^2}\right).
\end{aligned} \tag{25}$$

As $\mathbf{F}_{T_{Si}\_Si}$ and $\mathbf{F}_{V_{Si}\_Si}$ are treated as constants in (24), they can be viewed as the local target response values of the local training and validation points, respectively. Hence, the mathematical formulation of (24) is essentially the same as that of (9), and SSO$_i$ can be treated as an independent width optimization and RBFNN modeling process, which builds the local approximation model $\hat{f}_{Si}(\mathbf{x})$ based on the width $\sigma_{Si}$ and the weight vector $\mathbf{w}_{T_{Si}}$. It is obvious that SSO$_i$ has only one optimization variable, and the sizes of the training and validation data sets are both reduced from $N_T$ and $N_V$ to $N_{T_{Si}}$ and $N_{V_{Si}}$, respectively, which can greatly simplify the optimization complexity.

B. Coordination

To solve SSO$_i$, the information from the other SSOs, namely $\mathbf{\Phi}_{T\bar{S}_i}$, $\mathbf{\Phi}_{V\bar{S}_i}$, and $\mathbf{w}_{T\bar{S}_i}$, should first be given; these are called the coordination parameters. This is done by a system coordination procedure. Assume that initially a set of width values for all the clusters is given and denoted as $\boldsymbol{\sigma}_S^{(0)} = \{\sigma_{Si}^{(0)}\}_{i=1}^{N_S}$. Then, the weight vector $\mathbf{w}^{(0)}$ can be obtained by solving (4). With the given decomposition schemes $I_{TS}$ and $I_{VS}$, $\mathbf{\Phi}_{T\bar{S}_i}^{(0)}$, $\mathbf{\Phi}_{V\bar{S}_i}^{(0)}$, and $\mathbf{w}_{T\bar{S}_i}^{(0)}$ for the $i$th SSO can be calculated based on $\boldsymbol{\sigma}_S^{(0)}$ and $\mathbf{w}^{(0)}$ according to (12) to (23), and passed down to SSO$_i$ to optimize $\sigma_{Si}$. The $N_S$ SSOs can be solved concurrently, and the optimization results are denoted as $\boldsymbol{\sigma}_S^{(1)} = \{\sigma_{Si}^{(1)}\}_{i=1}^{N_S}$ and passed upward to the system. The system weight vector is recalculated as $\mathbf{w}^{(1)}$ by solving (4), and the coordination parameters for each SSO$_i$ can be updated as $\mathbf{\Phi}_{T\bar{S}_i}^{(1)}$, $\mathbf{\Phi}_{V\bar{S}_i}^{(1)}$, and $\mathbf{w}_{T\bar{S}_i}^{(1)}$, based on which the SSOs can be reformulated and a new set of widths can be optimized. The steps of system coordination and concurrent SSO are iterated repeatedly until convergence is achieved.

C. Algorithm Procedure

Based on the preceding decomposition and coordination strategies, the CSWO method can be summed up as the following five steps, which are depicted in Fig. 1.

Fig. 1. Flowchart of the CSWO method for RBFNN modeling (initialization, iteration counter update, concurrent subspace optimizations SSO$_1$ to SSO$_{N_S}$, system coordination, and convergence check yielding the optimum $\boldsymbol{\sigma}_S^*$).

Step 0: Initialize the training set $T = \{(\mathbf{c}_k, y_k)\}_{k=1}^{N_T}$ and the validation set $V = \{(\mathbf{x}_q, y_q)\}_{q=1}^{N_V}$. Design of experiment techniques for deterministic computer experiments can be employed to comprehensively obtain information all over the input space [33].

Step 1: Decompose the original width optimization problem into $N_S$ SSO problems. $N_S$ should be defined according to the desired approximation accuracy and the available computational resources. Theoretically, a larger $N_S$ can bring higher accuracy, but a higher computational burden may be entailed resulting from a more complex convergence process for coordination among the SSOs. Both the training and the validation sets are divided into $N_S$ nonoverlapping clusters. Denote the decomposition schemes as $I_{TS} = \{I_{T_{Si}}\}_{i=1}^{N_S}$ and $I_{VS} = \{I_{V_{Si}}\}_{i=1}^{N_S}$. Denote the $i$th SSO problem as SSO$_i$.

Step 2: Initialize the widths and the coordination parameters. Give a set of width values for all the clusters denoted as $\boldsymbol{\sigma}_S^{(0)} = \{\sigma_{Si}^{(0)}\}_{i=1}^{N_S}$, and calculate the weight vector $\mathbf{w}^{(0)}$ by solving (4). Build the RBFNN model $NN^{(0)}$ with $\boldsymbol{\sigma}_S^{(0)}$ and $\mathbf{w}^{(0)}$, and calculate the objective (5), denoted as $\mathrm{Obj}^{(0)}$. According to $I_{TS}$ and $I_{VS}$, calculate the coordination parameters $\mathbf{\Phi}_{T\bar{S}_i}^{(0)}$, $\mathbf{\Phi}_{V\bar{S}_i}^{(0)}$, and $\mathbf{w}_{T\bar{S}_i}^{(0)}$ ($i = 1,\ldots,N_S$) with (14), (15), and (22) based on $\boldsymbol{\sigma}_S^{(0)}$ and $\mathbf{w}^{(0)}$, and pass these parameters down to the corresponding SSO$_i$. Set the iteration number $t = 0$.


TABLE I
COMPARISON OF WIDTH OPTIMIZATION RESULTS IN TESTS 1–5

| Test | $N_T$ | $N_S$ | Method^a | $\lambda$ | RMSE | RO | Iterative cycle number^b | Computational time^c | Expected time with parallel computation^d |
|------|-------|-------|----------|-----------|------|----|--------------------------|----------------------|-------------------------------------------|
| Test 1 | 30 | 1 | SWO (GA) | 0.05 | 0.4688 | 0.4823 | – | 38.71 | 38.71 |
| | 30 | 2 | SWO (SQP) | 0.05 | 0.2562 | 0.2929 | – | 1.100 | 1.100 |
| | 30 | 2 | SWO (GA) | 0.05 | 0.0790 | 0.1221 | – | 39.487 | 39.487 |
| | 30 | 2 | CSWO (SQP) | 0.05 | 0.1074 | 0.1415 | 3 | 2.836 | 0.945 |
| Test 2 | 36 | 4 | SWO (SQP) | 0.05 | 0.2360 | 0.6014 | – | 2.902 | 2.902 |
| | 36 | 4 | CSWO (SQP) | 0.05 | 0.2525 | 0.6037 | 4 | 11.077 | 2.769 |
| | 36 | 6 | SWO (SQP) | 0.05 | 0.2327 | 0.6005 | – | 4.072 | 4.072 |
| | 36 | 6 | CSWO (SQP) | 0.05 | 0.2578 | 0.6052 | 4 | 14.382 | 2.397 |
| Test 3 | 36 | 4 | SWO (SQP) | 0.05 | 1.5598 | 2.1869 | – | 3.431 | 3.431 |
| | 36 | 4 | CSWO (SQP) | 0.05 | 1.5618 | 2.1877 | 4 | 9.652 | 2.413 |
| | 36 | 6 | SWO (SQP) | 0.05 | 1.5162 | 2.1243 | – | 8.839 | 8.839 |
| | 36 | 6 | CSWO (SQP) | 0.05 | 1.5588 | 2.1763 | 4 | 13.016 | 2.169 |
| Test 4 | 300 | 20 | SWO (SQP) | 0.05 | 0.9915 | 0.9945 | – | 66.664 | 66.664 |
| | 300 | 20 | SWO (GA) | 0.05 | 0.9414 | 0.9449 | – | 210.923 | 210.923 |
| | 300 | 20 | CSWO (SQP) | 0.05 | 0.9309 | 0.9341 | 3 | 96.411 | 4.821 |
| | 300 | 20 | CSWO + SWO (SQP) | 0.05 | 0.8602 | 0.8713 | 3 + 1 | 96.411 + 124.671 | 4.821 + 124.671 |
| Test 5 | 200 | 1 | SWO (SQP) | 0.05 | 8.5449 | 0.2561 | – | 3.302 | 3.302 |
| | 200 | 5 | SWO (SQP) | 0.05 | 8.0949 | 0.2562 | – | 15.657 | 15.657 |
| | 200 | 5 | CSWO (SQP) | 0.05 | 8.7930 | 0.2603 | 3 | 23.950 | 4.790 |
| | 200 | 20 | SWO (SQP) | 0.05 | 7.0705 | 0.2196 | – | 225.207 | 225.207 |
| | 200 | 20 | CSWO (SQP) | 0.05 | 8.6439 | 0.2583 | 5 | 51.874 | 2.594 |
| | 200 | 20 | CSWO + SWO (SQP) | 0.05 | 5.9711 | 0.1902 | 5 + 1 | 51.874 + 142.347 | 2.594 + 142.347 |

^a The optimization solver is given in parentheses. CSWO + SWO means CSWO is first executed to obtain a width scheme, based on which SWO is further executed.
^b An iterative cycle means a full process through the steps of decomposition and coordination. For CSWO + SWO, it is the iterative cycle number of CSWO plus one.
^c Due to the hardware limitation, the computational time of CSWO is recorded with all the SSOs solved sequentially rather than concurrently. For CSWO + SWO, the computational time is recorded as the time of running CSWO plus the time of running SWO.
^d The computational time expected to solve the width optimization problem with parallel computation. For SWO, it is the same as the actual recorded time. For CSWO, wherein all the SSOs can be solved concurrently, it is estimated by dividing the recorded time of CSWO listed in the preceding column by $N_S$.

Step 3: Concurrent SSO. Update the iteration number as $t = t + 1$. Formulate SSO$_i$ at iteration $t$ as

$$\begin{aligned}
\text{find} \quad & \sigma_{Si}^{(t)} \\
\min \quad & \mathrm{Obj}_{Si}\big(\sigma_{Si}^{(t)}\big) = (1-\lambda)\big(\mathbf{e}_{V_{Si}}^{(t)}\big)^T\mathbf{e}_{V_{Si}}^{(t)} + \lambda\big(\mathbf{w}_{T_{Si}}^{(t)}\big)^T\mathbf{w}_{T_{Si}}^{(t)} \\
\text{s.t.} \quad & \sigma_{Si}^{(t)} \in \Omega_{Si} \\
& \mathbf{\Phi}_{T_{Si}}\big(\sigma_{Si}^{(t)}\big)\mathbf{w}_{T_{Si}}^{(t)} = \mathbf{F}_{T_{Si}\_Si}^{(t-1)} \\
& \mathbf{e}_{V_{Si}}^{(t)} = \mathbf{F}_{V_{Si}\_Si}^{(t-1)} - \hat{\mathbf{F}}_{V_{Si}\_Si}^{(t)}\big(\sigma_{Si}^{(t)}, \mathbf{w}_{T_{Si}}^{(t)}\big)
\end{aligned} \tag{26}$$

where

$$\begin{aligned}
\mathbf{F}_{T_{Si}\_Si}^{(t-1)} &= \mathbf{F}_{T_{Si}} - \mathbf{\Phi}_{T\bar{S}_i}^{(t-1)}\mathbf{w}_{T\bar{S}_i}^{(t-1)} \\
\mathbf{F}_{V_{Si}\_Si}^{(t-1)} &= \mathbf{F}_{V_{Si}} - \mathbf{\Phi}_{V\bar{S}_i}^{(t-1)}\mathbf{w}_{T\bar{S}_i}^{(t-1)} \\
\hat{\mathbf{F}}_{V_{Si}\_Si}^{(t)} &= \big[\hat{f}_{Si}^{(t)}\big(\mathbf{x}_{I_{V_{Si}}^1}\big)\ \hat{f}_{Si}^{(t)}\big(\mathbf{x}_{I_{V_{Si}}^2}\big)\ \cdots\ \hat{f}_{Si}^{(t)}\big(\mathbf{x}_{I_{V_{Si}}^{N_{V_{Si}}}}\big)\big]^T \\
\hat{f}_{Si}^{(t)}(\mathbf{x}) &= \sum_{k=1}^{N_{T_{Si}}} w_{I_{T_{Si}}^k}^{(t)} \exp\!\left(\frac{-\big\|\mathbf{x} - \mathbf{c}_{I_{T_{Si}}^k}\big\|^2}{\big(\sigma_{Si}^{(t)}\big)^2}\right).
\end{aligned} \tag{27}$$

All the SSO$_i$ ($i = 1,\ldots,N_S$) can be solved concurrently, and the results $\boldsymbol{\sigma}_S^{(t)} = \{\sigma_{Si}^{(t)}\}_{i=1}^{N_S}$ are passed upward to the system.

Step 4: System coordination. Solve (4) to calculate the weight vector $\mathbf{w}^{(t)}$ according to $\boldsymbol{\sigma}_S^{(t)}$. Update the coordination parameters $\mathbf{\Phi}_{T\bar{S}_i}^{(t)}$, $\mathbf{\Phi}_{V\bar{S}_i}^{(t)}$, and $\mathbf{w}_{T\bar{S}_i}^{(t)}$ ($i = 1,\ldots,N_S$) for each SSO$_i$ based on $\boldsymbol{\sigma}_S^{(t)}$ and $\mathbf{w}^{(t)}$.

Step 5: Check convergence. Build the RBFNN model $NN^{(t)}$ with $\boldsymbol{\sigma}_S^{(t)}$ and $\mathbf{w}^{(t)}$. Calculate the objective (5), denoted as $\mathrm{Obj}^{(t)}$. If $\big|\mathrm{Obj}^{(t)} - \mathrm{Obj}^{(t-1)}\big| / \mathrm{Obj}^{(t-1)} \le \varepsilon$, the convergence condition is satisfied and the iterative optimization procedure ends; $\varepsilon$ is a predefined positive threshold (e.g., 0.0001). $\boldsymbol{\sigma}_S^{(t)}$ is then the optimal width vector, denoted as $\boldsymbol{\sigma}_S^*$, and $NN^{(t)}$ is the optimal RBFNN model, denoted as $NN^*$. Otherwise, go to Step 3.

Assume $N_C$ iterative cycles are needed to converge to the optimum. Herein, an iterative cycle means a full process through the steps of decomposition and coordination. Then, the computational complexity of CSWO can be estimated as

$$C_{CSWO} = N_C \cdot \max\big\{O\big(N_{T_{S1}}^3\big), \ldots, O\big(N_{T_{SN_S}}^3\big)\big\} \cdot O(g(1)). \tag{28}$$

Fig. 2. Results of CSWO and SWO in Test 1 with decomposition scheme 1 and $\lambda = 0.05$ (training data, accurate model, RBFNN (SWO + GA), and RBFNN (CSWO + SQP)).

Since $\max\big(O\big(N_{T_{Si}}^3\big)\big)$ is much smaller than $O(N_T^3)$ and $O(g(1))$ is much smaller than $O(g(N_S))$ (with the same optimization solver), $C_{CSWO}$ can be much smaller than the complexity of SWO estimated with (10), as long as $N_C$ is not so big as to counteract the efficiency resulting from concurrent SSOs with a single variable and smaller-scale matrix operations.
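Putting the pieces together, a compact sketch of the CSWO iteration (Steps 0–5) could look as follows. It reuses the helpers sketched earlier (rbf_matrix, cluster_training_set, expand_widths, partition_blocks), runs the SSOs in a plain sequential loop rather than concurrently, and uses SciPy's bounded scalar minimizer as the single-variable SSO solver; all of these choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar
# assumes rbf_matrix, cluster_training_set, expand_widths, partition_blocks from the earlier sketches

def cswo(centers, y_train, X_val, y_val, n_clusters, lam=0.05,
         sigma_bounds=(1e-3, 10.0), eps=1e-4, max_cycles=20):
    """Sequential sketch of the CSWO iteration (Steps 0-5)."""
    N_T = len(centers)
    index_sets, centroids = cluster_training_set(centers, n_clusters)        # Step 1
    # validation clustering (18) and initial widths per (48)
    val_labels = np.argmin(np.linalg.norm(X_val[:, None, :] - centroids[None], axis=-1), axis=1)
    val_sets = [np.where(val_labels == i)[0] for i in range(n_clusters)]
    sigma_S = np.array([np.linalg.norm(centers[idx] - centroids[i], axis=1).mean() / 2.0
                        for i, idx in enumerate(index_sets)])

    def system_solve(sigma_S):
        """System coordination: rebuild the full network and the objective (5)."""
        sigma = expand_widths(sigma_S, index_sets, N_T)
        Phi_T = rbf_matrix(centers, centers, sigma)
        Phi_V = rbf_matrix(X_val, centers, sigma)
        w = np.linalg.solve(Phi_T, y_train)                                   # (4)
        e = y_val - Phi_V @ w
        return sigma, Phi_T, Phi_V, w, (1.0 - lam) * e @ e + lam * w @ w

    sigma, Phi_T, Phi_V, w, obj = system_solve(sigma_S)                       # Step 2
    for cycle in range(max_cycles):
        for i, (idx, vidx) in enumerate(zip(index_sets, val_sets)):           # Step 3: SSOs
            _, Phi_TSbi, _, Phi_VSbi, _, w_TSbi, F_TSi = partition_blocks(
                Phi_T, Phi_V, w, y_train, idx, vidx)
            F_TSi_loc = F_TSi - Phi_TSbi @ w_TSbi                             # local targets (25)
            F_VSi_loc = y_val[vidx] - Phi_VSbi @ w_TSbi

            def sso_obj(s):                                                   # single-variable SSO (26)
                P_T = rbf_matrix(centers[idx], centers[idx], np.full(len(idx), s))
                P_V = rbf_matrix(X_val[vidx], centers[idx], np.full(len(idx), s))
                w_loc = np.linalg.solve(P_T, F_TSi_loc)
                e_loc = F_VSi_loc - P_V @ w_loc
                return (1.0 - lam) * e_loc @ e_loc + lam * w_loc @ w_loc

            sigma_S[i] = minimize_scalar(sso_obj, bounds=sigma_bounds, method="bounded").x
        sigma, Phi_T, Phi_V, w, obj_new = system_solve(sigma_S)               # Step 4
        if abs(obj_new - obj) / max(abs(obj), 1e-12) <= eps:                  # Step 5
            return sigma_S, w, obj_new
        obj = obj_new
    return sigma_S, w, obj
```

In a true concurrent setting, the inner loop over the clusters would be dispatched to parallel workers, with only the coordination parameters and the resulting widths exchanged between them.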

D. Equivalence Between CSWO and SWO

In this section, we prove that the local optimum of CSWO is equivalent to the local optimum of the original width optimization SWO.

Definition 1 (Consistent solution of CSWO): Denote the optimal width of SSO$_i$ as $\sigma_{Si}^*$ and the corresponding weight vector as $\mathbf{w}_{T_{Si}}^*$. Combine the $\mathbf{w}_{T_{Si}}^*$ into a system weight vector $\mathbf{w}_{TS}^* = [w_1^*\ w_2^*\ \cdots\ w_{N_T}^*]^T$ as

$$\forall w_k^* \in \mathbf{w}_{TS}^*: \ \text{if } k = I_{T_{Si}}^j,\ i = 1,\ldots,N_S,\ j = 1,\ldots,N_{T_{Si}},\ \text{then } w_k^* = w_{I_{T_{Si}}^j}^*,\quad w_{I_{T_{Si}}^j}^* \in \mathbf{w}_{T_{Si}}^*. \tag{29}$$

If $\boldsymbol{\sigma}_S^* = \{\sigma_{Si}^*\}_{i=1}^{N_S}$ with $\mathbf{w}_{TS}^*$ is a consistent solution of CSWO, it satisfies

$$\mathbf{\Phi}_{T_{Si}}(\sigma_{Si}^*)\mathbf{w}_{T_{Si}}^* + \mathbf{\Phi}_{T\bar{S}_i}(\boldsymbol{\sigma}_{\bar{S}_i}^*)\mathbf{w}_{T\bar{S}_i}^* = \mathbf{F}_{T_{Si}} \tag{30}$$
$$\mathbf{\Phi}_{V_{Si}}(\sigma_{Si}^*)\mathbf{w}_{T_{Si}}^* + \mathbf{\Phi}_{V\bar{S}_i}(\boldsymbol{\sigma}_{\bar{S}_i}^*)\mathbf{w}_{T\bar{S}_i}^* = \hat{\mathbf{F}}_{V_{Si}}. \tag{31}$$

Theorem 1: If CSWO converges to a consistent local optimum $\boldsymbol{\sigma}_S^* = \{\sigma_{Si}^*\}_{i=1}^{N_S}$, this optimum is also a local optimum of the original width optimization problem (9).

Proof: Assume CSWO converges to the consistent optimum $\boldsymbol{\sigma}_S^* = \{\sigma_{Si}^*\}_{i=1}^{N_S}$ with the corresponding weight vector $\mathbf{w}_{TS}^*$ composed by the $\mathbf{w}_{T_{Si}}^*$ with (29). For the constrained optimization SSO$_i$, the first-order necessary Karush–Kuhn–Tucker (KKT) optimality condition is stated as

$$\nabla_{\sigma_{Si}}\Big[(1-\lambda)\mathbf{e}_{V_{Si}}^T\mathbf{e}_{V_{Si}} + \lambda\big(\mathbf{w}_{T_{Si}}^*\big)^T\mathbf{w}_{T_{Si}}^*\Big] + \boldsymbol{\mu}_{Si}\cdot\nabla_{\sigma_{Si}}\Big[\mathbf{\Phi}_{T_{Si}}\mathbf{w}_{T_{Si}}^* + \mathbf{\Phi}_{T\bar{S}_i}\mathbf{w}_{T\bar{S}_i}^* - \mathbf{F}_{T_{Si}}\Big] = 0 \tag{32}$$

where $\boldsymbol{\mu}_{Si} = \big[\mu_{I_{T_{Si}}^1}\ \mu_{I_{T_{Si}}^2}\ \cdots\ \mu_{I_{T_{Si}}^{N_{T_{Si}}}}\big]$ is the Lagrangian vector, and the derivatives of the matrices $\mathbf{\Phi}_{T_{Si}}$ and $\mathbf{\Phi}_{T\bar{S}_i}$ are calculated at $\boldsymbol{\sigma}_S^*$. In the rest of this proof, all the derivatives are calculated at $\boldsymbol{\sigma}_S^*$.

For SSO$_j$ ($j \ne i$), the effects of the width parameter $\sigma_{Si}^*$ on the response values of the training and validation points are fixed. Therefore, the local error vector and weight vector of SSO$_j$ are not functions of $\sigma_{Si}$, stated as

$$\nabla_{\sigma_{Si}}\Big[(1-\lambda)\mathbf{e}_{V_{Sj}}^T\mathbf{e}_{V_{Sj}} + \lambda\big(\mathbf{w}_{T_{Sj}}^*\big)^T\mathbf{w}_{T_{Sj}}^*\Big] = 0 \tag{33}$$
$$\nabla_{\sigma_{Si}}\Big[\mathbf{\Phi}_{T_{Sj}}\mathbf{w}_{T_{Sj}}^* + \mathbf{\Phi}_{T\bar{S}_j}\mathbf{w}_{T\bar{S}_j}^* - \mathbf{F}_{T_{Sj}}\Big] = 0. \tag{34}$$

Denote the Lagrangian vector for SSO$_j$ ($j \ne i$) as $\boldsymbol{\mu}_{Sj}$. For all the SSO$_j$ ($j = 1,\ldots,N_S$, $j \ne i$), add (33) and (34) multiplied with the corresponding $\boldsymbol{\mu}_{Sj}$ to (32), which yields

$$\nabla_{\sigma_{Si}}\Bigg[(1-\lambda)\sum_{j=1}^{N_S}\mathbf{e}_{V_{Sj}}^T\mathbf{e}_{V_{Sj}} + \lambda\sum_{j=1}^{N_S}\big(\mathbf{w}_{T_{Sj}}^*\big)^T\mathbf{w}_{T_{Sj}}^*\Bigg] + \sum_{j=1}^{N_S}\boldsymbol{\mu}_{Sj}\cdot\nabla_{\sigma_{Si}}\Big[\mathbf{\Phi}_{T_{Sj}}\mathbf{w}_{T_{Sj}}^* + \mathbf{\Phi}_{T\bar{S}_j}\mathbf{w}_{T\bar{S}_j}^* - \mathbf{F}_{T_{Sj}}\Big] = 0. \tag{35}$$

As $\boldsymbol{\sigma}_S^*$ with $\mathbf{w}_{TS}^*$ is a consistent solution of CSWO, from (30) and (31) we know that

$$\mathbf{e}_{V_{Si}} = \mathbf{F}_{V_{Si}\_Si} - \hat{\mathbf{F}}_{V_{Si}\_Si}\big(\sigma_{Si}^*, \mathbf{w}_{T_{Si}}^*\big) = \mathbf{F}_{V_{Si}} - \mathbf{\Phi}_{V\bar{S}_i}\big(\boldsymbol{\sigma}_{\bar{S}_i}^*\big)\mathbf{w}_{T\bar{S}_i}^* - \mathbf{\Phi}_{V_{Si}}\big(\sigma_{Si}^*\big)\mathbf{w}_{T_{Si}}^* = \mathbf{F}_{V_{Si}} - \hat{\mathbf{F}}_{V_{Si}} \tag{36}$$

which means the error vector in the local objective formulation of SSO$_i$ is a subset of the error vector in the objective of the original width optimization. Therefore

$$\sum_{j=1}^{N_S}\mathbf{e}_{V_{Sj}}^T\mathbf{e}_{V_{Sj}} = \mathbf{e}^T\mathbf{e}. \tag{37}$$

From Definition 1, we know that for the consistent solution $\mathbf{w}_{TS}^*$ is a combination of the $\mathbf{w}_{T_{Si}}^*$, hence we have

$$\sum_{j=1}^{N_S}\big(\mathbf{w}_{T_{Sj}}^*\big)^T\mathbf{w}_{T_{Sj}}^* = \big(\mathbf{w}_{TS}^*\big)^T\mathbf{w}_{TS}^*. \tag{38}$$

Besides, (30) holds for every SSO$_i$, i.e., $\mathbf{\Phi}_{T_{Si}}(\sigma_{Si}^*)\mathbf{w}_{T_{Si}}^* + \mathbf{\Phi}_{T\bar{S}_i}(\boldsymbol{\sigma}_{\bar{S}_i}^*)\mathbf{w}_{T\bar{S}_i}^* = \mathbf{F}_{T_{Si}}$, which together with (29) gives $\mathbf{\Phi}(\boldsymbol{\sigma}_S^*)\mathbf{w}_{TS}^* = \mathbf{F}_T$, so the third summation term in (35) can be rewritten as

$$\sum_{j=1}^{N_S}\boldsymbol{\mu}_{Sj}\cdot\nabla_{\sigma_{Si}}\Big[\mathbf{\Phi}_{T_{Sj}}\mathbf{w}_{T_{Sj}}^* + \mathbf{\Phi}_{T\bar{S}_j}\mathbf{w}_{T\bar{S}_j}^* - \mathbf{F}_{T_{Sj}}\Big] = \boldsymbol{\mu}\cdot\nabla_{\sigma_{Si}}\Big[\mathbf{\Phi}\mathbf{w}_{TS}^* - \mathbf{F}_T\Big] \tag{39}$$

where $\boldsymbol{\mu}$ is directly composed by the $\boldsymbol{\mu}_{Si}$ ($i = 1,\ldots,N_S$) in the same way as $\mathbf{w}_{TS}^*$ is composed by the $\mathbf{w}_{T_{Si}}^*$. Substituting (37)–(39) into (35) yields

$$\nabla_{\sigma_{Si}}\Big[(1-\lambda)\mathbf{e}^T\mathbf{e} + \lambda\big(\mathbf{w}_{TS}^*\big)^T\mathbf{w}_{TS}^*\Big] + \boldsymbol{\mu}\cdot\nabla_{\sigma_{Si}}\Big[\mathbf{\Phi}\mathbf{w}_{TS}^* - \mathbf{F}_T\Big] = 0. \tag{40}$$

As (40) holds for all the $\sigma_{Si}$ ($i = 1,\ldots,N_S$), it indicates that $\boldsymbol{\mu}$ can be treated as a Lagrangian vector which makes the first-order necessary KKT optimality conditions of the original width optimization problem (9) hold at $\boldsymbol{\sigma}_S^*$. Therefore, the local optimum $\boldsymbol{\sigma}_S^*$ of CSWO is also a local optimum of the original width optimization SWO.


Fig. 3. Results of CSWO in Test 1 with decomposition scheme 2 (training data, accurate model, and RBFNN models with $\lambda = 0.0$ and $\lambda = 0.05$).

Although Theorem 1 proves that theoretically CSWO can converge to the local optimum, which is equivalent to that of SWO, it is impractical to employ a lengthy convergence process to infinitely approach the consistent local optimum. Therefore, the actual optimal solution obtained by CSWO may not be exactly the same as that of SWO, as the convergence process is ended by a predefined threshold. However, the similarity between the optimums of CSWO and SWO can be maintained with a smaller convergence threshold.

IV. TESTS

A. Test Problems

In this section, the proposed method is verified with four mathematical function approximation problems and one practical satellite design model approximation problem.

Test 1: 1-D function [2], [34]

$$\begin{aligned}
f(x) = {} & \big(1 - e^{-2\sqrt{x}}\big) + 6x e^{-7x}\sin(10x) - 0.2e^{-2000(x-0.25)^2} \\
& + 60\min(0, |x - 0.14| - 0.08)^2 \cdot \big[\ln(x + 0.2) + 1.5\sin^2(85x)\big],\quad x \in [0, 1].
\end{aligned} \tag{41}$$

This function features the combination of an irregular region and a flat region. As shown in [35], given the same training sample size of 30, uniformly distributed training samples are not sufficient to model the irregular region accurately, but a training set with more samples located in the irregular region and fewer samples in the flat region can provide a good approximation, as it can obtain information about the target function more effectively. Hence, with reference to [2], the training set $T$ is defined as follows:

$$T_1 = \bigg\{(c_k, f(c_k))\ \Big|\ c_k = \begin{cases} 0.014 \times (k-1), & 1 \le k \le 23 \\ 0.4 + 0.1 \times (k-24), & 24 \le k \le 30 \end{cases}\bigg\}. \tag{42}$$

The validation set is 51 uniformly distributed data points defined as

$$V_1 = \{(x_q, f(x_q))\ |\ x_q = 0.02 \times (q-1),\ 1 \le q \le 51\}. \tag{43}$$
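For reference, Test 1's target function (41) and its data sets (42)–(43) can be written down directly; the grouping of the min(·)² term and the square root follow the formula as reconstructed above (a sketch, not the authors' code).

```python
import numpy as np

def test1(x):
    """1-D test function (41), as reconstructed from the text."""
    return ((1.0 - np.exp(-2.0 * np.sqrt(x)))
            + 6.0 * x * np.exp(-7.0 * x) * np.sin(10.0 * x)
            - 0.2 * np.exp(-2000.0 * (x - 0.25) ** 2)
            + 60.0 * np.minimum(0.0, np.abs(x - 0.14) - 0.08) ** 2
              * (np.log(x + 0.2) + 1.5 * np.sin(85.0 * x) ** 2))

# training points (42): dense in the irregular region, sparse in the flat region
c = np.concatenate([0.014 * np.arange(0, 23),          # k = 1..23
                    0.4 + 0.1 * np.arange(0, 7)])       # k = 24..30
# validation points (43): 51 uniformly spaced points on [0, 1]
x_val = 0.02 * np.arange(0, 51)
y_train, y_val = test1(c), test1(x_val)
```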

Test 2: 2-D function [36]

$$f(x_1, x_2) = x_1\sin(4x_1) + 1.1x_2\sin(2x_2),\quad x_1, x_2 \in [0, 3.5]. \tag{44}$$

The training set is composed of 36 uniformly distributed points in the design space designed by the two-variable six-level full factorial design method, and the validation set is composed of 400 uniformly distributed points designed by the two-variable 20-level full factorial design method.

Test 3: 2-D function [37]

$$f(x_1, x_2) = 20 - 20\exp\Big(-0.2\sqrt{x_1^2 + x_2^2}\Big) - \exp\bigg(\frac{\cos(2\pi x_1) + \cos(2\pi x_2)}{2}\bigg),\quad x_1, x_2 \in [-5, 30]. \tag{45}$$

The training data set is composed of 36 uniformly distributed points in the design space with the two-variable six-level full factorial design method, and the validation data set is composed of 400 uniformly distributed points in the design space with the two-variable 20-level full factorial design method.

Test 4: 20-D function

$$f(\mathbf{x}) = \sum_{i=1}^{20}\sin(x_i)\times\bigg[\sin\bigg(\frac{i\times x_i^2}{\pi}\bigg)\bigg]^{20},\quad 0 \le x_i \le \pi. \tag{46}$$

The training data set is composed of 300 uniformly distributed points in the design space designed by the optimal Latin hypercube design (LHD) method with the max–min criterion [33], and the validation data set is composed of 100 uniformly distributed points in the design space also designed by the optimal LHD method with the max–min criterion.

Test 5: 5-D satellite design model [37], [38]

$$\mathrm{Mass} = f(h, f_c, b, l, t). \tag{47}$$

The object is a virtual earth observation small satellite, and the variables include the orbit altitude $h$ (sun-synchronous circular recursive orbit), the charge-coupled device camera focal length $f_c$, the body width $b$ (the cross section is square), the body height $l$, and the side wall thickness $t$. The range of each variable is defined as $h \in [500, 800]$ km, $f_c \in [200, 300]$ mm, $b \in [500, 1000]$ mm, $l \in [500, 1000]$ mm, and $t \in [5, 10]$ mm. With a given set of these five variables, the satellite mass can be estimated with conceptual design experimental equations [39]. The training data set is composed of 200 uniformly distributed points in the design space designed by the optimal LHD method with the max–min criterion, and the validation data set is composed of 100 uniformly distributed points in the design space also designed by the optimal LHD method with the max–min criterion. As each dimension of the variable vector is of a different order of magnitude, the training and validation data are first normalized to build the approximation model.

B. Results and Discussion

The RBFNN modeling of the preceding five test problems is carried out with both CSWO and SWO for comparison. Different optimization algorithms can be employed to solve

Fig. 4. Results of CSWO in Test 2 ($N_S = 4$). (a) Accurate model. (b) RBFNN approximation model. (c) Width optimization history. (d) RMSE/RO convergence history.

optimization (9) and (26), for example, sequential quadratic programming (SQP) and GA. For CSWO, a simple line search can be applied as each SSO is a single-variable optimization problem. However, for SWO with multiple variables, line search may not be effective. To fairly compare the efficiency of CSWO and SWO, the same algorithm, SQP, is used as the optimization solver. The initial baseline of the width for each cluster is defined as half the average distance between the centers and the corresponding centroid of this cluster, stated as

$$\sigma_{Si}^{(0)} = \frac{\sum_{j=1}^{N_{T_{Si}}}\big\|\mathbf{c}_{I_{T_{Si}}^j} - \mathbf{M}_{Si}\big\|}{2N_{T_{Si}}}. \tag{48}$$
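In code, the baseline (48) is simply half the mean distance from a cluster's centers to its centroid (a small sketch reusing the clustering quantities assumed in the earlier sketches):

```python
import numpy as np

def initial_widths(centers, index_sets, centroids):
    """Baseline widths per (48): half the average distance to the cluster centroid."""
    return np.array([np.linalg.norm(centers[idx] - centroids[i], axis=1).mean() / 2.0
                     for i, idx in enumerate(index_sets)])
```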

The computer hardware used to implement both CSWO and SWO is the same: an Intel Core i3 CPU 530 at 2.93/1.17 GHz with 2.99 GB of memory. It is noteworthy that in this paper the CSWO method is realized with only one computer due to hardware limitations. Hence, the SSOs are solved sequentially rather than concurrently, which leads to a lengthened computational time for CSWO that is roughly $N_S$ times the theoretically expected time with concurrent optimization. The five test problems are solved with CSWO and SWO, respectively, with different $N_S$ and $\lambda$ values, and the results are listed in Table I. The root mean square error (RMSE), defined as $\sqrt{\mathbf{e}^T\mathbf{e}}$, is given to provide information about the accuracy

of the approximation models. To be comparable with the RMSE, the root of the objective (RO) is presented instead of the original objective. To better visualize the benefits of CSWO, the theoretically expected computational time with the SSOs solved in parallel is also given in Table I; it is estimated by dividing the actual recorded time of CSWO (with the SSOs solved sequentially) by $N_S$, under the presumption that the time for coordination and data sharing between parallel computers is negligible compared to that of the SSOs.

Table I shows that CSWO can converge to the optimum with no more than five iterative cycles of decomposition and coordination in all the tests, which testifies to the convergence capability of CSWO with the proposed decomposition and coordination strategy. With the same training and validation sets and the same $N_S$ and $\lambda$ values, the results of CSWO are generally very close to those of SWO, while the estimated computational time of CSWO with parallel computation (the rightmost column of Table I) is obviously much smaller than that of SWO, especially when more training samples (more RBFNN centers and larger-scale matrices) and more clusters (more optimization variables) are involved, as shown in Tests 4 and 5.

In some cases, CSWO can obtain better solutions than SWO; for example, in Test 1 CSWO with SQP is better than SWO with SQP for $N_S = 2$. This is because SQP converges to a local optimum in the multivariable SWO optimization, whereas in the single-variable SSOs of CSWO it obtains better optimums. To locate the global optimum for comparison, GA


Fig. 5. Results of CSWO in Test 3 ($N_S = 4$). (a) Accurate model. (b) RBFNN approximation model. (c) Width optimization history. (d) RMSE/RO convergence history.

is applied in SWO as a benchmark method. Although the accuracy is improved by 13.7% compared with that of CSWO, the computational time is lengthened by 19 times. This verifies the efficiency of CSWO, and also indicates the advantage of single-variable optimization over multivariable optimization in terms of optimization effectiveness with the same solver.

The RBFNN models of Test 1 built by SWO with GA and CSWO with SQP are plotted in Fig. 2 against the accurate model. It shows that both approximation models can match the target model well except in the region between the training points $c_{23}$ and $c_{25}$. But in [2], with the same training set and cluster number, the optimal RBFNN can match the target model in all regions with a different decomposition scheme and $\lambda$ value. Denote the decomposition scheme obtained with the k-means clustering method stated in Section III as $I_{TS}^{(1)}$, which is

$$I_{TS1}^{(1)} = \{i\ |\ 1 \le i \le 24\},\quad I_{TS2}^{(1)} = \{i\ |\ 25 \le i \le 30\}. \tag{49}$$

Denote the decomposition scheme in [2] as $I_{TS}^{(2)}$, which is obtained by analyzing the training data spatial distribution features and the target model nonlinearity features

$$I_{TS1}^{(2)} = \{i\ |\ 1 \le i \le 23\},\quad I_{TS2}^{(2)} = \{i\ |\ 24 \le i \le 30\}. \tag{50}$$

TABLE II
COMPARISON OF CSWO RESULTS IN TEST ONE WITH DIFFERENT DECOMPOSITION SCHEMES AND REGULARIZATION PARAMETERS

| Decomposition scheme | Method | $\lambda$ | Widths | RMSE | RO | Time |
|----------------------|--------|-----------|--------|------|----|------|
| $I_{TS}^{(1)}$ | CSWO (SQP) | 0.05 | [0.0177, 0.1409] | 0.1074 | 0.1415 | 2.836 |
| $I_{TS}^{(2)}$ | CSWO (SQP) | 0.05 | [0.0155, 0.1420] | 0.0917 | 0.0355 | 4.572 |
| $I_{TS}^{(2)}$ | CSWO (SQP) | 0.00 | [0.0245, 0.3618] | 0.0084 | 0.0084 | 1.817 |

The modeling results obtained by CSWO with different decomposition schemes and $\lambda$ values are listed in Table II. It shows that CSWO with $I_{TS}^{(2)}$ and $\lambda = 0.0$ performs the best. The RBFNN models given by CSWO with $I_{TS}^{(2)}$ and two

different $\lambda$ values are plotted in Fig. 3, which shows that the approximation accuracy in the region between points $c_{23}$ and $c_{25}$ is greatly improved compared with the models built with $I_{TS}^{(1)}$. Hence, it indicates that the decomposition scheme and the regularization parameter $\lambda$ have a great effect on the width optimization of RBFNN modeling.

In Tests 2 and 3, the accurate and the RBFNN models, the width optimization history, and the RMSE/RO convergence history of CSWO are depicted in Figs. 4 and 5, respectively. The accurate and the RBFNN models are plotted on 2-D axes with a gray-scale dimension representing the model response. In Fig. 4, the approximation model is very close to the accurate one. In Fig. 5, the approximation model is accurate in most regions except for the small area near the peak. With reference to [35], [37], the accuracy around the peak can be improved with more data sampled in this irregular area.

Fig. 6. Results of the combination of CSWO and SWO in Test 5 ($N_S = 20$). (a) Accurate satellite mass model and (b) RBFNN approximation model with respect to the focal length and the orbit altitude, where $b = 900$ mm, $l = 900$ mm, and $t = 7$ mm. (c) Accurate satellite mass model and (d) RBFNN approximation model with respect to the body width and the body height, where $h = 600$ km, $f_c = 250$ mm, and $t = 7$ mm. (e) Width optimization history. (f) RMSE/RO convergence history.

Considering that a better initial baseline can improve the optimization efficiency, the result of CSWO is used as the baseline for SWO with SQP in Test 4, and the results show that a better optimum can be obtained with a shorter computational time compared with SWO with GA. In Test 5, the combination of CSWO and SWO also proves to be very efficient in obtaining better accuracy. The approximation results of the combination of CSWO and SWO in Test 5 are shown in Fig. 6. Two cross sections of the RBFNN model are compared with those of the accurate model. One is the satellite mass model with respect to the focal length and the orbit altitude, where $b = 900$ mm, $l = 900$ mm, and $t = 7$ mm. The other is the satellite mass with respect to the body width and the body height, where $h = 600$ km, $f_c = 250$ mm, and $t = 7$ mm. For both RBFNN models, the ratio between the RMSE and the average mass is less than 5%, which is acceptable for replacing the high-fidelity model in satellite design optimization. It is noteworthy that the RMSE convergence history depicted in Fig. 6 is normalized to be comparable with the RO, which is also normalized during optimization, while in Table I the RMSE values of Test 5 are transformed back into the original space for intuitive understanding. Test 5 illustrates the application of


CSWO in approximating high-fidelity nonlinear models, so that the computationally intensive models in spacecraft design optimization can be replaced with the cheap RBFNN to reduce computational cost. In other applications where the input–output relationship needs to be identified or approximated, the black-box RBFNN modeled with CSWO can also be applicable, such as modeling the input–output relationship based on discrete points (experiment data) for sensitivity analysis, fault diagnosis, dynamic nonlinear system identification in flight control, and so on.

V. CONCLUSION

A width optimization method, CSWO, was proposed for RBFNN modeling based on a concurrent SSO strategy. This method decomposes the large-scale width optimization problem into several SSO problems, which can be solved concurrently under the guidance of system coordination so as to converge to a consistent optimum. The equivalence between the optimums of CSWO and the original width optimization problem SWO was proved. Four numerical tests and one practical engineering test verified that CSWO can quickly converge to the optimum, which is very close to the optimum of SWO, with much less computational cost.

Since theoretically the optimum of CSWO can infinitely approach the optimum of SWO, CSWO can save a lot of computational cost and time, as each SSO is much simpler and all the SSOs can be solved concurrently; CSWO can thus provide a good balance between optimization effectiveness and computational complexity. Besides, each SSO is a single-variable optimization problem with many fewer training and validation samples as well as much smaller matrices to manipulate, which can better utilize existing optimization techniques to robustly locate the optimum. In contrast, the traditional SWO method may encounter failures in optimization convergence with a large number of optimization variables and difficult calculation of large-scale matrices. Hence, CSWO is more reliable and robust. The combination of CSWO and SWO was also tried in the tests, which demonstrated that CSWO can quickly find a near-optimum width scheme, based on which SWO can obtain a better optimum more efficiently. Thus, CSWO can also be used as a first-stage optimization tool to quickly identify a good initial baseline for further accurate width optimization.

Although it was shown that CSWO can quickly converge to the optimum with no more than five cycles in the tests, the convergence capability of the proposed decomposition and coordination strategy should be theoretically analyzed and proved. It was also noticed that different decomposition schemes and different regularization parameter settings have a great influence on the optimization result, which needs further investigation in the future.

REFERENCES

[1] Z. Wang, X. Chen, W. Luo, and W. Zhang, Research on the Theory and Application of Multidisciplinary Design Optimization of Flight Vehicles. Beijing, China: National Defense Industry Press, 2006.
[2] W. Yao, X. Chen, M. van Tooren, and Y. Wei, “Euclidean distance and second derivative based widths optimization of radial basis function neural networks,” presented at the International Joint Conference on Neural Networks, Barcelona, Spain, 2010.

[3] J. Park and I. W. Sandberg, “Universal approximation using radial basis function networks,” Neural Comput., vol. 3, no. 2, pp. 246–257, 1991. [4] M. J. D. Powell, “Radial basis function approximations to polynomials,” in Proc. 12th Biennial Numer. Analy. Conf., 1987, pp. 223–241. [5] D. S. Bromhead and D. Lowe, “Multivariable functional interpolation and adaptive networks,” Complex Syst., vol. 2, no. 3, pp. 321–355, 1988. [6] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1998. [7] M. Hou and X. Han, “Constructive approximation to multivariate function by decay RBF neural network,” IEEE Trans. Neural Netw., vol. 21, no. 9, pp. 1517–1523, Sep. 2010. [8] Y. Wei, L. Xu, and X. Chen, “The radial basis function shape parameter chosen and its application in engineering,” presented at the International Conference on Intelligent Computing and Intelligent Systems, Shanghai, China, 2009. [9] M. Orr, “Optimising the widths of radial basis functions,” presented at the 5th Brazilian Symposium on Neural Networks, 1998. [10] J. Moody and C. J. Darken, “Fast learning in networks of locally-tuned processing units,” Neural Comput., vol. 1, no. 2, pp. 281–294, 1989. [11] A. Saha and J. D. Keeler, “Algorithms for better representation and faster learning in radial basis function networks,” in Advances in Neural Information Processing Systems, vol. 2. San Mateo, CA: Morgan Kaufmann, 1990, pp. 482–489. [12] C. L. Lin, J. F. Wang, C. Y. Chen, C. W. Chen, and C. W. Yen, “Improving the generalization performance of RBF neural networks using a linear regression technique,” Expert Syst. Appl., vol. 36, no. 10, pp. 12049–12053, 2009. [13] H. X. Huan, D. T. T. Hien, and H. H. Tue, “Efficient algorithm for training interpolation RBF networks with equally spaced nodes,” IEEE Trans. Neural Netw., vol. 22, no. 6, pp. 982–988, Jun. 2011. [14] M. Verleysen and K. Hlavackova, “Learning in RBF networks,” in Proc. Int. Conf. Neural Netw., Washington D.C., 1996, pp. 199–204. [15] N. Benoudjit, C. Archambeau, A. Lendasse, J. Lee, and M. Verleysen, “Width optimization of the Gaussian kernels in radial basis function networks,” in Proc. Eur. Symp. Artif. Neural Netw., Bruges, Belgium, 2002, pp. 425–432. [16] N. Benoudjit and M. Verleysen, “On the kernel widths in radial-basis function networks,” Neural Process. Lett., vol. 18, no. 2, pp. 139–154, 2003. [17] P. Singla, K. Subbarao, and J. L. Junkins, “Direction-dependent learning approach for radial basis function networks,” IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 203–222, Jan. 2007. [18] D. G. Lowe, “Adaptive radial basis function nonlinearities, and the problem of generalization,” in Proc. 1st IEE Conf. Artif. Neural Netw., London, U.K., Oct. 1989, pp. 171–175. [19] J. Peng, K. Li, and D. Huang, “A hybrid forward algorithm for RBF neural network construction,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1439–1451, Nov. 2006. [20] A. J. Rivera, J. Ortega, I. Rojas, and A. Prieto, “Optimizing RBF networks with cooperative/competitive evolution of units and fuzzy rules,” in Proc. 6th Int. Work Conf. Artif. Nat. Neural Netw. Connection. Models Neurons, Learn. Process. Artif. Intell. Part I, London, U.K., 2001, pp. 570–578. [21] A. Leonardisa and H. Bischofb, “An efficient MDL-based construction of RBF networks,” Neural Netw., vol. 11, no. 5, pp. 963–973, Mar. 1998. [22] J. Peng, K. Li, and G. W. Irwin, “A novel continuous forward algorithm for RBF neural modeling,” IEEE Trans. Autom. Control, vol. 52, no. 1, pp. 
117–122, Jan. 2007. [23] C. Harpham, C. W. Dawson, and M. R. Brown, “A review of genetic algorithms applied to training radial basis function networks,” Neural Comput. Appl., vol. 13, no. 2, pp. 193–201, 2004. [24] A. F. Sheta and K. De Jong, “Time-series forecasting using GA tuned radial basis functions,” Inform. Sci., vol. 133, nos. 3–4, pp. 221–228, 2001. [25] E. P. Maillard and D. Gueriot, “RBF neural network, basis functions and genetic algorithm,” in Proc. Int. Conf. Neural Netw., Houston, TX, 1997, pp. 2187–2192. [26] S. Chen, Y. Wu, and B. L. Luk, “Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks,” IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1239–1243, Sep. 1999. [27] J. González, I. Rojas, J. Ortega, H. Pomares, F. J. Fernández, and A. F. Díaz, “Multiobjective evolutionary optimization of the size, shape, and position parameters of radial basis function networks for function approximation,” IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 1478– 1495, Nov. 2003.

YAO et al.: CONCURRENT SUBSPACE WIDTH OPTIMIZATION METHOD FOR RBF NEURAL NETWORK MODELING

[28] J. Liu and J. Lampinen, “A differential evolution based incremental training method for RBF networks,” in Proc. Conf. Genet. Evol. Comput., Washington D.C., 2005, pp. 881–888. [29] S. Noman, S. M. Shamsuddin, and A. E. Hassanien, “Hybrid learning enhancement of RBF network with particle swarm optimization,” in Foundations of Computational, Intelligence, vol. 1. Berlin, Germany: Springer-Verlag, 2009, pp. 381–397. [30] S. Chen, E. S. Chng, and K. Alkadhimi, “Regularised orthogonal least squares algorithm for constructing radial basis function networks,” Int. J. Control, vol. 64, no. 5, pp. 829–837, May 1996. [31] G. A. F. Seber, Multivariate Observations. Hoboken, NJ: Wiley, 1984. [32] A. Drozdek, Data Structures and Algorithms in C++, 2nd ed. Pacific Grove, CA: Brooks/Cole, 2004. [33] A. A. Giunta, Jr., S. F. Wojtkiewicz, and M. S. Eldred, “Overview of modern design of experiments methods for computational simulations,” presented at the 41st Aerospace Sciences Meeting and Exhibit, Reno, NV, 2003, no. AIAA-2003-649. [34] G. Li and S. Azarm, “Maximum accumulative error sampling strategy for approximation of deterministic engineering simulations,” presented at the 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Portsmouth, VA, 2006, no. AIAA-2006-7051. [35] W. Yao and X. Chen, “A sequential radial basis function neural network modeling method based on partial cross validation error estimation,” presented at the 5th International Conference on Natural Computation, Tianjing, China, 2009. [36] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms. New York: Wiley, 1998. [37] W. Yao, X. Chen, and W. Luo, “A gradient-based sequential radial basis function neural network modeling method,” Neural Comput. Appl., vol. 18, no. 5, pp. 477–484, 2009. [38] W. Yao, J. Guo, X. Chen, and M. van Tooren, “Utilizing uncertainty multidisciplinary design optimization for conceptual design of space systems,” presented at the 8th Conference on Systems Engineering Research, Hoboken, NJ, 2010. [39] J. R. Wertz and W. J. Larson, Space Mission Analysis and Design, 3rd ed. El Segundo, CA: Microcosm Press, 1999.

Wen Yao received the B.Sc. and M.Sc. degrees in aerospace engineering from National University of Defense Technology, Changsha, China, in 2005 and 2007, respectively, where she is currently a Ph.D. student. She was a Visiting Scholar with the Faculty of Aerospace Engineering, Delft University of Technology, Delft, The Netherlands. Her current research interests include approximation modeling using radial basis function neural networks, approximation based optimization, uncertainty based multidisciplinary design optimization, and their applications in aerospace vehicle design.

259

Xiaoqian Chen received the M.Sc. and Ph.D. degrees in aerospace engineering from National University of Defense Technology, Changsha, China, in 1997 and 2001, respectively. He is currently a Professor with the College of Aerospace and Materials Engineering, National University of Defense Technology. His current research interests include spacecraft systems engineering, advanced digital design methods of space systems, and multidisciplinary design optimization.

Yong Zhao received the Ph.D. degree in aerospace engineering from National University of Defense Technology, Changsha, China, in 2006. He is currently an Associate Professor with the College of Aerospace and Materials Engineering, National University of Defense Technology. His current research interests include micro-satellite engineering, fractionated spacecraft design, and multidisciplinary design optimization.

Michel van Tooren received the Ph.D. degree in composite fuselage design from the Delft University of Technology, Delft, The Netherlands, in 1993. He is currently a Manager of the New Concept Development, Fokker Aerostructures, Hoogeveen, The Netherlands. He combines his work in industry with a part-time appointment with the Faculty of Aerospace Engineering, Delft University of Technology, where he was appointed a Professor in the group of Systems Integration Aircraft in 2003. He has been involved in many aerospace and nonaerospace product development projects. His group is well known for its work in multidisciplinary design optimization, aircraft designs, knowledge-based engineering, and truck aerodynamics.

Concurrent subspace width optimization method for RBF neural network modeling.

Radial basis function neural networks (RBFNNs) are widely used in nonlinear function approximation. One of the challenges in RBFNN modeling is determi...
1MB Sizes 0 Downloads 1 Views