
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 8, AUGUST 2013

Quantum-Based Algorithm for Optimizing Artificial Neural Networks

Tzyy-Chyang Lu, Gwo-Ruey Yu, Member, IEEE, and Jyh-Ching Juang, Member, IEEE

Abstract—This paper presents a quantum-based algorithm for evolving artificial neural networks (ANNs). The aim is to design an ANN with few connections and high classification performance by simultaneously optimizing the network structure and the connection weights. Unlike most previous studies, the proposed algorithm uses quantum bit representation to codify the network. As a result, the connectivity bits do not indicate the actual links but the probability of the existence of the connections, thus alleviating mapping problems and reducing the risk of throwing away a potential candidate. In addition, in the proposed model, each weight space is decomposed into subspaces in terms of quantum bits. Thus, the algorithm performs a region-by-region exploration, and evolves gradually to find promising subspaces for further exploitation. This is helpful to provide a set of appropriate weights when evolving the network structure and to alleviate the noisy fitness evaluation problem. The proposed model is tested on four benchmark problems, namely the breast cancer, iris, heart, and diabetes problems. The experimental results show that the proposed algorithm can produce compact ANN structures with good generalization ability compared to other algorithms.

Index Terms—Classification problem, mapping problem, quantum neural network (QNN).

I. INTRODUCTION

ARTIFICIAL neural networks (ANNs) have been applied widely in many areas, such as system modeling [1], prediction [2], classification [3], and pattern recognition [4]. The design of an ANN typically involves finding an appropriate network structure and training the connection weights. The structure design is crucial in the application of ANNs because it significantly affects a network's information processing capabilities [5]–[7]. In general, too large a network may tend to overfit the training data and affect the generalization capability, whereas too small a network may not even be able to learn the training samples due to its limited representation capability. In addition, a fixed structure of overall connectivity between neurons may not provide the optimal performance within a given training period [8].

Therefore, recently, some attention has been given to the problem of how to construct a suitable network structure for a given task. With little or no prior knowledge about the problem, one usually determines the network structure by means of a trial-and-error procedure. However, this depends heavily on the user's experience and may require intensive human interaction and computational time. Research has been conducted on constructive algorithms [9], [10] (i.e., start with the smallest possible network and gradually add neurons or connections) and destructive algorithms [11], [12] (i.e., start with the largest possible network and delete unnecessary neurons and connections) to aid the automatic design of structures. However, as indicated in [13], the structural hill-climbing methods in these algorithms are susceptible to being trapped at structural local optima. The search for the optimal network structure is known to be a complex, nondifferentiable, and multimodal optimization problem, making evolutionary algorithms (EAs) a better candidate for the task than the constructive and destructive algorithms [14].

In the structure design, EAs are employed in two ways: to evolve the structures only [15] and to evolve both the structures and the connection weights simultaneously [13]. In general, when EAs are used to determine the structures only, the training error is calculated by performing a random initialization of weights, which are then determined by a learning algorithm such as backpropagation [16]. However, as indicated in [17], the training results depend on the random initial weights and the choice of learning algorithm. Hence, the same representation (genotype) of a structure may have quite different levels of fitness. This one-to-many mapping from genotypes to the actual networks (phenotypes) may induce noisy fitness evaluation and misleading evolution.

To reduce the detrimental effect of the one-to-many mapping problem, many researchers have paid attention to the simultaneous optimization of network structure and connection weights. Leung et al. [18] presented an improved genetic algorithm to tune the structure and parameters simultaneously. Tsai et al. [19] used a hybrid Taguchi-genetic algorithm to solve the problem of tuning both the network structure and parameters. Angeline et al. [13] suggested that genetic algorithms are not well suited for evolving networks and proposed an evolutionary program, called GeNeralized Acquisition of Recurrent Links (GNARL), to acquire both the structure and weights for recurrent networks.



Fig. 1. Deceptive mapping showing the permutation problem.

Ludermir and Zanchettin [20] presented a methodology that combines simulated annealing and tabu search for the simultaneous optimization of multilayer perceptron (MLP) network weights and architectures. Li and Niu [21] used an improved particle swarm optimization (PSO) algorithm to learn an ANN's free parameters (weights and biases) and, simultaneously, a binary PSO algorithm to evolve the architecture.

Although the framework of evolving both structures and weights can avoid the one-to-many mapping problem, there still exists another structural-functional mapping problem, which is usually called the permutation problem or the competing conventions problem [22]–[24]. It is mainly caused by the many-to-one mapping from the genotypes to the phenotypes [17], [24]. Fig. 1 illustrates an example of two ANNs that are functionally identical, yet have different representations. As stated in [25], if two such networks are well fitted and mated, the offspring will probably be far worse than either of the parents, since duplicates of some hidden neurons will be present in both of them while other useful neurons will be removed. Some researchers have thus avoided crossover and only adopted mutations in the evolution of structures [26], [27]. However, this also raises some issues about the efficiency of evolution for some problems [28]. To avoid catastrophic crossover, NeuroEvolution of Augmenting Topologies (NEAT) [29], [30] uses innovation numbers to track the historical origin of each gene. Whenever a new gene appears via mutation, it receives a unique innovation number, which can be viewed as a chronology of the genes produced during evolution. When crossing over, the genes in both genomes with the same innovation number are lined up. Genes that do not match are inherited from the fitter parent. This approach makes it possible for NEAT to minimize the chance of catastrophic crossover without conducting an expensive topological analysis [29].

In addition, in most EAs, the fitness value governs the evolutionary search [17], [31]. During each generation, the individuals of the population are ranked according to the fitness function, and those with better fitness are more likely to become parents in the next generation. Thus, the ability to evolve into a better ANN relies mainly on the survival of the good structure of the network. However, in the case of the simultaneous evolution of structures and connection weights, a good fitness value does not necessarily represent the quality of the structure. Often, the fitness of a network with a good structure and bad weights may be worse than the fitness of a network with a bad structure but a good set of weights [32]. As a result, some potential structures may be discarded if only fitness is used.


To alleviate these problems, a quantum-based neural network (QNN) is proposed in this paper. Similar to general EAs, the QNN is characterized by the individuals, the evaluation function, and the population dynamics. However, instead of a binary, numeric, or symbolic representation, the QNN uses a probabilistic quantum bit representation to codify the network; that is, the connectivity bits do not indicate the actual existence of connections, but the probability of their existence. In addition, each weight space is decomposed into subspaces in terms of quantum bits. When measuring the network fitness, a concrete structure (represented by binary bits) is made by observing the quantum states. To obtain a weight value, a candidate subspace is selected first, also by observing the quantum states, and then a real number is randomly generated in this subspace. The fitness value is used to refine the quantum states. In other words, when the network has good fitness in the current generation, the probability of the corresponding connections and subspaces being adopted in the next generation is increased. In contrast, if the network has bad fitness, the probability of the corresponding connections and subspaces being adopted in the next generation is decreased.

The QNN has distinct advantages over general EAs when it is used to evolve networks. First, each individual in the QNN is composed of quantum bits that represent the probability of good connections and weight subspaces rather than a specific structure and weight values. So, if the network has bad fitness owing to the one-to-many or many-to-one mapping problem, the QNN modifies the quantum states rather than discarding this network. Thus, the risk of throwing away a potential structure or weight values is mitigated. Second, instead of crossover and mutation, the QNN uses observation to create a new network. Thus, the negative impact of the permutation problem is reduced. Third, a partitioning strategy is used to find the near-optimal connection weights. It explores each weight space region by region and rapidly finds the promising subspace for further exploitation. This is helpful to provide a set of appropriate weights when evolving the network structure and to alleviate the noisy fitness evaluation problem.

The rest of this paper is organized as follows. Section II briefly reviews the quantum-inspired evolutionary algorithm (QEA). Section III describes the details of the QNN for the design of evolutionary ANNs. Section IV shows the results of the proposed model applied to four real-world problems. Finally, Section V summarizes this paper.

II. REVIEW OF QEA

The QEA was first presented by Han and Kim [33] to solve knapsack problems. The method uses quantum bits to represent the individuals, and searches for the optimum by observing the quantum states. The advantage of the QEA is that it can work with small population sizes without running into premature convergence [34]. The QEA is also noted for its simplicity of implementation and its potential for solving large-scale problems. Variations and applications of the QEA can be found in [35]–[39]. In this paper, the QEA is employed to solve the optimization problems of network structure and weights. This section briefly reviews the basic concept and procedure of the QEA.


A. Representation and Observation

Quantum bits, which differ from traditional bits, use probability to represent binary information. A characteristic of quantum bit representation is the ability to represent a linear superposition of the "1" and "0" states probabilistically. A quantum bit individual containing a string of q quantum bits can be defined as

$$\begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_q \\ \beta_1 & \beta_2 & \cdots & \beta_q \end{bmatrix} \qquad (1)$$

where 0 ≤ α_i ≤ 1, 0 ≤ β_i ≤ 1, (α_i)² + (β_i)² = 1, and i = 1, 2, ..., q. (α_i)² is the probability that the ith quantum bit will be found in state "1" and (β_i)² is the probability that the ith quantum bit will be found in state "0." Since (α_i)² + (β_i)² = 1, (1) can be simplified as

$$[\,\alpha_1 \;\; \alpha_2 \;\; \cdots \;\; \alpha_q\,]. \qquad (2)$$

The observation is a process that produces a binary string b from (2), which operates as follows. For a quantum bit individual with q quantum bits, generate a random number vector r = [r_1 r_2 ··· r_q], where 0 ≤ r_i ≤ 1, i = 1, 2, ..., q; the corresponding bit in b takes "1" if r_i ≤ (α_i)², or "0" otherwise.

The advantage of the quantum bit representation is that it can represent any superposition of states. For instance, a quantum bit individual with three quantum bits such as [1/√2 | √3/2 | 1/√3] can be represented as

$$\frac{1}{2\sqrt{3}}|000\rangle + \frac{1}{2\sqrt{6}}|001\rangle + \frac{1}{2}|010\rangle + \frac{1}{2\sqrt{2}}|011\rangle + \frac{1}{2\sqrt{3}}|100\rangle + \frac{1}{2\sqrt{6}}|101\rangle + \frac{1}{2}|110\rangle + \frac{1}{2\sqrt{2}}|111\rangle \qquad (3)$$

which means that the probabilities of the individual binary strings |000⟩, ..., |111⟩ are (1/(2√3))², ..., (1/(2√2))², respectively [33]. Consequently, a quantum bit individual with q quantum bits can represent 2^q binary strings of information.

B. Rotation Gate

The rotation gate is a quantum gate [33], [36] that is adopted herein as a variation operator to update the quantum bits in (2). Fig. 2 shows the operation of the rotation gate. For a quantum bit individual with q quantum bits, the ith quantum bit α_i, i = 1, 2, ..., q, is updated as follows:

$$\alpha_i' = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \end{bmatrix} \begin{bmatrix} \alpha_i \\ \sqrt{1-(\alpha_i)^2} \end{bmatrix}. \qquad (4)$$

In function minimization problems, the sign of θ_i is chosen according to the discrepancy between the observed bit and the stored best bit and according to whether the current fitness improves on the stored best fitness; in addition, (α_i)² is kept within [ε, 1 − ε] so that neither state is ever excluded. The settings of θ and ε are discussed in Section IV-A.
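To make the representation and observation steps concrete, here is a minimal sketch (an illustration added for this edit, not code from the paper) that collapses a quantum bit individual such as (2) into a binary string and applies the rotation update (4); the NumPy usage and the fixed rotation angle are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quantum bit individual of q = 3 bits, stored as the alpha amplitudes of (2);
# (alpha_i)^2 is the probability of observing "1" for bit i.
alpha = np.array([1 / np.sqrt(2), np.sqrt(3) / 2, 1 / np.sqrt(3)])

def observe(alpha, rng):
    """Collapse the quantum bit string into a concrete binary string b.

    Bit i becomes 1 when a uniform random number r_i <= (alpha_i)^2.
    """
    r = rng.random(alpha.shape)
    return (r <= alpha ** 2).astype(int)

def rotate(alpha, theta):
    """Rotation gate (4): alpha_i' = cos(theta_i)*alpha_i - sin(theta_i)*sqrt(1 - alpha_i^2)."""
    beta = np.sqrt(1.0 - alpha ** 2)
    return np.cos(theta) * alpha - np.sin(theta) * beta

b = observe(alpha, rng)                                 # e.g. array([1, 1, 0])
# A positive angle lowers alpha here, i.e. lowers the probability of observing "1".
alpha_new = rotate(alpha, 0.05 * np.pi * np.ones_like(alpha))
print(b, np.round(alpha_new ** 2, 3))
```

Repeated observations of the same individual yield different strings, which is exactly the property the QNN later exploits to sample candidate structures and weight subspaces.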

m < i ≤ m + nh + n

j =1

Yi = x i+m+nh ,

1≤i ≤n

(6)

where wi j is the weight of the connection from node j to node i and f is the sigmoid function f (z) =

1 . 1 + e−z

(7)
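The generalized connectivity of (6) and (7) can be read as a single lower-triangular weight matrix. The sketch below (my illustration; the matrix layout and the toy sizes are assumptions, with the four inputs and three outputs borrowed from the iris problem) evaluates such a network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # f(z) of (7)

def gmlp_forward(X, W, m, n_h, n):
    """Evaluate the GMLP of (6).

    W is an (m + n_h + n) x (m + n_h + n) lower-triangular weight matrix;
    W[i, j] is the weight from node j to node i (zero when the connection is absent).
    """
    total = m + n_h + n
    x = np.zeros(total)
    x[:m] = X                                   # input nodes: x_i = X_i
    for i in range(m, total):                   # hidden and output nodes
        x[i] = sigmoid(W[i, :i] @ x[:i])        # x_i = f(sum_{j<i} w_ij x_j)
    return x[m + n_h:]                          # outputs: Y_i = x_{i+m+n_h}

# Toy example: 4 inputs, 2 hidden nodes, 3 outputs.
m, n_h, n = 4, 2, 3
rng = np.random.default_rng(1)
W = np.tril(rng.uniform(-1.0, 1.0, (m + n_h + n, m + n_h + n)), k=-1)
W[:m, :] = 0.0                                  # input nodes receive no connections
print(gmlp_forward(rng.random(m), W, m, n_h, n))
```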


Fig. 3. GMLP [17].

The representation of a GMLP can be seen in Fig. 3. The maximum number of connections is

$$c_{\max} = m(n_h + n) + \frac{(n_h+n)(n_h+n-1)}{2} \qquad (8)$$

where the first term on the right-hand side is related to the number of connections from the input nodes to the hidden and output nodes, and the second term is related to the number of connections among the hidden and output nodes. Thereby, the selection of the appropriate network is equivalent to finding the optimal weighting vector s whose components are the weights w_ij, with maximum dimension c_max.

B. Objective Function

In this paper, the winner-takes-all classification rule is used; that is, the output with the highest activation determines the class. The network error for the pattern t is defined as

$$e(t) = \begin{cases} 1, & \text{if } \hat{T}(t) \ne T(t) \\ 0, & \text{if } \hat{T}(t) = T(t) \end{cases} \qquad (9)$$

where T̂(t) and T(t) are the assigned class and the true class of the pattern t, respectively. The objective function F(s), which represents the percentage of incorrectly classified patterns, is

$$F(s) = \frac{1}{D}\sum_{t=1}^{D} e(t) \qquad (10)$$

where D is the number of patterns. The problem thus becomes the determination of the optimal s such that the objective function F(s) is minimized.

C. Representation of Solutions

In this paper, quantum bits are employed to represent the probabilities of the various network connectivities and connection weights. More precisely, in the QNN, each solution is composed of C, which contains a string of quantum bits that represent the network connectivity, and W, which contains the quantum bits that represent the connection weights. The vector C is expressed as

C = Q_c = [ α_1 | α_2 | ··· | α_c_max ]    (11)

where α_i, i = 1, 2, ..., c_max, is a quantum bit. The connectivity vector C utilizes c_max quantum bits to represent the probabilities of 2^c_max structures. The connection weight W is expressed as

W = [ Q_w1, Q_w2, ..., Q_wc_max ]    (12)

where each Q_wi, i = 1, 2, ..., c_max, is assumed to contain k quantum bits, or Q_wi = [ α_i,1 | α_i,2 | ··· | α_i,k ]. Thus, each weight space is divided into 2^k subspaces and Q_wi is used to represent the probability of the subspaces that render good weight values. The specific realization associated with each quantum bit of the weighting is governed by a Gaussian random number generator with mean μ_i,j and variance (σ_i,j)², denoted N(μ_i,j, σ_i,j), where j = 1, 2, ..., 2^k.

D. Quantum-Based Algorithm

The essence of the QNN is to manipulate the quantum bits instead of candidate solutions for refinement in the optimization process. From one generation to another, the quantum bits are updated; that is, the probabilities are refined so that the overall probability of finding the optimal network is increased. To enhance the overall search speed, a population of individuals is employed. More precisely, the population is divided into G structure subpopulations, with each subpopulation containing L identical-structure individuals. Each subpopulation searches for L optimal connection weight individuals under the same structure. The step of the QNN in the population is depicted in Fig. 4. Similar to many evolutionary search algorithms, the QNN allows individuals in the population to undertake exchange operations. The QNN for the classification problem is summarized in Algorithm 1. Details of the initialization, observation, update, and exchange steps are described below.

In the initialization stage, all quantum bits are initialized as 1/√2, implying that the probability of being 0 or 1 for each bit is 1/2. The mean μ_i,j can be initialized randomly from the domain of the corresponding subspace or set at the midpoint of its subspace for simplicity. The standard deviation (SD) σ_i,j is initialized as 0.1 times the subspace width.

In the structure design, Q_c is observed to generate a binary string b_c, where "1" indicates the presence of a connection and "0" indicates its absence. If the ith connection is present, Q_wi is further observed to generate a binary string b_wi, and the jth subspace in the ith connection weight space is selected when j = d(b_wi) + 1, where d(b_wi) is the decimal representation of the binary string b_wi. Once the subspace is determined, a trial variable w_i is generated from the Gaussian distribution with mean μ_i,j and SD σ_i,j.

An example of creating a candidate network is depicted in Fig. 5 (for simplicity, the maximum number of connections is set to 3, and k is set to 2, which means that each weight space is divided into 2² subspaces). In the example, Q_c^(1), Q_wi^(1,1), and N(μ_i,j^(1,1), σ_i,j^(1,1)), i = 1, 2, 3, j = 1, 2, 3, 4, are initialized and the observation result of Q_c^(1) is assumed to lead to b_c^(1) = [0 1 1]; that is, the 2nd and 3rd connections are present. The follow-on observations of Q_w2^(1,1) and Q_w3^(1,1) result in b_w2^(1,1) = [0 1] and b_w3^(1,1) = [1 1], so that the 2nd and 4th subspaces are selected, respectively. Finally, Gaussian random number generators N(−0.25, 0.05) and N(0.75, 0.05) are used to render the real values −0.19 and 0.95, respectively.
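Putting the pieces of Sections III-C and III-D together, a candidate network is obtained by one observation of Q_c, one observation of Q_wi for every present connection, and one Gaussian draw per selected subspace. The sketch below mirrors the Fig. 5 setting; it is an illustrative reconstruction rather than the authors' code, and the weight domain [−1, 1] and all names are assumptions inferred from the subspace centers N(±0.25, ·) and N(±0.75, ·).

```python
import numpy as np

rng = np.random.default_rng(0)

def c_max(m, n_h, n):
    """Maximum number of connections, (8); e.g. c_max(4, 2, 3) = 30."""
    return m * (n_h + n) + (n_h + n) * (n_h + n - 1) // 2

def observe(alpha, rng):
    """Collapse quantum bits into a binary string: bit = 1 when r <= alpha^2."""
    return (rng.random(len(alpha)) <= np.asarray(alpha) ** 2).astype(int)

def sample_candidate(Qc, Qw, mu, sigma, rng):
    """Create one concrete network: connectivity mask b_c and weight values.

    Qc: c_max quantum bits for the structure (11).
    Qw: c_max x k quantum bits for the weight subspaces (12).
    mu, sigma: c_max x 2**k Gaussian parameters, one pair per subspace.
    """
    bc = observe(Qc, rng)                           # structure string b_c
    weights = np.zeros(len(Qc))
    for i in np.flatnonzero(bc):                    # only present connections
        bw = observe(Qw[i], rng)                    # subspace string b_wi
        j = int("".join(map(str, bw)), 2)           # j = d(b_wi) + 1, stored 0-based here
        weights[i] = rng.normal(mu[i, j], sigma[i, j])
    return bc, weights

# Fig. 5 setting: c_max = 3 connections, k = 2, weight domain assumed to be [-1, 1].
k, cmax = 2, 3
Qc = np.full(cmax, 1 / np.sqrt(2))                  # all bits initialized to 1/sqrt(2)
Qw = np.full((cmax, k), 1 / np.sqrt(2))
centers = np.array([-0.75, -0.25, 0.25, 0.75])      # subspace midpoints for [-1, 1], k = 2
mu = np.tile(centers, (cmax, 1))
sigma = np.full((cmax, 2 ** k), 0.1 * (2.0 / 2 ** k))   # 0.1 x subspace width = 0.05
print(sample_candidate(Qc, Qw, mu, sigma, rng))
```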

Fig. 4. Step of the QNN in the population.

TABLE I
PARAMETER SETTINGS FOR QNN

Parameter | Value
Subpopulations (G) | 3
Subpopulation size (L) | 30
θ | 0.05π
Quantum bits in Q_w (k) | 4
ε | 0.005
τ | 0.8
σ | 0.0125
Weight exchange period | 5
Structure exchange period | 10

∗ ∗ > Fsubpopulation , implying that the if minl∈{1,2,...,L} Findividual L individuals with the identical structure observed by Q c do not render a better function value; then, Q c is modified according to the discrepancy between the observation result bc and the prestored best structure bc∗ . On the other hand, if ∗(l) ∗ ∗ minl∈{1,2,...,L} Findividual ≤ Fsubpopulation , bc∗ and Fsubpopulation (l)

∗ are updated as bc and minl∈{1,2,...,L} Findividual , respectively. ∗ Also, if F(s) ≤ Findividual and the i th connection is present, ∗ ∗ , and the mean μ Findividual , bw i, j are updated as F(s), bwi , i and wi , respectively. Furthermore, the SD σi, j is decreased so that future exploitation can be more focused. To this end, σi, j in the subspace is modified as

σi, j = σi, j τ and 4th subspaces are selected, respectively. Finally, Gaussian random number generators N(−0.25, 0.05) and N(0.75, 0.05) are used to render real values of −0.19 and 0.95, respectively. In the operation, the rotation gate is adopted to   update update Q wi i=1,2,...,cmax and Q c . In the evolutionary process, ∗ and the i th connection is present, if F(s) > Findividual Q wi is modified according to the discrepancy between its ∗ . Similarly, observation result bwi and the prestored best bw i

(13)

where τ is a parameter for controlling the convergence rate. Among individuals, an exchange operation, which is similar to the migration operation in QEA, is employed to induce a variation of the quantum states. The operation depends on the preselected exchange periods (includes weight and structure). If  the weight exchange condition is satisfied, Q wi i=1,2,...,cmax in the subpopulation are swapped randomly. If the structure exchange condition is satisfied, Q c among the subpopulations
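The weight-side bookkeeping described above can be sketched as follows (my own simplified illustration; the array shapes and function names are assumptions): when an individual matches or improves its stored best fitness, the means of the selected subspaces are moved to the sampled weights and their SDs are contracted by τ as in (13); at the exchange period, the Q_w strings of a subpopulation are permuted among its individuals.

```python
import numpy as np

TAU = 0.8                                     # SD contraction factor from Table I

def update_weight_stats(F_new, F_best, bc, subspace_idx, w, mu, sigma):
    """If the new fitness matches or improves the stored best, move the selected
    subspace means to the sampled weights and shrink their SDs as in (13)."""
    if F_new <= F_best:
        for i in np.flatnonzero(bc):          # only the present connections
            j = subspace_idx[i]
            mu[i, j] = w[i]
            sigma[i, j] *= TAU
        return F_new
    return F_best                             # otherwise keep the stored best

def weight_exchange(Qw_population, rng):
    """Randomly swap the Q_w strings among the L individuals of one subpopulation."""
    return [Qw_population[p] for p in rng.permutation(len(Qw_population))]

# Tiny demonstration in the Fig. 5 setting (c_max = 3, k = 2).
rng = np.random.default_rng(0)
mu = np.tile(np.array([-0.75, -0.25, 0.25, 0.75]), (3, 1))
sigma = np.full((3, 4), 0.05)
bc = np.array([0, 1, 1])                      # 2nd and 3rd connections present
subspace_idx = np.array([0, 1, 3])            # selected subspaces (0-based)
w = np.array([0.0, -0.19, 0.95])              # sampled weights
best = update_weight_stats(0.02, 0.05, bc, subspace_idx, w, mu, sigma)
Qw_pop = weight_exchange([np.full(2, 0.7) for _ in range(3)], rng)
print(best, mu[1, 1], sigma[2, 3])            # 0.02 -0.19 0.04
```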

Fig. 5. Example of creating a candidate network. In the example, the observation of Q_c^(1) is assumed to lead to b_c^(1) = [011]; that is, the 2nd and 3rd connections are present (drawn with a solid line). The follow-on observations of Q_w2^(1,1) and Q_w3^(1,1) result in b_w2^(1,1) = [01] and b_w3^(1,1) = [11], so that the 2nd and 4th subspaces are selected (drawn with a solid line), respectively. Finally, Gaussian random number generators N(−0.25, 0.05) and N(0.75, 0.05) are used to render the real values −0.19 and 0.95, respectively.

TABLE II
PERFORMANCE OF THE QNN FOR THE IRIS DATA SET (ALL RESULTS WERE AVERAGED OVER 100 INDEPENDENT RUNS)

n_h | Max. conn. | Training error (%) mean / SD / best / worst | Testing error (%) mean / SD / best / worst | Connections
2 | 30 | 0 / 0 / 0 / 0 | 1.70 / 0.99 / 0 / 5.67 | 14.75
3 | 39 | 0.03 / 0.19 / 0 / 1.33 | 1.59 / 0.91 / 0 / 3.94 | 18.63
4 | 49 | 0 / 0 / 0 / 0 | 1.55 / 0.84 / 0 / 3.66 | 23.68
5 | 60 | 0.03 / 0.27 / 0 / 2.67 | 1.52 / 0.71 / 0 / 3.35 | 28.67
6 | 72 | 0.03 / 0.19 / 0 / 1.33 | 1.36 / 0.89 / 0 / 6.08 | 34.90
7 | 85 | 0 / 0 / 0 / 0 | 1.38 / 0.72 / 0 / 3.49 | 40.40
8 | 99 | 0.03 / 0.19 / 0 / 1.33 | 1.39 / 0.66 / 0 / 3.51 | 47.64
10 | 130 | 0.05 / 0.26 / 0 / 1.33 | 1.21 / 0.78 / 0 / 3.24 | 62.35
12 | 165 | 0.03 / 0.19 / 0 / 1.33 | 1.31 / 0.84 / 0 / 4.86 | 78.31
14 | 204 | 0.07 / 0.35 / 0 / 2.67 | 1.31 / 0.75 / 0 / 3.14 | 98.80

IV. NUMERICAL EXPERIMENTS AND RESULTS

This section presents the QNN's performance for four well-known benchmark classification problems, namely the breast cancer, iris, heart, and diabetes problems. These problems, from the University of California Irvine Machine Learning Repository, are widely used in studies on ANNs and machine learning [17], [40]. Detailed descriptions of these problems can be obtained from [41] and [42].

TABLE III
COMPARISON OF THE QNN'S RESULTS FOR THE IRIS DATA SET WITH THOSE OBTAINED FOR OTHER MODELS IN TERMS OF AVERAGE TESTING ERROR

Model | Author | Mean testing error (SD) (%)
QNN | - | 1.21 (0.78)
AMGA | M. M. Islam et al. [43] | 1.89 (-)
HMOEN_L2 | C. K. Goh et al. [42] | 2.00 (1.84)
NCA | M. M. Islam et al. [44] | 2.16 (0.18)
OMNN | T. B. Ludermir et al. [20] | 4.62 (-)
OC1-SA | E. Cantú-Paz et al. [45] | 6.10 (0.5)
HMOEN_HN | C. K. Goh et al. [42] | 9.97 (3.13)

A. Experimental Setup

In this paper, the data sets of each problem are divided into three sets: 50% for training, 25% for validation, and 25% for testing. The training set is used to train and modify the structure and weights of the ANN. The validation set is used to determine the final architecture from the well-trained individuals, and the testing set is used to measure its generalization ability.
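A small sketch of this protocol and of the objective (9)-(10) (written for this edit; the shuffling, array layout, and toy data are assumptions):

```python
import numpy as np

def split_50_25_25(X, T, rng):
    """Shuffle a data set and split it 50% / 25% / 25% into training / validation / test."""
    idx = rng.permutation(len(X))
    a, b = len(X) // 2, (3 * len(X)) // 4
    return (X[idx[:a]], T[idx[:a]]), (X[idx[a:b]], T[idx[a:b]]), (X[idx[b:]], T[idx[b:]])

def error_rate(outputs, targets):
    """Winner-takes-all error rate of (9)-(10): fraction of misclassified patterns."""
    assigned = np.argmax(outputs, axis=1)      # the output with the highest activation
    return np.mean(assigned != targets)

rng = np.random.default_rng(0)
X = rng.random((150, 4))                       # toy data of iris size: 150 patterns, 4 attributes
T = rng.integers(0, 3, 150)                    # 3 classes
train, valid, test = split_50_25_25(X, T, rng)
Y = rng.random((len(test[0]), 3))              # stand-in network outputs for the test patterns
print(len(train[0]), len(valid[0]), len(test[0]), error_rate(Y, test[1]))
```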


Algorithm 1 Quantum-Based Algorithm

1) Initialize the population by specifying the quantum sets Q_c^(g) and Q_wi^(g,l) of each individual and selecting the mean μ_i,j^(g,l) and standard deviation σ_i,j^(g,l), where g = 1, 2, ..., G, l = 1, 2, ..., L, i = 1, 2, ..., c_max, and j = 1, 2, ..., 2^k.
2) Set generation as 1.
3) for each subpopulation g = 1, 2, ..., G do
   3.1) Observe Q_c^(g) to give a binary string b_c.
   3.2) If b_c^*(g) is empty, store b_c as b_c^*(g).
   3.3) for each individual l = 1, 2, ..., L do
      3.3.1) for each weight i = 1, 2, ..., c_max do
         If the ith bit in b_c is 1 (i.e., the ith connection is present) then
            • Observe Q_wi^(g,l) to give a binary string b_wi.
            • If b_wi^*(g,l) is empty, store b_wi as b_wi^*(g,l).
            • Determine the jth subspace as a candidate subspace according to j = d(b_wi) + 1.
            • Generate a weight value w_i from N(μ_i,j^(g,l), σ_i,j^(g,l)).
         end if
      end for
      3.3.2) Obtain a solution s and compute the objective function F(s).
      3.3.3) If generation is 1, store F(s) as F*_individual^(g,l).
      3.3.4) If F(s) > F*_individual^(g,l), update {Q_wi^(g,l)}_{i=1,...,c_max}; else update {b_wi^*}_{i=1,...,c_max}, {(μ_i,j^(g,l), σ_i,j^(g,l))}_{i=1,...,c_max; j=1,...,2^k}, s^*(g,l), and F*_individual^(g,l).
   end for
   3.4) If generation is 1, store min_{l∈{1,2,...,L}} F*_individual^(g,l) as F*_subpopulation^(g).
   3.5) If min_{l∈{1,2,...,L}} F*_individual^(g,l) > F*_subpopulation^(g), update Q_c^(g); else update b_c^*(g) and F*_subpopulation^(g).
end for
4) If the termination condition is satisfied, go to Step 7.
5) If the exchange condition is satisfied, perform the exchange operation.
6) Increment generation by 1 and return to Step 3.
7) Choose min_{g∈{1,2,...,G}} F*_subpopulation^(g) as the minimum objective function value and the corresponding s^*(g,l) as the optimal solution.
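To show how the steps of Algorithm 1 fit together, the following toy sketch (a simplified reconstruction made for this edit, not the authors' implementation) evolves only the structure bits Q_c of G subpopulations against a synthetic objective; the weight-subspace machinery and the exact θ_i sign rules are simplified, and all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
CMAX, G, GENERATIONS = 12, 3, 200
THETA, EPS, EXCHANGE_PERIOD = 0.05 * np.pi, 0.005, 10
target = rng.integers(0, 2, CMAX)                     # toy "good structure" to recover

def fitness(bc):
    """Toy objective: fraction of connectivity bits that disagree with the target."""
    return np.mean(bc != target)

def observe(alpha):
    """Collapse the structure bits: bit = 1 when r <= alpha^2."""
    return (rng.random(CMAX) <= alpha ** 2).astype(int)

def rotate_toward(alpha, bc, bc_best):
    """Rotate mismatched bits so that the probability of the stored best bit grows."""
    theta = np.where(bc_best == 1, THETA, -THETA) * (bc != bc_best)
    new = np.cos(theta) * alpha + np.sin(theta) * np.sqrt(1.0 - alpha ** 2)
    return np.clip(new, np.sqrt(EPS), np.sqrt(1.0 - EPS))   # keep alpha^2 in [eps, 1-eps]

Qc = [np.full(CMAX, 1.0 / np.sqrt(2.0)) for _ in range(G)]  # one Q_c per subpopulation
best_b = [None] * G
best_F = [np.inf] * G

for generation in range(1, GENERATIONS + 1):
    for g in range(G):
        bc = observe(Qc[g])
        F = fitness(bc)
        if F <= best_F[g]:                    # store the better structure and its fitness
            best_F[g], best_b[g] = F, bc
        else:                                 # otherwise refine Q_c toward the stored best
            Qc[g] = rotate_toward(Qc[g], bc, best_b[g])
    if generation % EXCHANGE_PERIOD == 0:     # structure exchange among subpopulations
        Qc = [Qc[p] for p in rng.permutation(G)]

print(min(best_F), best_b[int(np.argmin(best_F))])
```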

The parameters of the QNN shown in Table I are used for all the problems to show the robustness of the model regarding the parameter settings. Note that the number of quantum bits in Q_c is related to the maximum number of connections, which is determined by the number of inputs, outputs, and hidden nodes. The number of quantum bits in Q_w (i.e., k) governs the partitioning of each weight space and affects the subspace size. A large number of quantum bits implies a small subspace size and a strong possibility of escaping from a local optimum. However, the computational complexity increases with the number of subspaces. In this paper, k is set to 4, which means that each weight space is divided into 2^4 subspaces. θ affects the convergence speed of a quantum bit. Convergence may occur prematurely if this parameter is set too high [38]. A value from 0.01π to 0.05π is recommended for the magnitude of θ, although it depends on the problem. In the experiments, θ was set to 0.05π. ε is applied to quantum bit α_i to make the probability (α_i)² converge to either ε or 1 − ε rather than zero or unity. When (α_i)² converges to ε (or 1 − ε), it means that the ith quantum bit will be found in state "1" with probability ε (or 1 − ε) and state "0" with probability 1 − ε (or ε). So, ε can be regarded as a mutation probability and should usually be set fairly low (0.005–0.01 is a good choice). If it is set too high, the search will turn into a primitive random search. In the experiments, ε was set to 0.005. τ is the multiplication factor that governs the convergence rate of the SD. Convergence may occur prematurely if this parameter is set too low. A value from 0.8 to 0.95 was found to be suitable. In the experiments, it was set to 0.8.

The number of hidden nodes n_h is manually chosen as 2, 3, 4, 5, 6, 7, 8, 10, 12, and 14 to test the learning performance (the cases of n_h = 9, 11, and 13 are omitted to save space). For all experiments, the evolution is repeated until 2000 generations have been completed. To assess the performance of the algorithm, 100 runs are conducted. In each run, the testing error of the best model in terms of the validation error is stored. The error rate and the number of connections, obtained by taking the average and the SD of the 100 records, are also computed.

B. Experimental Results and Comparisons

1) Classification of the Iris Data Set: This data set contains three classes of 50 instances each (the total number of instances is 150), where each class refers to a type of iris plant. One class is linearly separable from the other two, which are not linearly separable from each other. The number of attributes is four, all of which are real values. In the experiment, the data set items are randomly chosen, with 90 instances for training, 15 instances for validation, and the rest (45 instances) for testing.

Table II shows the classification results of the QNN for various numbers of hidden nodes (n_h). As can be seen, the QNN achieves good training performance for almost all values of n_h. A mean training error of 0 was obtained for n_h = 2, 4, and 7. For the testing set, the best models in terms of validation error achieved a minimum testing error of 0 with all values of n_h. Furthermore, the best mean testing error is 1.21%, and the corresponding average number of connections is 62.35 (the number of connections of a fully connected network is 130), which is approximately a 52% reduction in links. Although a direct comparison with other approaches is difficult because the algorithms and the methods of obtaining the generalization of the models are different, it is interesting to compare the results in terms of the testing error obtained in other publications. Table III shows the results from some existing approaches and the best result of the QNN in terms of the mean testing error in Table II.


As can be seen, the QNN has the lowest mean testing error. In fact, the mean testing errors of the QNN for all values of n_h are lower than those of the other models. This demonstrates the powerful classification ability of the QNN under various network sizes.

2) Classification of the Cancer Data Set: This data set contains patterns of 699 individuals. Each pattern has nine attributes and two classes. All the attributes are real values. The task is to classify a tumor as either benign or malignant based on cell descriptions gathered from a microscopic examination. In the experiment, the data are split into three sets: 350 samples in the training set, 175 samples in the validation set, and 174 samples in the testing set.

TABLE IV
PERFORMANCE OF THE QNN FOR THE CANCER DATA SET (ALL RESULTS WERE AVERAGED OVER 20 INDEPENDENT RUNS)

n_h | Max. conn. | Training error (%) mean / SD / best / worst | Testing error (%) mean / SD / best / worst | Connections
2 | 42 | 2.89 / 0.11 / 2.34 / 2.92 | 1.14 / 0.41 / 0.39 / 2.35 | 22.42
3 | 55 | 2.89 / 0.11 / 2.34 / 2.92 | 1.10 / 0.35 / 0.59 / 1.76 | 27.64
4 | 69 | 2.89 / 0.11 / 2.34 / 2.92 | 1.05 / 0.36 / 0.54 / 1.76 | 33.47
5 | 84 | 2.90 / 0.09 / 2.34 / 2.92 | 1.00 / 0.35 / 0 / 2.35 | 39.82
6 | 100 | 2.91 / 0.10 / 2.05 / 2.92 | 1.05 / 0.36 / 0.59 / 2.35 | 47.34
7 | 117 | 2.90 / 0.08 / 2.63 / 2.92 | 1.04 / 0.35 / 0 / 1.76 | 57.12
8 | 135 | 2.91 / 0.09 / 2.34 / 3.22 | 0.99 / 0.31 / 0.44 / 1.76 | 64.59
10 | 174 | 2.92 / 0.03 / 2.63 / 2.92 | 1.00 / 0.34 / 0.59 / 1.76 | 82.17
12 | 217 | 2.91 / 0.05 / 2.63 / 2.92 | 0.96 / 0.32 / 0 / 1.76 | 105.85
14 | 264 | 2.92 / 0.07 / 2.63 / 3.22 | 1.14 / 0.35 / 0.59 / 1.76 | 126.14

TABLE V
COMPARISON OF THE QNN'S RESULTS FOR THE CANCER DATA SET WITH THOSE OBTAINED FOR OTHER MODELS IN TERMS OF AVERAGE TESTING ERROR

Model | Author | Mean testing error (SD) (%)
QNN | - | 0.89 (0.32)
NCA | M. M. Islam et al. [44] | 0.91 (0.24)
AMGA | M. M. Islam et al. [43] | 1.30 (-)
SNG | N. Garcia-Pedrajas et al. [46] | 2.78 (1.18)
HMOEN_HN | C. K. Goh et al. [42] | 3.18 (0.58)
CFNN | L. Ma et al. [47] | 3.30 (-)
HMOEN_L2 | C. K. Goh et al. [42] | 3.74 (1.1)
OC1-AP | E. Cantú-Paz et al. [45] | 5.30 (0.4)

The classification results of the QNN for various values of n_h are shown in Table IV. As can be seen, the evolution for the various values of n_h has a considerably low variance in the error rate. For example, the mean training error is 2.89% when n_h = 2, 3, or 4, and the mean training errors are 2.90%, 2.91%, and 2.92% when n_h = 5 or 7, n_h = 6, 8, or 12, and n_h = 10 or 14, respectively. This means that the QNN is robust regarding this parameter. More specifically, the value of n_h affects the network size but it does not appear to affect the training ability of the QNN. Similar results were observed with the testing set. For the testing set, the best mean error is 0.89%, and the average number of connections is 105.85 (the number of connections of a fully connected network is 217), which is approximately a 51% reduction in links.

Table V shows the results from some existing approaches and the best result of the QNN. As can be seen, the QNN has the lowest mean testing error. In addition, the mean testing errors for all values of n_h are considerably lower than those of the other models, except for the New Constructive Algorithm (NCA). Therefore, for the cancer data set, it can be concluded that the QNN is superior to or as good as many of the state-of-the-art models of evolutionary neural networks.

3) Classification of the Heart Data Set: This data set contains patterns of 303 patients, but six of them have missing values and 27 of the remaining patterns are debatable, leaving a total of 270 valid patterns. The number of attributes is 13, which includes real (1, 4, 5, 8, 10, and 12), ordered (11), binary (2, 6, and 9), and nominal (3, 7, and 13) values. The goal is to predict the presence or absence of heart disease in the patients (the original data set had five classes, considering four degrees of heart disease). In the experiment, the data are split into three sets: 134 instances for training, 68 instances for validation, and 68 instances for testing.

The classification results of the QNN for various values of n_h are shown in Table VI. As can be seen, the best mean testing result (an error rate of 15.62%) is obtained when the number of hidden nodes is seven. The average number of connections is 70.50 after learning (the number of connections of a fully connected network is 153), which is approximately a 54% reduction in links. Table VII shows the results of some existing approaches and the best result of the QNN in terms of mean testing error. The QNN significantly outperforms the other models. This means that the QNN has better generalization ability than the other models for the heart data set. In addition, Table VI shows that the mean testing error has a considerably low variance for the various values of n_h (the largest mean testing error is 16.43% and the lowest is 15.62%), with all mean testing error values being superior to those of the other models. This proves that the QNN can avoid architectural local optima and provide a good ANN structure without being affected by the network size.

4) Classification of the Diabetes Data Set: This data set consists of 768 individuals, all females at least 21 years old of Pima Indian heritage.


TABLE VI
PERFORMANCE OF THE QNN FOR THE HEART DATA SET (ALL RESULTS WERE AVERAGED OVER 100 INDEPENDENT RUNS)

n_h | Max. conn. | Training error (%) mean / SD / best / worst | Testing error (%) mean / SD / best / worst | Connections
2 | 58 | 9.59 / 0.86 / 6.72 / 11.19 | 16.43 / 1.82 / 11.76 / 20.59 | 26.32
3 | 75 | 9.79 / 0.87 / 7.46 / 11.94 | 16.01 / 1.87 / 11.76 / 21.32 | 33.98
4 | 93 | 9.99 / 0.93 / 7.46 / 11.94 | 16.17 / 2.36 / 10.29 / 20.59 | 42.42
5 | 112 | 10.12 / 0.75 / 8.21 / 11.94 | 15.79 / 1.84 / 11.76 / 20.59 | 53.42
6 | 132 | 10.28 / 0.87 / 8.21 / 13.43 | 15.67 / 1.95 / 11.76 / 22.06 | 60.64
7 | 153 | 10.31 / 0.76 / 7.46 / 11.94 | 15.62 / 1.97 / 11.76 / 23.53 | 70.50
8 | 175 | 10.25 / 0.75 / 8.96 / 12.69 | 15.90 / 2.19 / 10.29 / 20.59 | 82.00
10 | 222 | 10.46 / 0.70 / 8.96 / 12.69 | 15.97 / 2.07 / 10.29 / 23.53 | 105.01
12 | 273 | 10.55 / 0.74 / 8.96 / 12.69 | 15.83 / 1.77 / 12.25 / 20.59 | 130.87
14 | 328 | 10.56 / 0.68 / 8.96 / 11.94 | 15.70 / 1.84 / 11.76 / 20.59 | 157.08

The data set has eight attributes and two classes. All of the attributes are real valued, and the class of each pattern indicates whether the patient shows signs of diabetes according to the World Health Organization criteria. In the experiment, the data set is divided into 384 patterns for training, 192 patterns for validation, and 192 patterns for testing.

The classification results of the QNN for various values of n_h are shown in Table VIII. A low variance in the training errors and the testing errors under the various initial network sizes can be observed. In addition, the best mean testing result (an error rate of 21.41%) is obtained when the number of hidden nodes is two, and the average number of connections is 18.05 after learning (the number of connections of a fully connected network is 38), which is approximately a 53% reduction in links. Table IX shows the results of some existing approaches and the best mean testing result of the QNN in Table VIII. As can be seen, the QNN has the lowest mean testing error.

C. Discussion

It is very clear from Tables II–IX that the QNN can produce compact ANN structures with good generalization ability compared to other algorithms. This can be attributed to three features of the QNN. First, the QNN is composed of quantum bits that represent the probability of good connections and weight subspaces rather than a specific structure and weight values. So, the QNN modifies the quantum states rather than discarding the network if it has bad fitness. Thus, a potential structure or set of weight values is not thrown away owing to the one-to-many or many-to-one mapping problem. Second, the QNN uses observation rather than crossover and mutation to create a new network. Thereby, the negative impact of the permutation problem is reduced. Third, a partitioning strategy is used to find the near-optimal connection weights. It explores each weight space region by region and rapidly finds the promising subspace for further exploitation. This is helpful to provide a set of appropriate weights when evolving the network structure.

TABLE VII
COMPARISON OF THE QNN'S RESULTS FOR THE HEART DATA SET WITH THOSE OBTAINED FOR OTHER MODELS IN TERMS OF AVERAGE TESTING ERROR

Model | Author | Mean testing error (SD) (%)
QNN | - | 15.62 (1.97)
NCA | M. M. Islam et al. [44] | 18.17 (1.08)
AMGA | M. M. Islam et al. [43] | 18.87 (-)
HMOEN_HN | C. K. Goh et al. [42] | 18.94 (2.61)
SNG | N. Garcia-Pedrajas et al. [46] | 18.97 (5.75)
HMOEN_L2 | C. K. Goh et al. [42] | 20.31 (2.94)

As in [18]–[20], the proposed model is not designed to deal with various numbers of hidden nodes n_h; hence, the size of the initial structure must be selected. However, it can be seen from Tables II, IV, VI, and VIII that the training errors and the testing errors have considerably low variances for the various values of n_h. This means that the QNN is robust regarding this parameter. More specifically, the value of n_h affects the network size but it does not appear to affect the training ability of the QNN.

To study the effect of the partitioning strategy on performance, a comparison of various numbers of quantum bits in Q_w (i.e., k) is made. Figs. 6 and 7 show the training and testing error rates of the four benchmark problems with n_h = 2, 6, 10, and 14 at k = 0, 1, 2, 3, 4, 5, and 6. As can be seen, the training and generalization performance are significantly improved when the partitioning strategy is used. Taking the iris problem as an example, when k is 0 (i.e., no partitioning is used), the training and testing errors with n_h = 10 are 5.57% and 8.07%, respectively. When k = 1, 2, 3, 4, 5, and 6, respectively (i.e., each weight space is divided into 2, 4, 8, 16, 32, and 64 subspaces, respectively), the training errors decrease to 0.23%, 0.08%, 0.01%, 0.05%, 0.04%, and 0.01%, and the testing errors decrease to 1.55%, 1.20%, 1.28%, 1.21%, 1.34%, and 1.28%, respectively. This shows that the partitioning strategy can enhance the training of the optimal weights, which leads to a good evolution of the network structure and alleviates the noisy fitness evaluation.


Fig. 6. Training and testing error rates of the iris and cancer problems with n_h = 2, 6, 10, and 14 at k = 0, 1, 2, 3, 4, 5, and 6.

TABLE VIII
PERFORMANCE OF THE QNN FOR THE DIABETES DATA SET (ALL RESULTS WERE AVERAGED OVER 100 INDEPENDENT RUNS)

n_h | Max. conn. | Training error (%) mean / SD / best / worst | Testing error (%) mean / SD / best / worst | Connections
2 | 38 | 21.41 / 0.56 / 20.31 / 23.96 | 21.41 / 1.00 / 19.27 / 24.48 | 18.05
3 | 50 | 21.43 / 0.51 / 20.31 / 22.92 | 21.46 / 0.93 / 19.27 / 24.48 | 23.80
4 | 63 | 21.52 / 0.45 / 20.57 / 22.92 | 21.44 / 0.90 / 19.27 / 24.48 | 28.68
5 | 77 | 21.52 / 0.47 / 20.57 / 23.44 | 21.44 / 0.84 / 19.79 / 23.44 | 34.46
6 | 92 | 21.56 / 0.39 / 20.83 / 22.66 | 21.45 / 0.98 / 19.27 / 23.70 | 42.14
7 | 108 | 21.72 / 0.40 / 21.09 / 23.18 | 21.59 / 0.96 / 19.27 / 24.48 | 51.05
8 | 125 | 21.76 / 0.47 / 20.57 / 23.70 | 21.51 / 0.92 / 18.75 / 23.96 | 57.59
10 | 162 | 21.88 / 0.56 / 20.83 / 23.70 | 21.60 / 1.05 / 19.27 / 25.00 | 76.16
12 | 203 | 21.85 / 0.50 / 20.57 / 23.18 | 21.53 / 0.91 / 18.23 / 23.96 | 96.27
14 | 248 | 22.06 / 0.46 / 21.09 / 23.18 | 21.87 / 0.96 / 20.31 / 23.96 | 117.39

In addition, it can also be seen from Figs. 6 and 7 that the QNN is not very sensitive to the number of quantum bits in Q_w. Indeed, it has a considerably low variance in the error rate when k is larger than 2. This is helpful for reducing the burden of parameter tuning.

Fig. 7. Training and testing error rates of the heart and diabetes problems with n_h = 2, 6, 10, and 14 at k = 0, 1, 2, 3, 4, 5, and 6.

TABLE IX
COMPARISON OF THE QNN'S RESULTS FOR THE DIABETES DATA SET WITH THOSE OBTAINED FOR OTHER MODELS IN TERMS OF AVERAGE TESTING ERROR

Model | Author | Mean testing error (SD) (%)
QNN | - | 21.41 (1.00)
HMOEN_L2 | C. K. Goh et al. [42] | 21.55 (1.22)
AMGA | M. M. Islam et al. [43] | 21.97 (-)
HMOEN_HN | C. K. Goh et al. [42] | 24.64 (1.82)
OMNN | T. B. Ludermir et al. [21] | 25.88 (-)
OC1-AP | E. Cantú-Paz et al. [45] | 26.00 (0.6)
SNG | N. Garcia-Pedrajas et al. [46] | 26.22 (5.17)

V. CONCLUSION

This paper presented a quantum-based model for the design of ANNs. The model was based on a quantum bit representation for codifying the network. In the evolutionary process, the quantum bits were refined so that the probability of finding the optimal network was increased. The probability representation reduced the negative impact of the permutation problem and the risk of throwing away a potential network. To find near-optimal connection weights, the technique of subspace search using quantum bits was proposed. The algorithm thus performed a region-by-region exploration in the beginning and, as the candidate subspaces were identified, a randomized search in good subspaces was employed for exploitation. The performance of the developed model was evaluated using four classification problems with different features. The experimental results showed that the quantum-based algorithm can automatically generate the appropriate structure and weights of an ANN. The testing error values were better than those obtained by other state-of-the-art models of evolutionary neural networks, indicating the high potential of the QNN in neural network design. In the future, an appropriate stopping criterion related to the quantum bits will be developed.


ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their comments, which were very helpful in improving the quality and presentation of this paper.

REFERENCES

[1] R. Xuemei and L. Xiaohua, “Identification of extended Hammerstein systems using dynamic self-optimizing neural networks,” IEEE Trans. Neural Netw., vol. 22, no. 8, pp. 1169–1179, Aug. 2011.
[2] P. K. Patra, M. Sahu, S. Mohapatra, and R. K. Samantray, “File access prediction using neural networks,” IEEE Trans. Neural Netw., vol. 21, no. 6, pp. 869–882, Jun. 2010.
[3] T. H. Oong and N. A. M. Isa, “Adaptive evolutionary artificial neural networks for pattern classification,” IEEE Trans. Neural Netw., vol. 22, no. 11, pp. 1823–1836, Nov. 2011.
[4] K. Jonghoon, L. Seongiun, and B. H. Cho, “Complementary cooperation algorithm based on DEKF combined with pattern recognition for SOC/capacity estimation and SOH prediction,” IEEE Trans. Power Electron., vol. 27, no. 1, pp. 436–451, Jan. 2012.
[5] X. Yao, “Evolving artificial neural networks,” Proc. IEEE, vol. 87, no. 9, pp. 1423–1447, Sep. 1999.
[6] S. Van den Dries and M. A. Wiering, “Neural-fitted TD-leaf learning for playing othello with structured neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1701–1713, Nov. 2012.
[7] M. H. Shih and F. S. Tsai, “Decirculation process in neural network dynamics,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1677–1689, Nov. 2012.
[8] C. Xiao, Z. Cai, Y. Wang, and X. Liu, “Tuning of the structure and parameters of a neural network using a good points set evolutionary strategy,” in Proc. 9th Int. Conf. Young Comput. Sci., Nov. 2008, pp. 1749–1754.
[9] R. Parekh, J. Yang, and V. Honavar, “Constructive neural-network learning algorithms for pattern classification,” IEEE Trans. Neural Netw., vol. 11, no. 2, pp. 436–450, Mar. 2000.
[10] M. M. Islam, X. Yao, and K. Murase, “A constructive algorithm for training cooperative neural network ensembles,” IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 820–834, Jul. 2003.
[11] H. H. Thodberg, “Improving generalization of neural networks through pruning,” Int. J. Neural Syst., vol. 1, no. 4, pp. 317–326, Jul. 1991.
[12] R. Reed, “Pruning algorithms—a survey,” IEEE Trans. Neural Netw., vol. 4, no. 5, pp. 740–747, Sep. 1993.
[13] P. J. Angeline, G. M. Sauders, and J. B. Pollack, “An evolutionary algorithm that constructs recurrent neural networks,” IEEE Trans. Neural Netw., vol. 5, no. 1, pp. 54–65, Jan. 1994.
[14] G. F. Miller, P. M. Todd, and S. U. Hedge, “Designing neural networks,” Neural Netw., vol. 4, pp. 53–60, Nov. 1991.
[15] N. Nikolaev and H. Iba, “Learning polynomial feedforward neural networks by genetic programming and backpropagation,” IEEE Trans. Neural Netw., vol. 14, no. 2, pp. 337–350, Mar. 2003.
[16] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing, vol. 1, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, 1986, pp. 318–362.
[17] X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Trans. Neural Netw., vol. 8, no. 3, pp. 694–713, May 1997.
[18] F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam, “Tuning of the structure and parameters of a neural network using an improved genetic algorithm,” IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 79–88, Jan. 2003.
[19] J. Tsai, J. Chou, and T. Liu, “Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm,” IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 69–80, Jan. 2006.


[20] T. B. Ludermir and C. Zanchettin, “An optimization methodology for neural network weights and architectures,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1452–1459, Nov. 2006.
[21] L. Li and B. Niu, “A hybrid evolutionary system for designing artificial neural networks,” in Proc. Int. Conf. Comput. Sci. Softw. Eng., vol. 4. Dec. 2008, pp. 859–862.
[22] R. K. Belew, J. McInerney, and N. N. Schraudolph, “Evolving networks: Using genetic algorithm with connectionist learning,” Dept. Comput. Sci. Eng., Univ. California, San Diego, CA, USA, Tech. Rep. CS90-174, Feb. 1991.
[23] P. J. B. Hancock, “Genetic algorithms and permutation problems: A comparison of recombination operators for neural net structure specification,” in Proc. Int. Workshop Combinat. Genet. Algorithms Neural Netw., Jun. 1992, pp. 108–122.
[24] J. D. Schaffer, L. D. Whitley, and L. J. Eshelman, “Combinations of genetic algorithms and neural networks: A survey of the state of the art,” in Proc. Int. Workshop Combinat. Genet. Algorithms Neural Netw., Jun. 1992, pp. 1–37.
[25] V. Maniezzo, “Genetic evolution of the topology and weight distribution of neural networks,” IEEE Trans. Neural Netw., vol. 5, no. 1, pp. 39–53, Jan. 1994.
[26] J. Fang and Y. Xi, “Neural network design based on evolutionary programming,” Artif. Intell. Eng., vol. 11, no. 2, pp. 155–161, Apr. 1997.
[27] X. Yao and Y. Liu, “Toward designing artificial neural networks by evolution,” Appl. Math. Comput., vol. 91, no. 1, pp. 83–90, Apr. 1998.
[28] P. Kohn, “Combining genetic algorithms and neural networks,” M.S. thesis, Dept. Comput. Sci., Univ. Tennessee, Knoxville, TN, USA, 1994.
[29] K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evol. Comput., vol. 10, no. 2, pp. 99–127, Jun. 2002.
[30] K. O. Stanley, B. D. Bryant, and R. Miikkulainen, “Real-time neuroevolution in the NERO video game,” IEEE Trans. Evol. Comput., vol. 9, no. 6, pp. 653–668, Dec. 2005.
[31] T. Mu, J. Jiang, Y. Wang, and J. Y. Goulermas, “Adaptive data embedding framework for multiclass classification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 1291–1303, Aug. 2012.
[32] J. C. F. Pujol and R. Poli, “Evolving the topology and the weights of neural networks using a dual representation,” Appl. Intell., vol. 8, no. 1, pp. 73–84, Jan.–Feb. 1998.
[33] K. H. Han and J. H. Kim, “Quantum-inspired evolutionary algorithm for a class of combinatorial optimization,” IEEE Trans. Evol. Comput., vol. 6, no. 6, pp. 580–593, Dec. 2002.
[34] K. H. Han and J. H. Kim, “On the analysis of the quantum-inspired evolutionary algorithm with a single individual,” in Proc. Congr. Evol. Comput., Jul. 2006, pp. 2622–2629.
[35] R. Zhang and H. Gao, “Improved quantum evolutionary algorithm for combinational optimization problem,” in Proc. 6th Int. Conf. Mach. Learn. Cybern., 2007, pp. 3501–3505.
[36] K. H. Han and J. H. Kim, “Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate, and two-phase scheme,” IEEE Trans. Evol. Comput., vol. 8, no. 2, pp. 156–169, Apr. 2004.
[37] L. Jiao, Y. Li, M. Gong, and X. Zhang, “Quantum-inspired immune clonal algorithm for global optimization,” IEEE Trans. Syst. Man Cybern., vol. 35, no. 5, pp. 1234–1253, Apr. 2008.
[38] S. Yang, M. Wang, and L. Jiao, “A novel quantum evolutionary algorithm and its application,” in Proc. Congr. Evol. Comput., Jun. 2004, pp. 820–826.
[39] H. Liu, G. Zhang, C. Liu, and C. Fang, “A novel memetic algorithm based on real-observation quantum-inspired evolutionary algorithms,” in Proc. Conf. Intell. Syst. Knowl. Eng., vol. 1. Nov. 2008, pp. 486–490.
[40] G. Bebis, M. Georgiopoulos, and T. Kasparis, “Coupling weight elimination with genetic algorithms to reduce network size and preserve generalization,” Neurocomputing, vol. 17, pp. 167–194, May 1997.
[41] L. Prechelt, “PROBEN1—A set of neural network benchmark problems and benchmarking rules,” Faculty Informat., Univ. Karlsruhe, Karlsruhe, Germany, Tech. Rep. 21/94, Sep. 1994.
[42] C. K. Goh, E. J. Teoh, and K. C. Tan, “Hybrid multiobjective evolutionary design for artificial neural networks,” IEEE Trans. Neural Netw., vol. 19, no. 9, pp. 1531–1548, Sep. 2008.
[43] M. M. Islam, M. A. Sattar, M. F. Amin, X. Yao, and K. Murase, “A new adaptive merging and growing algorithm for designing artificial neural networks,” IEEE Trans. Syst. Man Cybern., vol. 39, no. 3, pp. 705–722, Jun. 2009.


[44] M. M. Islam, M. A. Sattar, M. F. Amin, X. Yao, and K. Murase, “A new constructive algorithm for architectural and functional adaptation of artificial neural networks,” IEEE Trans. Syst. Man Cybern., vol. 39, no. 6, pp. 1590–1605, Dec. 2009.
[45] E. Cantú-Paz and C. Kamath, “Inducing oblique decision trees with evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 7, no. 1, pp. 54–68, Feb. 2003.
[46] N. Garcia-Pedrajas, C. Hervas-Martinez, and D. Ortiz-Boyer, “Cooperative coevolution of artificial neural networks ensembles for pattern classification,” IEEE Trans. Evol. Comput., vol. 9, no. 3, pp. 271–302, Jun. 2005.
[47] L. Ma and K. Khorasani, “Constructive feedforward neural networks using Hermite polynomial activation functions,” IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 821–833, Jul. 2005.

Tzyy-Chyang Lu received the B.S. degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan, the M.S. degree in electrical engineering from I-Shou University, Kaohsiung, Taiwan, and the Ph.D. degree from National Cheng Kung University, Tainan, Taiwan, in 1998, 2000, and 2011, respectively. He is currently a Post-Doctoral Fellow with the Advanced Institute of Manufacturing with High-Tech Innovations, National Chung Cheng University, Chia-Yi, Taiwan. His current research interests include optimization algorithms, machine learning, and intelligent control.

Gwo-Ruey Yu (M’94) received the B.S. and M.S. degrees from National Cheng Kung University, Tainan, Taiwan, in 1988 and 1990, respectively, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 1997. He is currently an Associate Professor with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, Taiwan. His current research interests include quantum information science & technology, artificial neural networks, intelligent systems and control, and H∞ robust control. Dr. Yu was the recipient of the Best Technical Paper Presentation Award of the American Automatic Control Council 2004, the Best Paper Award of the 17th National Conference on Fuzzy Theory and its Applications 2009, and the First Prize of the Best Paper Award of 2012 International Conference on Fuzzy Theory and Its Applications.

Jyh-Ching Juang (S’82–M’83–S’84–M’88) received the B.S. and M.S. degrees from National Chiao-Tung University, Hsinchu, Taiwan, in 1980 and 1982, respectively, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 1987. He was with Lockheed Aeronautical Systems Company, Burbank, CA, before joining the faculty of the Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, in 1993. He is currently a Professor with the Department of Electrical Engineering, National Cheng Kung University. His current research interests include sensor networks, GNSS signal processing, and software-based receivers.
