
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 23, NO. 3, MARCH 2012

Sensitivity-Based Adaptive Learning Rules for Binary Feedforward Neural Networks Shuiming Zhong, Xiaoqin Zeng, Shengli Wu, and Lixin Han

Abstract— This paper proposes a set of adaptive learning rules for binary feedforward neural networks (BFNNs) by means of a sensitivity measure that is established to investigate the effect of a BFNN's weight variation on its output. The rules are based on three basic adaptive learning principles: the benefit principle, the minimal disturbance principle, and the burden-sharing principle. To follow the benefit principle and the minimal disturbance principle, a neuron selection rule and a weight adaptation rule are developed. In addition, a learning control rule is developed to follow the burden-sharing principle. The advantage of the rules is that they can effectively guide BFNN learning to conduct constructive adaptations and avoid destructive ones. With these rules, a sensitivity-based adaptive learning rule (SBALR) algorithm for BFNNs is presented. Experimental results on a number of benchmark datasets demonstrate that the SBALR algorithm has better learning performance than the Madaline rule II and backpropagation algorithms.

Index Terms— Adaptive learning algorithm, binary feedforward neural networks, learning rule, sensitivity.

I. INTRODUCTION

THE ability to learn is the foremost function of neural networks; hence, building a proper learning mechanism is a key issue for all types of neural networks. In this paper, we focus on the learning mechanism of binary feedforward neural networks (BFNNs), also known as Madalines, with a symmetric hard-limiting activation function, and aim at improving its learning performance. A BFNN is a discrete feedforward neural network (DFNN) with a supervised learning mechanism, which is suitable for handling inherently discrete tasks such as logical calculation and pattern recognition [1], [2]. However, BFNNs have not yet had a satisfactory adaptive learning algorithm. Although a discrete task can theoretically be regarded as a special case of a continuous one, and thus be tackled by continuous feedforward neural networks (CFNNs) with the well-known backpropagation (BP) learning algorithm [3], BFNNs do have some obvious advantages over

Manuscript received May 7, 2011; revised November 7, 2011; accepted November 19, 2011. Date of publication January 18, 2012; date of current version February 29, 2012. This work was supported in part by the National Natural Science Foundation of China under Grant 60571048 and Grant 60971088, the Research Innovation Program for College Postgraduate in Jiangsu Province under Grant CX09B 157Z, and the Excellent Doctoral Dissertation Cultivation Program of Hohai University under Grant 2010B19214. S. Zhong, X. Zeng, and L. Han are with the Institute of Intelligence Science and Technology, Hohai University, Nanjing 210098, China (e-mail: [email protected]; [email protected]; [email protected]). S. Wu is with the School of Computing and Mathematics, University of Ulster, Northern Ireland BT370QB, U.K. (e-mail: [email protected]). Digital Object Identifier 10.1109/TNNLS.2011.2177860

CFNNs in nature, which are: 1) ease of describing discrete tasks without the extra discretization step that mostly causes some imprecision; 2) simplicity of computation and explanation, owing to the hard-limiting activation function and the limited input and output states; and 3) facility of hardware implementation with available VLSI technology. Besides, the BP algorithm itself is still imperfect in performance because it easily falls into local minima, a drawback attributed to the gradient descent technique. Although it is difficult to prove theoretically that a BFNN learning mechanism can guarantee overcoming this drawback, gradient descent is in any case unsuitable for BFNNs, because the discrete (hard-limiting) activation function is not differentiable; a BFNN learning mechanism can thus at least avoid the cause of the drawback.

In the literature, there are many approaches to BFNN learning. On the whole, they fall into two main categories. One is the adaptive learning approach, of which the most popular instance is Madaline Rule II (MRII) [4], [5], extended from Ridgway's algorithm [6], also known as MRI. Both MRII and MRI apply the Mays rule [7], a variation of the perceptron rule [8], to BFNNs. Unfortunately, these algorithms perform too poorly to be used in practical applications [4], [5]. The other is the geometrical construction approach [9]–[12], which fabricates a set of hyperplanes, based on the linearly separable function of a BFNN's neuron, to meet the input–output mapping implied in the training dataset. One advantage of the geometrical construction approach is that it can automatically determine a BFNN's structure and parameters from the number of hyperplanes and their normal vectors. But the size of the BFNNs constructed in this way varies greatly with the input sample sequence.

Moreover, for the same learning task, the BFNNs constructed by the geometrical approach are generally much larger than those trained by the conventional adaptive learning approach. It is well known that too large a structure can weaken the network's generalization performance, increase the computational complexity, and complicate hardware implementation. Therefore, further investigation of adaptive learning algorithms for BFNNs is highly desirable.

In the design of an adaptive learning mechanism, which parameter to adapt and how to adapt it are two key issues. Apparently, the parameter adaptation of a BFNN aims at changing its output to establish the input–output mapping implied in the training dataset. However, it should be noted that though the output change due to the parameter adaptation

1045–9227/$31.00 © 2012 IEEE


Fig. 1. Model of an Adaline.


may reduce the network output errors, it may also increase them. Therefore, the learning mechanism needs to be able to qualitatively analyze, or quantitatively measure, the positive or negative effect of the parameter adaptation on the BFNN output errors.

In this paper, on the basis of three basic adaptive learning principles, namely the benefit principle, the minimal disturbance principle [4], [5], [13], and the burden-sharing principle, we first analyze the BFNN output change due to weight adaptation, and then discuss how to quantitatively measure the change by means of the BFNN sensitivity. The output sensitivity of a neural network to its parameter variation is conceptually regarded as a significant measure for evaluating the network's performance, and has thus drawn many researchers' attention; new research results and applications continue to appear in the literature [14], [15]. The sensitivity of a BFNN [16]–[18] captures the effect of weight variation on the network output, and in this sense it can be employed to measure the BFNN output change in training. Actually, one can find that the confidence level (the summation of weighted inputs) used as a measure in MRII [5] to locate adaptive neurons does not implement the minimal disturbance principle well, because it only reflects the BFNN output change for the current input and not for the non-current inputs. This may result in learning from the current training sample while forgetting what was previously learned from other training samples, which may be the main cause of MRII's poor performance. Fortunately, the sensitivity measure, which reflects the BFNN output change due to weight adaptation with respect to all possible inputs, can overcome this shortcoming by properly locating the neurons in need of training and exactly determining the adaptive values of the weights in need of adaptation.
The main contribution of this paper is that it proposes, by originally applying the BFNN sensitivity measure to implement some abstract adaptive learning principles, three new concrete adaptive learning rules for BFNNs. They are the neuron selection rule, the weight adaptation rule, and the learning control rule. With the rules, a new adaptive learning algorithm of BFNNs can be programmed to more accurately locate the neurons in real need of adaptation, to properly determine the weight adaptation, to evenly allocate learning burdens among all neurons of the BFNN in training, and, all in all, to greatly improve the learning performance of BFNNs.

Fig. 2. Model of a BFNN.

The rest of this paper is organized as follows. In the next section, the BFNN model and its sensitivity are briefly described. Three basic adaptive learning principles are then discussed in Section III. Based on the principles and by means of the sensitivity measure, a set of adaptive learning rules is developed in Section IV. Section V presents the new learning algorithm, the sensitivity-based adaptive learning rule (SBALR) algorithm, programmed with the rules for BFNNs. Experimental verifications are given in Section VI. Finally, Section VII concludes this paper and discusses our future work on the BFNN learning mechanism.

II. BFNN MODEL AND SENSITIVITY MEASURE

In order to facilitate the following discussions, this section briefly introduces the BFNN model, some notations, and the BFNN sensitivity.

A. Models and Notations

A BFNN is a kind of discrete multilayer feedforward neural network with a supervised learning mechanism; it consists of a set of neurons, also known as Adalines, with binary inputs, an output, and a hard-limiting activation function. Fig. 1 shows the Adaline model. The input of an Adaline, represented by X = (x_0, x_1, ..., x_n)^T with an extra element x_0 (≡ 1) corresponding to the bias w_0, is first weighted by the weight W = (w_0, w_1, ..., w_n)^T containing the bias, and then fed to an activation function f(·) that yields the Adaline's output as

$$y = f(X^{\mathrm T} W) = \begin{cases} -1, & \sum_{j=1}^{n} x_j w_j + w_0 < 0 \\ +1, & \sum_{j=1}^{n} x_j w_j + w_0 \ge 0. \end{cases} \qquad (1)$$
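For readers who prefer code, the Adaline computation in (1) and the layer-by-layer propagation through a BFNN can be sketched as follows. This is an illustrative sketch of ours, not code from the paper, and the function names are our own:

```python
import numpy as np

def adaline_output(W, X):
    """Adaline output per (1): hard-limiting activation applied to X^T W.

    W = (w0, w1, ..., wn) with the bias w0 first;
    X = (x0, x1, ..., xn) with x0 = 1 and the remaining xi in {-1, +1}.
    """
    return 1 if float(X @ W) >= 0.0 else -1

def bfnn_forward(weights, x):
    """Propagate a {-1, +1} input vector through a BFNN.

    weights[l] is an (n_{l-1} + 1, n_l) matrix whose first row holds the
    biases; each column corresponds to one Adaline of the layer.
    """
    y = np.asarray(x, dtype=float)
    for Wl in weights:
        a = np.concatenate(([1.0], y))          # prepend x0 = 1 for the bias row
        y = np.where(a @ Wl >= 0.0, 1.0, -1.0)  # hard-limiting activation per (1)
    return y
```

For example, the weight vector W = (−1, 1, 1)^T realizes the logical AND of two ±1 inputs.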

In general, a BFNN has L layers, and each layer l (1 ≤ l ≤ L) has n^l (n^l ≥ 1) neurons. The form n^0−n^1−···−n^L is used to represent a BFNN and indicate its structural configuration, in which each n^l (0 ≤ l ≤ L) not only stands for a layer but also indicates the number of neurons in that layer; n^0 is an exception, denoting the input dimension of the BFNN. For the j-th layer,


X^j (1 ≤ j ≤ L) denotes the input of all the neurons in that layer, and Y^j (1 ≤ j ≤ L) denotes the output of the layer. They satisfy Y^{k−1} = X^k (2 ≤ k ≤ L). In particular, X^1 denotes not only the input of all neurons in the first layer but also the input of the entire BFNN, n^L denotes the output layer, and Y^L is the output of both the output layer and the entire BFNN. Fig. 2 shows the model of a BFNN. Functionally, a BFNN realizes a logical mapping from an n^0-dimension input space to an n^L-dimension output space, namely f: {−1, 1}^{n^0} → {−1, 1}^{n^L}. For simplicity and without loss of generality, the following discussions only focus on BFNNs with a single hidden layer, since the results can easily be extended to BFNNs with more than one hidden layer.

B. BFNN Sensitivity

Usually, adaptive supervised learning is a process of iteratively adapting weights to change the network output so that the input–output mapping implied in a training dataset can be established. Obviously, the network output variation due to weight adaptation is a key issue in the network learning mechanism. The study of BFNN sensitivity aims at exploring the effect of a BFNN parameter's variation on its output. The following subsections briefly introduce research results on BFNN sensitivity, which will be employed as a technical tool to support the investigation of the BFNN learning mechanism. For further details, refer to [18].

1) Neuron Sensitivity:

Definition 1: The sensitivity of an Adaline is defined as the probability of the Adaline's output inversion (from 1 to −1, and vice versa) due to its weight variation with respect to all inputs, which can be expressed as

$$s(\Delta W) = P\big(f(W'^{\mathrm T} X) \neq f(W^{\mathrm T} X)\big) = \frac{V_{\mathrm{var}}}{V_n} \qquad (2)$$

where W' is the varied weight, V_var is the number of inputs at which the Adaline's outputs are inversed due to the weight variation, and V_n is the total number of inputs. A previous study [18] has shown that the Adaline sensitivity can be approximately computed by the following formula:

$$s(\Delta W) \approx E(s) = \arccos\!\big((W^{\mathrm T} W')/(|W||W'|)\big)/\pi \approx (|\Delta W|/|W|)/\pi \quad \text{for } |\Delta W| \ll |W| \qquad (3)$$

where W, ΔW, and W' refer to the original weight, the weight variation, and the varied weight, respectively.

Due to the information transmission between layers in a BFNN, an Adaline's output inversion leads to a corresponding input variation of all Adalines in its next layer, so an Adaline's sensitivity due to its input variation also needs to be taken into account. The earlier study [18] has also shown that it can be easily computed by transforming the input variation into an equivalent weight variation:

$$s(\Delta X) \approx \arccos\!\left(\sum_{i=0}^{n} w_i w_i' \Big/ \sum_{i=0}^{n} w_i^2\right) \Big/ \pi \qquad (4)$$

where w_i' = −w_i for i ∈ {j_1, j_2, ..., j_K} and w_i' = w_i otherwise; ΔX denotes the input variation in which only K elements are varied, j_t (1 ≤ t ≤ K) denotes that the j_t-th input element of the Adaline is varied, and w'_{j_t} is its corresponding equivalent varied weight element. Especially, when all weight elements of an Adaline are of the same magnitude level, (4) can be further simplified as

$$s(\Delta X) \approx \sqrt{4K/(n+1)}\,\big/\,\pi \quad \text{for } K \ll n. \qquad (5)$$

2) Network Sensitivity: Based on the structural feature of BFNNs and the sensitivity of Adalines, the sensitivity of a layer and the sensitivity of a BFNN can be defined as follows.

Definition 2: The sensitivity of layer l (1 ≤ l ≤ L) is a vector in which each element is the sensitivity of the corresponding neuron in the layer due to its input and weight variation

$$S^l = (s_1^l, s_2^l, \ldots, s_{n^l}^l)^{\mathrm T}. \qquad (6)$$

Definition 3: The sensitivity of a BFNN is the sensitivity of its output layer

$$S_{\mathrm{net}} = S^L = (s_1^L, s_2^L, \ldots, s_{n^L}^L)^{\mathrm T}. \qquad (7)$$

During training, it is important to quantify the BFNN sensitivity due to its weight adaptation. Usually, there are two ways to do so: one counts the inputs at which the BFNN output vector varies; the other counts the output elements that vary before and after the weight adaptation. Apparently, for BFNNs with a vector output, the latter more truly reflects the output variation. Therefore, the sensitivity of a BFNN can be further quantified as

$$s_{\mathrm{net}} = \frac{\sum_{i=1}^{n^L} s_i^L \cdot V_n}{n^L \cdot V_n} = \frac{1}{n^L}\sum_{i=1}^{n^L} s_i^L \qquad (8)$$

where V_n is the number of all inputs. From (8), the BFNN sensitivity is equal to the average of the sensitivities of all output-layer neurons.

III. THREE BASIC LEARNING PRINCIPLES

Usually, the weight adaptation of a BFNN inevitably causes its output to change with respect to all possible inputs. Due to the binary feature of the BFNN output, it is appropriate to measure the output change by the number of changed states rather than by an actual magnitude. Therefore, the effect of weight adaptation during training on the output of the BFNN can be measured by

$$\text{output variation} = \frac{1}{2}\sum_{i=1}^{V_n}\big(|\Delta y_{i,1}^L|, \ldots, |\Delta y_{i,n^L}^L|\big)^{\mathrm T} = (s_1^L, \ldots, s_{n^L}^L)^{\mathrm T}\, V_n = S_{\mathrm{net}}\, V_n \qquad (9)$$

where (Δy_{i,1}^L, ..., Δy_{i,n^L}^L)^T = (Y_i^L)' − Y_i^L; Y_i^L and (Y_i^L)' are the BFNN outputs for the i-th input X_i before and after the weight adaptation, respectively, and V_n is the number of all possible inputs.
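The exact probability in (2) and the analytical estimate in (3) can be compared numerically for small input dimensions. The sketch below is ours (exhaustive enumeration over all 2^n binary inputs is only feasible for small n); the function names are not from the paper:

```python
import itertools
import numpy as np

def adaline_sensitivity_exact(W, W_varied):
    """Sensitivity per (2): fraction of the 2^n binary inputs whose
    Adaline output is inverted when W is replaced by W_varied."""
    n = len(W) - 1  # number of real inputs, excluding the bias element x0
    inverted = 0
    for bits in itertools.product((-1.0, 1.0), repeat=n):
        X = np.concatenate(([1.0], bits))  # x0 = 1 pairs with the bias w0
        y_old = 1.0 if X @ W >= 0.0 else -1.0
        y_new = 1.0 if X @ W_varied >= 0.0 else -1.0
        inverted += (y_old != y_new)
    return inverted / 2 ** n

def adaline_sensitivity_approx(W, W_varied):
    """Analytical estimate per (3): normalized angle between W and W'."""
    cos = (W @ W_varied) / (np.linalg.norm(W) * np.linalg.norm(W_varied))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi
```

For instance, for W = (0, 1, 1)^T and W' = (0, 1, −1)^T the two weight vectors are orthogonal, and both the enumeration and the angle formula give a sensitivity of 0.5.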


The adaptive learning of a BFNN is a process in which its weights are iteratively adapted in order to meet the input–output mapping given by the training dataset. Therefore, it is necessary to evaluate in each iteration the effect of the BFNN output change on the learning performance by building a performance index. Since the mean square error (MSE), often employed as the performance index of CFNNs, is not suitable for BFNNs, the performance index of BFNNs, similar to (9), can be expressed as

$$\text{output errors} = \frac{1}{2}\sum_{i=1}^{V_n}\big(|e_{i,1}^L|, \ldots, |e_{i,n^L}^L|\big)^{\mathrm T} \qquad (10)$$

where (e_{i,1}^L, ..., e_{i,n^L}^L)^T = (Y_i^L)' − D_i; (Y_i^L)' is the BFNN output for the i-th input X_i after the weight adaptation, D_i is the desired output for the i-th input X_i, and V_n is the number of all possible inputs.

During the training process, the weight adaptation aims at decreasing the output errors for the current training sample, and ideally for the overall training samples. A weight adaptation is referred to as a benefit adaptation if it helps reduce the output errors, and the requirement that each weight adaptation during training be a benefit adaptation is regarded as the benefit principle. In general, the benefit principle has three levels: 1) a weight adaptation decreases the output errors over all training samples, which is the ideal, or highest, level; 2) a weight adaptation at least decreases the output errors for the current training sample, which is the middle level; and 3) even if a weight adaptation cannot directly decrease the output errors for the current training sample, it should indirectly contribute somewhat toward their decrease, which is the lowest level. The purpose of introducing the three levels is to emphasize that it may sometimes be necessary to accept the lowest level of benefit for a weight adaptation.

However, assessing whether an output change is beneficial for reducing the output errors is not easy. On the one hand, the assessment for non-training inputs is impossible because their desired outputs are unknown; on the other hand, assessing by counting the change of output errors over all training samples is rather time consuming. Therefore, for all non-current inputs, the output change is always viewed as a negative factor, herein called an output disturbance

$$\text{output disturbance} = \frac{1}{2}\sum_{i=1,\, i\neq k}^{V_n}\big(|\Delta y_{i,1}^L|, \ldots, |\Delta y_{i,n^L}^L|\big)^{\mathrm T} \qquad (11)$$

where Δy_{i,j}^L (1 ≤ j ≤ n^L) is the j-th element of the BFNN output variation for the i-th input X_i due to the weight adaptation, V_n is the number of all possible inputs, and k indicates the current sample. Naturally, during training it is always required that the output disturbance be as small as possible. This requirement is regarded as the minimal disturbance principle, which was first proposed by Winter and Widrow in MRII [4], [5], [13]. In order to implement the minimal disturbance principle during the training of a BFNN, it is necessary to quantitatively measure the output disturbance due to the weight adaptation.

From (11), the output disturbance due to the weight adaptation reflects the number of changed output elements of the BFNN for the non-current inputs. According to (9), the BFNN sensitivity due to its weight adaptation is in direct proportion to the output disturbance. Hence, the BFNN sensitivity can reasonably be used to evaluate the output disturbance. Similar to the quantification of the BFNN sensitivity in (8), the sensitivity for measuring the output disturbance can be expressed as

$$\text{output disturbance} \propto \text{sensitivity measure} \doteq s_{\mathrm{net}}(\Delta W_i^l), \quad 1 \le l \le L,\ 1 \le i \le n^l. \qquad (12)$$

In addition to the weight adaptation, the learning burden allocation is another important factor that affects BFNN learning performance. It is generally necessary to allocate the learning burden to each neuron as uniformly as possible. This requirement is regarded as the burden-sharing principle. Since BFNN learning consists of iterative neuron selection for locating the weights in need of adaptation, each selected neuron should bear some of the learning burden. Thus, it is reasonable to use the weight adaptation number of each neuron as a measure to evaluate the learning burden allocation during training

$$\text{learning burden measure} \doteq (\ldots, \langle b_1^l, b_2^l, \ldots, b_{n^l}^l\rangle, \ldots), \quad 1 \le l \le L \qquad (13)$$

where b_i^l (1 ≤ i ≤ n^l) is the weight adaptation number of the i-th neuron in the l-th layer.

The three basic learning principles mentioned above are just conceptual guidelines. In order to apply them to the BFNN learning mechanism, more concrete learning rules and applicable measuring tools need to be developed.

IV. LEARNING RULES OF BFNNs

In general, the BFNN learning mechanism involves two basic questions: 1) which neuron's weights should be adapted; and 2) how to adapt the weights of the selected neuron(s). To answer the first question, an effective way to select the neuron(s) in need of adaptation must be developed; to answer the second, an effective way to determine the direction and the magnitude of a weight adaptation must be developed. In this section, based on the three basic learning principles as well as the perceptron rule [8], and by means of the BFNN sensitivity measure, three new learning rules are developed to answer the two questions.

A. Neuron Selection Rule

For CFNNs, with the support of the gradient descent technique, all neurons take part in adaptation in each iteration. However, due to the discrete feature of BFNNs, the organization of adapting neurons is more complicated. For BFNNs, when output errors occur, the easiest way is to directly adapt the weights of the output-layer neurons whose outputs are wrong. But it is well known that a single-layer BFNN can only handle linearly separable problems, so the precondition for directly adapting output-layer neurons is that the hidden-layer outputs must be linearly separable; otherwise,


the BFNN learning will get into a situation with no solution. Under this consideration, the priority of weight adaptation should be given to the neurons in the hidden layers. As the information flow in BFNNs is always one-way, propagating from the input layer to the output layer, the neurons in an earlier layer should be adapted before those in a later layer.

The next question is how to organize the neurons in the same hidden layer to learn. First, according to the benefit principle, the adaptation of the selected neuron or neuron combination must be able to reduce the output errors of the BFNN in training. This is easy to judge in the following way, called "trial reversion": first reverse the output(s) of the selected neuron(s), then compute the resulting output of the BFNN and check its number of output errors; if the number of output errors is reduced, the selected neuron(s) are regarded as meeting the benefit principle. Second, according to the minimal disturbance principle, the weight adaptation of the selected neuron(s) must also minimize the BFNN output disturbance. In the same hidden layer there may be more than one neuron or neuron combination that meets the benefit principle, but the magnitudes of the output disturbances they cause usually differ. Therefore, among those meeting the benefit principle, it is necessary to choose the one with the minimal disturbance.

According to (12), the BFNN sensitivity can be used to properly measure the output disturbance due to the weight adaptation. So the sensitivity measure for selecting hidden-layer neurons can simply be expressed as

$$\text{sensitivity measure} \doteq s_{\mathrm{net}}(\Delta W_j^1), \quad 1 \le j \le n^1 \qquad (14)$$

where s_net(ΔW_j^1) stands for the BFNN sensitivity due to the weight adaptation of the j-th neuron in the hidden layer. According to (8), the key to computing (14) is to compute the sensitivity of each output-layer neuron. Note two points: 1) in order to meet the minimal disturbance principle, the weight adaptation is always set small enough that |ΔW_j^1| ≪ |W_j^1|; and 2) the constraint in (5) is met when the number of hidden-layer neurons, namely n^1, is much larger than 1 (Winter's experience indicates that n^1 ≥ 3 is a good choice [5]). Thus, [18] shows that the sensitivity of the i-th output-layer neuron due to the weight adaptation of the j-th hidden-layer neuron can be calculated as

$$s_i^2(\Delta W_j^1) = s_j^1(\Delta W_j^1)\, s_i^2(\Delta X_i^2) \approx (2/\pi^2)\sqrt{1/(n^1+1)}\,\big(|\Delta W_j^1|/|W_j^1|\big) \quad \text{for } |\Delta W_j^1| \ll |W_j^1|,\ n^1 \gg 1. \qquad (15)$$

We can see from (15) and (8) that, for a BFNN with a given structure, the magnitude of s_net(ΔW_j^1) depends only on the weight variation ratio of the hidden-layer neuron, namely |ΔW_j^1|/|W_j^1| (1 ≤ j ≤ n^1). Hence, the sensitivity measure for selecting hidden-layer neurons can be further simplified as

$$\text{sensitivity measure} \approx |\Delta W_j^1|/|W_j^1|, \quad 1 \le j \le n^1. \qquad (16)$$

Therefore, by means of the sensitivity measure, the neuron selection rule based on both the benefit principle and the minimal disturbance principle can be described as follows: the weight adaptation priority should be given to the neuron or adjacent neuron combination that meets the benefit principle and meanwhile has the least sensitivity value.

B. Weight Adaptation Rule

For the adaptive learning mechanism of BFNNs, the establishment of an appropriate weight adaptation rule is most important. According to the benefit principle and the binary feature of a BFNN's output, the direct goal of the weight adaptation of a neuron is to inverse its output so as to reduce the BFNN's output errors for the current training sample. However, the adaptation may to some extent change the input–output mapping of the BFNN established by previous training samples; that is, it may disturb the behavior the BFNN learned previously. So a weight adaptation rule should not only correct the BFNN's output for the current training sample but also keep the output disturbance as small as possible. To meet these demands, measures need to be taken in two aspects: one is to find an appropriate direction of weight adaptation that makes the rule meet the benefit principle; the other is to find an appropriate magnitude of weight adaptation that strikes a reasonable balance between beneficial adaptation and minimal disturbance, especially when the two conflict.

It is well known that the perceptron rule [8] can surely inverse a neuron's output for the current training sample within a limited number of iterations. The perceptron rule can be expressed as

$$W' = W + d\beta X \qquad (17)$$

where W and W' are the original weight and the adapted weight, respectively, X is the current training sample's input, d (∈ {−1, 1}) is the corresponding desired output of the neuron, and β (>0) is the learning rate that determines the magnitude of the weight adaptation. Since β is greater than 0, the weight adaptation direction determined by dX guarantees that the weight adaptation contributes to the output inversion of the neuron, so the perceptron rule always meets at least the lowest level of the benefit principle.

Similar to the perceptron rule, the weight adaptation rule for BFNNs can be derived to meet both the benefit principle and the minimal disturbance principle. The main task herein is to set an appropriate magnitude of weight adaptation, namely the value of β. Although the weight adaptation direction determined by dX guarantees that (17) meets at least the lowest level of the benefit principle, it is more desirable to realize an instant inversion of the neuron's output. With the goal of realizing instant inversion and based on the benefit principle, one can derive a reasonable setting for β as in (18) below; for a detailed derivation, refer to Appendix A

$$\beta \ge \begin{cases} |(W_i^1)^{\mathrm T} X^1|\,/\,(n^0+1) \doteq a_1 & \text{for a hidden-layer neuron} \\ |(W_j^2)^{\mathrm T} X^2|\,/\,(n^1+1) \doteq a_2 & \text{for an output-layer neuron} \end{cases} \qquad (18)$$


where i (1 ≤ i ≤ n^1) indicates the i-th neuron in the hidden layer, and j (1 ≤ j ≤ n^2) indicates the j-th neuron in the output layer.

Weight adaptation with (17), and even with (18), ensures that the benefit principle is met, but the output disturbance cannot be entirely avoided and may sometimes be too large to be acceptable. Therefore, in order to follow the minimal disturbance principle, it is necessary to set an allowable maximum disturbance in advance. According to (12), the allowable maximum disturbance is just the allowable maximum sensitivity, denoted max_s_net. How to set max_s_net will be discussed in Section IV-C. According to (8), the maximum BFNN sensitivity max_s_net is also equal to the maximum average sensitivity of the output-layer neurons, denoted max_s_j^2

$$\max\_s_{\mathrm{net}} = \max\_s_j^2, \quad 1 \le j \le n^2. \qquad (19)$$

By means of the BFNN sensitivity and based on the minimal disturbance principle, a reasonable setting for β is shown in (20) below; for a detailed derivation, see Appendix B

$$\beta \le \begin{cases} \dfrac{\pi^2 \sqrt{n^1+1}}{2\sqrt{n^0+1}}\, |W_i^1|\, \max\_s_{\mathrm{net}} \doteq b_1 & \text{for a hidden-layer neuron} \\[2ex] \dfrac{\pi\, |W_j^2|}{\sqrt{n^1+1}}\, \max\_s_{\mathrm{net}} \doteq b_2 & \text{for an output-layer neuron} \end{cases} \qquad (20)$$

where i (1 ≤ i ≤ n^1) indicates the i-th neuron in the hidden layer, and j (1 ≤ j ≤ n^2) indicates the j-th neuron in the output layer. By combining (18) and (20), the value field of β that ensures the weight adaptation meets both the benefit principle and the minimal disturbance principle can be expressed as

$$\begin{cases} (\beta \ge a_1) \cap (\beta \le b_1) & \text{for a hidden-layer neuron} \\ (\beta \ge a_2) \cap (\beta \le b_2) & \text{for an output-layer neuron.} \end{cases} \qquad (21)$$

Now the question is how to determine a specific value of β from (21). Two cases arise. In the first case, (21) has solutions, and any such solution ensures that the weight adaptation meets both the benefit principle and the minimal disturbance principle. Here the selected value of β must comply with Criterion I: the output disturbance should be as small as possible provided the benefit principle is not violated. In the second case, (21) has no solution; no β can meet both principles. Here the selected β must comply with Criterion II: the minimal disturbance principle must be given priority by degrading the level of the benefit principle; under this premise, β should be as large as possible so that the neuron's output is more inclined to be inversed.

In a word, the above two criteria guide the setting of the learning parameter β so that the weight adaptation achieves a reasonable balance between benefit adaptation and minimal disturbance. The weight adaptation rule for BFNNs combines them and can finally be expressed as

$$W' = \begin{cases} W + d \min(a_1, b_1)\, X & \text{for a hidden-layer neuron} \\ W + d \min(a_2, b_2)\, X & \text{for an output-layer neuron.} \end{cases} \qquad (22)$$
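A possible reading of the weight adaptation rule (22), with the benefit bound of (18) and the disturbance bound of (20), is sketched below. The function name and the exact form of the bounds are our reconstruction for illustration, not verbatim from the paper:

```python
import numpy as np

def adapt_weight(W, X, d, n0, n1, max_s_net, hidden=True):
    """Sketch of the weight adaptation rule (22).

    W: current weight vector (bias first); X: augmented ±1 input;
    d: desired output in {-1, +1}; n0, n1: input and hidden-layer sizes;
    max_s_net: allowable maximum network sensitivity.
    """
    if hidden:
        a = abs(W @ X) / (n0 + 1)  # benefit bound a1, per (18)
        b = (np.pi ** 2 / 2) * np.sqrt((n1 + 1) / (n0 + 1)) \
            * np.linalg.norm(W) * max_s_net  # disturbance bound b1, per (20)
    else:
        a = abs(W @ X) / (n1 + 1)  # benefit bound a2
        b = np.pi * np.linalg.norm(W) / np.sqrt(n1 + 1) * max_s_net  # bound b2
    beta = min(a, b)  # Criteria I and II collapse to min(a, b), as in (22)
    return W + d * beta * X
```

When the disturbance bound b is not binding, β = a moves the weighted sum exactly to the inversion boundary for the current input, realizing the instant-inversion goal of (18).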

C. Learning Control Rule In this subsection, the effect of the adaptation number of a neuron on the learning performance will be discussed. Apart from the neuron selection rule and the weight adaptation rule, the adaptation number of a neuron is another important factor, which can reflect learning burden borne by the neuron. A large number of experiments show that almost all of BFNNs that failed in training have a common phenomenon, the adaptation number of a few neurons is far more than that of other neurons: in other words, there are a few neurons whose weight is frequently adapted. Apparently, these failed cases violated the burden-sharing principle. According to the burden-sharing principle, the learning burden should be allocated as uniformly as possible to each neuron. For BFNNs, as (13) shows, the learning burden allocation is reflected in the neuron adaptation, so that the burden-sharing principle can be implemented by controlling the adaptation number of neurons. In general, for the issue of the adaptation number of a neuron, we may look at it at two different levels: the total adaptation number for all training samples and the adaptation number for the current training sample. The former is called the global adaptation number while the latter is called the local adaptation number. As early as 1989, Winter studied the effect of the global adaptation number of a neuron, called “usage,” on the learning performance of BFNNs, and took “usage” into account in hidden-layer neuron selection, which was a successful attempt and the main contribution of MRII. In order to integrate it into the confidence level of MRII, MRII formed the following expression [5]: global adapation number (23) MU LT ∗ M where M is the number of samples in the training set, and MU LT is a multiplier, being generally MU LT set to 5 [5]. Our experiments also verify its validity. 
In order to implement the burden-sharing principle, here we integrate (23) into the sensitivity measure for selecting the hidden-layer neurons as

    sensitivity measure = (1 + global adaptation number / (MULT · M)) · s_net(ΔW_j¹)
                        ≈ (1 + global adaptation number / (MULT · M)) · |ΔW_j¹| / |W_j¹|,
                          for n¹ ≫ 1 and |ΔW_j¹|/|W_j¹| ≪ 1     (24)

where 1 ≤ j ≤ n¹. In fact, it is not enough to consider only the "usage" of a neuron, i.e., the global adaptation number. Our experiments


show that the adaptation number of a neuron for the current training sample, i.e., the local adaptation number, which is not considered by MRII, is another important factor affecting the learning performance of BFNNs. Intuitively, re-adapting the same neuron for the current training sample will, to some extent, weaken the effect of its previous adaptation. Therefore, the local adaptation number should also be introduced into the learning mechanism, so that the mechanism better conforms to the burden-sharing principle. In this paper, a neuron is allowed to be adapted only once for the current training sample during the training process.

The learning process of a BFNN almost inevitably causes output disturbance, which mainly comes from the weight adaptation of the selected neuron(s). Hence, to make the weight adaptation follow the minimal disturbance principle, how to set the allowable maximum network sensitivity, i.e., max_snet, is also an important issue in the control rule. Intuitively, the setting of max_snet is application-dependent, determined by the difficulty and the size of the learning task. In general, the more difficult the learning task, the smaller max_snet should be; likewise, the larger the learning task, the smaller max_snet should be. However, the degree of difficulty of a learning task is often hard to grasp, so in applications the setting of max_snet depends mostly on the size of the learning task, which can be reflected by the input dimension of the BFNN.

V. NEW LEARNING ALGORITHM OF BFNNS

In addition to the learning rules above, it is also necessary to discuss how to determine the desired output of each neuron. As the weight adaptation rule (22) shows, the desired output of a neuron must be determined in advance.
In general, the desired output of a neuron can be obtained by a trial, as follows: first reverse the output of a neuron (or the outputs of a combination of neurons), and then recompute the output of the BFNN in training; if the output errors of the BFNN are reduced, the reversed output of the neuron is its current desired output. In other words, if the output inversion of a neuron or of a neuron combination can reduce the output errors of the BFNN, its desired output is the opposite of its current output.

The learning process of a BFNN can be logically divided into four phases, from the smallest to the largest in granularity. The smallest is the "neuron learning phase," in which a neuron's weight is adapted according to the weight adaptation rule (22), so that the BFNN's output errors for the current training sample decrease or tend to decrease. The second is the "layer learning phase," which is composed of several neuron learning phases for a layer; during it, the neurons in the layer learn according to the neuron selection rule. The third is the "sample learning phase," which is composed of all layer learning phases; during it, a training sample is fed into the BFNN, and the BFNN's layers learn from the first layer to the output layer until the BFNN's output meets the desired output of the current

SBALR Algorithm

Input: A BFNN with given structure n⁰–n¹–n² and random initial weights; a training data set with M_sample samples; the maximum number of iterations; the training precision requirement; and the allowable maximum sensitivity max_snet for the weight adaptation of a neuron.

1) Randomly arrange the training samples.
2) For i from 1 to M_sample, feed training sample i into the BFNN.
   a) If the BFNN responds correctly to the current training sample, go to step 2).
   b) Obtain the weight increment of each hidden-layer neuron by the weight adaptation rule (22), and then compute the values of their sensitivity measure by (24).
   c) Based on the neuron selection rule, sort the hidden-layer neurons into a queue in ascending order of their sensitivity measure.
   d) For j, the number of neurons, from 1 to the length of the queue:
      i) For all possible adjacent neuron combinations with j neurons:
         A) Conduct the trial reversal for the current neuron combination.
         B) If the output errors of the BFNN do not decrease, reject the adaptation and go on to the next neuron combination.
         C) Adapt the weights of the neurons in the current combination by the weight adaptation rule (22), and increase the adaptation number of each neuron in the combination by 1.
         D) If the BFNN's output errors are zero for the current training sample, go to step 2); otherwise, according to the learning control rule, delete the neurons of the current combination from the queue and go to step d).
   e) For output-layer neuron k from 1 to n²:
      i) If the output of the kth output-layer neuron is wrong for the current training sample, adapt its weight by the weight adaptation rule (22).
3) Go to step 1) unless the training meets the precision requirement, or the iteration count reaches the maximum number given in advance.

Output: The weights, the training accuracy, and the total adaptation number of all neurons of the BFNN.

training sample or the output-layer learning quits. The largest is the "network learning phase," which comprises all sample learning phases; during it, the samples from the training dataset are trained one by one until the output errors of the BFNN for each training sample are zero or reach a given precision. Based on the rules above, the learning algorithm of BFNNs is described in the SBALR Algorithm listing.
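The trial reversal and the once-per-sample restriction of the learning control rule can be sketched as follows. This is an illustrative sketch, not the paper's code; `count_errors` and the data layout are our assumptions:

```python
def trial_reversal(hidden_outputs, combo, count_errors):
    """Flip the outputs of the neurons in `combo` (binary outputs are +1/-1)
    and accept the flip only if the BFNN's output error count decreases.
    `count_errors` is an assumed helper that recomputes the network's output
    errors for the current training sample from the hidden outputs."""
    trial = list(hidden_outputs)
    for j in combo:
        trial[j] = -trial[j]
    if count_errors(trial) < count_errors(hidden_outputs):
        return trial, True      # the flipped values become the desired outputs
    return hidden_outputs, False

def select_combo(hidden_outputs, queue, size, adapted, count_errors):
    """Try adjacent neuron combinations of the given size from the sorted
    queue, skipping neurons already adapted for the current sample
    (the local-adaptation-number control rule)."""
    for start in range(len(queue) - size + 1):
        combo = queue[start:start + size]
        if any(j in adapted for j in combo):
            continue
        trial, ok = trial_reversal(hidden_outputs, combo, count_errors)
        if ok:
            adapted.update(combo)   # each neuron may be adapted once per sample
            return combo, trial
    return None, hidden_outputs
```

On success, the weights of the returned combination would then be adapted by rule (22) toward the flipped (desired) outputs.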

TABLE I
DATASETS USED IN THIS PAPER

Dataset         Class   Attribute (bit code)   Training set   Testing set   Iteration
Monks-1           2            10                   124           432          2000
Monks-2           2            10                   169           432          2000
Monks-3           2            10                   122           432          2000
Led7-1           10             7                   200          1000          2000
Led7-2           10             7                   400           500          2000
Led7-3           10             7                  2000           500          2000
Led24-1          10            24                   200          1000          2000
Led24-2          10            24                   700           500          2000
Led24-3          10            24                  1000           500          2000
Balance-scale     3            12                   463           162          5000
10-bit parity     2            10                   616           408         50000

Fig. 3. Comparison of learning convergence rate of the three algorithms. (a) Monks problem. (b) Led problem. (c) Parity problem and balance problem.

Fig. 4. Comparison of learning success rate of the three algorithms. (a) Monks problem. (b) Led problem. (c) Parity problem and balance problem.

VI. EXPERIMENTAL VERIFICATION

Generally, the learning performance and the generalization performance are used to evaluate a learning algorithm. The learning performance can be further divided into learning effectiveness and learning efficiency. Owing to the discrete characteristics of BFNNs, performance indices often used for the learning algorithms of CFNNs, such as the MSE, are not suitable for BFNN learning algorithms. So, in this paper, two indices, the learning success rate and the learning convergence rate, are used to reflect the learning effectiveness of the proposed BFNN learning algorithm, i.e., the SBALR algorithm. The learning success rate is the percentage of training samples that a trained BFNN responds to correctly, whereas the learning convergence rate is the percentage of BFNNs that reach a complete solution (i.e., zero training error) within the specified iterations, among a group of BFNNs participating in training. By these definitions, a BFNN whose learning success rate is 100% is called a "convergent BFNN." The learning efficiency mainly describes the time cost of an algorithm in training and is often reflected by the iteration number and the adaptation number. One iteration is the learning process in which each training sample is fed into the BFNN once, while the adaptation number is the total number of weight adaptations of all neurons during training. Obviously, the adaptation number is more suitable than the iteration number for assessing the time cost of BFNN learning, because not all neurons participate in each iteration. The generalization performance is often reflected by the generalization rate, i.e., the percentage of testing samples that a well-trained BFNN responds to correctly. In order to facilitate comparison with the CFNNs trained by the BP algorithm, the performance indices described above are also applied to multilayer perceptrons (MLPs).
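Under these definitions, the two effectiveness indices can be computed as in the following sketch (the helper names are ours, not the paper's):

```python
def learning_metrics(per_network_success_rates, tol=1e-12):
    """Effectiveness indices of Section VI (sketch).

    per_network_success_rates: for each trained network, the fraction of
    training samples it classifies correctly (a value in [0, 1]).
    Returns:
      - mean learning success rate over the group of networks;
      - learning convergence rate: fraction of networks that reached a
        complete solution (success rate 100%) within the allowed iterations."""
    n = len(per_network_success_rates)
    success = sum(per_network_success_rates) / n
    convergence = sum(1 for r in per_network_success_rates if r >= 1.0 - tol) / n
    return success, convergence

s, c = learning_metrics([1.0, 1.0, 0.9, 0.8])
print(round(s, 3), c)
```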

Fig. 5. Comparison of the generalization rate of the three algorithms. (a) Monks problem. (b) Led problem. (c) Parity problem and balance problem.

Fig. 6. Relation between the allowable maximum sensitivity and the input dimension.

To verify the effectiveness of the proposed learning algorithm, a number of experiments were carried out with the SBALR algorithm, the MRII algorithm, and the momentum gradient descent BP algorithm. In the experiments, networks (BFNNs and MLPs) with a single hidden layer were built to solve some representative problems, namely Monks, Led display, and Balance Scale from the UCI repository [19], as well as one parity problem; they are listed in Table I. In order to guarantee the validity of the experimental results, each result is the average of 100 runs. In more detail, for each given architecture and each learning algorithm, 100 networks with random initial weights were trained, and the values of the corresponding performance indices were recorded and averaged to yield the experimental result. The training precision requirement of all experiments was set to 100%, and the iteration limits are listed in Table I. The experimental results are shown in Figs. 3–5. As Fig. 3 shows, the advantage of the SBALR algorithm over the MRII and BP algorithms in the learning convergence rate is outstanding. Only for the Led-7 problem, which is relatively simple, does the MRII algorithm show a slightly better learning convergence rate. Besides, for almost all the problems, the BP algorithm has difficulty reaching the training precision requirement set in advance. In particular,

TABLE II
COMPARISON OF EFFICIENCY BETWEEN THE SBALR AND MRII
(CONV. IS THE ABBREVIATION OF CONVERGENCE)

Dataset         Net architecture   Algorithm   Iteration to conv.   Adaptation to conv.
Monks-1         10-3-1             MRII               26                    110
                                   SBALR             338                   1304
Monks-1         10-4-1             MRII               30                    171
                                   SBALR              88                    266
Monks-2         10-2-1             MRII               29                    222
                                   SBALR              15                    335
Monks-2         10-3-1             MRII               60                    223
                                   SBALR             251                   4504
Monks-3         10-2-1             MRII               95                    379
                                   SBALR             162                    646
Monks-3         10-3-1             MRII               34                     86
                                   SBALR              72                    245
Led7-1          7-4-4              MRII               43                   1638
                                   SBALR               6                    154
Led7-2          7-4-4              MRII               18                   2122
                                   SBALR               3                    163
Led7-3          7-4-4              MRII                3                   1559
                                   SBALR               1                    167
Led24-1         24-4-4             MRII             1383                 121447
                                   SBALR             252                   7328
Led24-2         24-4-4             MRII              944                 285767
                                   SBALR              66                   7315
Led24-3         24-4-4             MRII              683                 293859
                                   SBALR              56                   7283
10-bit parity   10-12-1            MRII                -                      -
                                   SBALR            6157                 223741
Balance-scale   12-5-2             MRII                -                      -
                                   SBALR            1783                  10834

for the 10-bit parity and balance-scale problems, the learning convergence rate of MRII is zero, and for the balance-scale problem, the learning convergence rate of the BP algorithm is also zero. These results demonstrate that the BP algorithm is not good at solving discrete problems and that the MRII algorithm is not mature enough for complicated discrete problems. Similarly, Fig. 4 shows that the SBALR algorithm is better than the MRII and BP algorithms in the learning success rate for almost all the problems, with only two exceptions. One is


the 10-bit parity problem using the BP algorithm, and the other is the Led-7 problem using the MRII algorithm. As to the generalization performance, Fig. 5 shows that the generalization rates of the SBALR algorithm are over 90% for almost all the problems. The SBALR algorithm is better than the BP algorithm in most cases and almost equivalent in a few; it is much better than the MRII algorithm for most of the problems, except for the relatively simple Led-7 problem. In summary, the experimental results show that the SBALR algorithm performs well not only in learning but also in generalization. The experimental results on the learning efficiency of the SBALR and MRII algorithms are listed in Table II, which shows that the SBALR algorithm also has better efficiency for most of the problems, except the Monks problems. Since all neurons take part in adaptation during training with the BP algorithm, the SBALR algorithm is undoubtedly superior to the BP algorithm in learning efficiency, as reflected by the adaptation number. Moreover, the SBALR algorithm has another good property: fewer learning parameters need to be set in advance than for the MRII algorithm. For the SBALR algorithm, the allowable maximum network sensitivity max_snet is the only parameter to be set, while the MRII algorithm has as many as three parameters. Further, experiments have shown that max_snet is, to some extent, in inverse proportion to the input dimension of the BFNN in training and is almost independent of the domain knowledge of a specific learning task, which makes the SBALR algorithm more convenient for solving discrete problems. Fig. 6 presents the settings of max_snet used in our experiments.
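As a rough illustration of this dependence, one could default max_snet as below; the functional form and the constant are our assumption for illustration, not values taken from Fig. 6:

```python
def default_max_snet(input_dim, c=0.5):
    """Hypothetical heuristic: the allowable maximum network sensitivity
    shrinks roughly in inverse proportion to the BFNN input dimension.
    The proportionality constant c is an assumed example value,
    not a figure from the paper."""
    return c / input_dim

# smaller max_snet for larger inputs (e.g., Led24 vs. Led7)
print(default_max_snet(7), default_max_snet(24))
```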

VII. CONCLUSION

Theoretically, an adaptive learning mechanism needs to know, in each of its adaptations, which parameters should be adapted and how, so as to learn as much as possible and, in the meantime, forget as little as possible. In this paper, we presented our recent work on the exploration of the adaptive learning mechanism of BFNNs. This paper first abstracted three conceptual learning principles from the commonly accepted behaviors of an adaptive learning mechanism: the benefit principle, which demands that each adaptation should, directly or indirectly, be at least somewhat beneficial to learning; the minimal disturbance principle, a refinement of Widrow's work [4], [13], which demands that the adaptation meeting the current training sample should disturb as little as possible what has been established by previous training samples; and the burden-sharing principle, which demands that the learning burden be allocated evenly to each neuron participating in training. Then, on the basis of these principles and the sensitivity measure of BFNNs, a set of concrete adaptive learning rules for BFNNs was proposed: the neuron selection rule, the weight adaptation rule, and the learning control rule. Finally, an adaptive learning algorithm, the SBALR algorithm, was presented under the guidance of the rules. It endeavors to select neurons by maintaining a balance among benefiting learning, limiting output disturbance, and avoiding overburden, and then performs weight adaptation by maintaining another balance between reducing output error and lowering output disturbance. The SBALR algorithm can more accurately locate the neurons in real need of adaptation, properly determine the weight adaptation, and evenly allocate the learning burden among all neurons of the BFNN in training. Verification experiments demonstrated that the SBALR algorithm has better learning performance than the MRII and BP algorithms.

The adaptive learning mechanism has been studied for many years for neural networks; a typical representative is the popular BP algorithm for MLPs. However, the mechanism has not been perfected even now. For example, the BP algorithm adapts all neurons' weights by means of the gradient descent technique instead of selecting only the neurons most in need of adaptation, which is a cause of easily falling into a local minimum; in addition, the weight adaptation rate in the BP algorithm is not easy to determine so as to avoid oscillation. In fact, the BP algorithm is just a special case of the adaptive learning mechanism, suitable only for CFNNs with a differentiable activation function. In our study, we tried to develop a more effective and general adaptive learning algorithm that could avoid falling into a local minimum and relax extra restrictions on feedforward neural networks. Although the proposed SBALR algorithm greatly improves learning performance for most of the problems in the experiments, especially complicated ones, it still shows some weakness on some simple problems. In our future work, we will continue to perfect the adaptive learning mechanism by searching for essential learning rules as well as proper techniques for both DFNNs and CFNNs.

APPENDIX A
DERIVATION OF THE CONSTRAINT ON PARAMETER β FOR MEETING THE BENEFIT PRINCIPLE

Let u = XᵀW and u' = XᵀW'. Premultiplying both sides of (17) by Xᵀ, we have

    u' = u + dβ(n + 1) = −d|u| + dβ(n + 1) = d[β(n + 1) − |u|]   (∵ d = −1 or 1, and ud ≤ 0).     (A.1)

To inverse the neuron's output, it is obvious that u' and d must satisfy the following inequality:

    u' ≥ 0   for d = +1
    u' < 0   for d = −1.     (A.2)

By substituting (A.1) for u' in (A.2), we have

    β(n + 1) − |u| ≥ 0   for d = +1
    β(n + 1) − |u| > 0   for d = −1.     (A.3)

Apparently, the following inequality always meets (A.3) and ensures that the neuron's output is inversed whether d is 1 or −1:

    β(n + 1) − |u| > 0  ⇒  β > |u| / (n + 1).     (A.4)
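The bound (A.4) can be checked numerically. The sketch below is our helper with example values; it confirms that β > |u|/(n + 1) inverts the activation while a smaller β does not:

```python
def inverts_output(u, d, beta, n):
    """Check (A.1)-(A.4): with W' = W + beta*d*X, the new activation is
    u' = u + d*beta*(n + 1); the output is inverted when u' takes the
    sign required by the desired output d."""
    u_new = u + d * beta * (n + 1)
    return (u_new >= 0) if d == 1 else (u_new < 0)

u, n, d = -3.0, 4, 1          # current activation, input dimension, desired output
beta_ok = abs(u) / n          # beta >= |u|/n  >  |u|/(n + 1)
beta_bad = abs(u) / (2 * (n + 1))
print(inverts_output(u, d, beta_ok, n), inverts_output(u, d, beta_bad, n))
```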


In order to facilitate the discussion, let β ≥ |u|/n. As β ≥ |u|/n > |u|/(n + 1), β ≥ |u|/n can ensure the inversion of the neuron's output, so that the weight adaptation rule (17) can meet the middle level of the benefit principle. For neurons in different layers, β ≥ |u|/n can further be extended and denoted as

    β ≥ |(W_i¹)ᵀ X¹| / n⁰ ≜ a₁   for a hidden-layer neuron
    β ≥ |(W_j²)ᵀ X²| / n¹ ≜ a₂   for an output-layer neuron     (A.5)

where i (1 ≤ i ≤ n¹) indicates the ith hidden-layer neuron, and j (1 ≤ j ≤ n²) indicates the jth output-layer neuron.

APPENDIX B
DERIVATION OF THE CONSTRAINT ON PARAMETER β FOR MEETING THE MINIMAL DISTURBANCE PRINCIPLE

Considering the difference in sensitivity computation between the hidden layer and the output layer, the constraint on β is discussed separately for neurons in the two layers.

1) For the ith (1 ≤ i ≤ n¹) neuron in the hidden layer: In [5] and [18], it was shown that the constraint of (5) can be satisfied when the number of varied input elements is less than half of all input elements. Besides, in order to follow the minimal disturbance principle, max_snet is always set to a rather small real number (generally less than 0.1), and it decreases as the BFNN input dimension increases. Zhong [18] has shown that max_snet is set so small that the constraint |ΔW| ≪ |W| in (3) can always be met (experience shows that |ΔW|/|W| ≤ 30% is enough to meet the constraint). Therefore, we can obtain the following expression from (15):

    max_s_j² = max_s_i¹(ΔW_i¹) · s_j²(ΔX_j²) ≈ 2 |ΔW_i¹|_max / (π² |W_i¹| √(n¹ + 1))
    ⇒ |ΔW_i¹| ≤ |ΔW_i¹|_max = (π² |W_i¹| √(n¹ + 1) / 2) · max_s_j².     (B.1)

By substituting (19) for max_s_j² in (B.1), we have

    |ΔW_i¹| ≤ (π² |W_i¹| √(n¹ + 1) / 2) · max_snet.     (B.2)

By (17), we have

    |ΔW_i¹| = |βX| = β √(n⁰ + 1).     (B.3)

By substituting (B.3) for |ΔW_i¹| in (B.2), we have

    β ≤ (π² |W_i¹| √(n¹ + 1) / (2 √(n⁰ + 1))) · max_snet.     (B.4)

2) For the jth (1 ≤ j ≤ n²) neuron in the output layer: Similar to 1), the following formula can be derived from (3):

    max_s_j² = |ΔW_j²|_max / (π |W_j²|)  ⇒  |ΔW_j²| ≤ |ΔW_j²|_max = π |W_j²| max_s_j².     (B.5)

By (19), (B.5) can be further expressed as

    |ΔW_j²| ≤ π |W_j²| max_snet.     (B.6)

Besides, by (17), we have

    |ΔW_j²| = |βX| = β √(n¹ + 1).     (B.7)

By substituting (B.7) for |ΔW_j²| in (B.6), we obtain

    β ≤ (π |W_j²| / √(n¹ + 1)) · max_snet.     (B.8)

Combining (B.4) and (B.8), we finally have the constraint on β for meeting the minimal disturbance principle:

    β ≤ (π² √(n¹ + 1) |W_i¹| / (2 √(n⁰ + 1))) · max_snet ≜ b₁   for a hidden-layer neuron
    β ≤ (π |W_j²| / √(n¹ + 1)) · max_snet ≜ b₂                   for an output-layer neuron.     (B.9)

REFERENCES

[1] C. Zhang, J. Yang, and W. Wu, "Binary higher order neural networks for realizing Boolean functions," IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 701–713, May 2011.
[2] I. Aizenberg, "Periodic activation function and a modified learning algorithm for the multivalued neuron," IEEE Trans. Neural Netw., vol. 21, no. 12, pp. 1939–1949, Dec. 2010.
[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, Oct. 1986.
[4] R. Winter and B. Widrow, "MADALINE rule II: A training algorithm for neural networks," in Proc. IEEE Int. Conf. Neural Netw., vol. 1, San Diego, CA, Jul. 1988, pp. 401–408.
[5] R. Winter, "Madaline rule II: A new method for training networks of Adalines," Ph.D. dissertation, Dept. Electr. Eng., Stanford Univ., Stanford, CA, 1989.
[6] W. C. Ridgway, "An adaptive logic system with generalizing properties," Stanford Electronics Laboratories, Stanford, CA, Tech. Rep. 1557-1, 1962.
[7] C. H. Mays, "Adaptive threshold logic," Ph.D. thesis, Stanford Electronics Laboratories, Stanford, CA, Tech. Rep. 1556-1, 1963.
[8] F. Rosenblatt, "On the convergence of reinforcement procedures in simple perceptrons," Cornell Aeronautical Laboratory, Buffalo, NY, Tech. Rep. VG-1796-G-4, 1960.
[9] M. Mezard and J. P. Nadal, "Learning in feedforward layered networks: The tiling algorithm," J. Phys. A: Math. General, vol. 22, no. 12, pp. 2191–2203, 1989.
[10] M. Frean, "The upstart algorithm: A method for constructing and training feedforward neural networks," Neural Comput., vol. 2, no. 2, pp. 198–209, 1990.
[11] S. A. J. Keibek, G. T. Barkema, H. M. A. Andree, M. H. F. Savenije, and A. Taal, "A fast partitioning algorithm and a comparison of binary feedforward neural networks," Europhys. Lett., vol. 18, no. 6, pp. 555–559, 1992.
[12] J. H. Kim and S. K. Park, "The geometrical learning of binary neural networks," IEEE Trans. Neural Netw., vol. 6, no. 1, pp. 237–246, Jan. 1995.
[13] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, madaline, and backpropagation," Proc. IEEE, vol. 78, no. 9, pp. 1415–1442, Sep. 1990.
[14] S. Yang, C. Ho, and S. Siu, "Computing and analyzing the sensitivity of MLP due to the errors of the i.i.d. inputs and weights based on CLT," IEEE Trans. Neural Netw., vol. 21, no. 12, pp. 1882–1891, Dec. 2010.


[15] S. M. Baek and J. W. Park, "Hessian matrix estimation in hybrid systems based on an embedded FFNN," IEEE Trans. Neural Netw., vol. 21, no. 10, pp. 1533–1542, Oct. 2010.
[16] X. Zeng, Y. Wang, and K. Zhang, "Computation of Adalines' sensitivity to weight perturbation," IEEE Trans. Neural Netw., vol. 17, no. 2, pp. 515–519, Mar. 2006.
[17] Y. Wang, X. Zeng, D. S. Yeung, and Z. Peng, "Computation of Madalines' sensitivity to input and weight perturbations," Neural Comput., vol. 18, no. 11, pp. 2854–2877, 2006.
[18] S. Zhong, X. Zeng, H. Liu, and Y. Xu, "Approximate computation of Madaline sensitivity based on discrete stochastic technique," Sci. China Inf. Sci., vol. 53, no. 12, pp. 2399–2414, 2010.
[19] UCI Machine Learning Repository [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

Shuiming Zhong received the M.S. and Ph.D. degrees from Hohai University, Nanjing, China, in 2007 and 2011, respectively. He is currently a Lecturer with the School of Computer Science and Software, Nanjing University of Information Science and Technology, Nanjing, China. His current research interests include artificial neural networks, machine learning, and pattern recognition.

Xiaoqin Zeng received the B.S. degree from Nanjing University, Nanjing, China, the M.S. degree from Southeast University, Nanjing, and the Ph.D. degree from Hong Kong Polytechnic University, Kowloon, Hong Kong, all in computer science. He is currently a Professor, a Ph.D. student Supervisor, and the Director of the Institute of Intelligence Science and Technology, Hohai University, Nanjing. His current research interests include machine learning, neural networks, pattern recognition, and graph grammar. Prof. Zeng is an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B. He is the Principal Investigator of several research projects sponsored by the Natural Science Foundation of China.

Shengli Wu received the Ph.D. degree from the Department of Computer Science and Engineering, Southeast University, Nanjing, China, in 1996. He is currently a Lecturer with the University of Ulster, Jordanstown, U.K. His current research interests include database and information systems, information retrieval, data mining, and machine learning.

Lixin Han received the Ph.D. degree in computer science from Nanjing University, Nanjing, China. He has been a Post-Doctoral Fellow with the Department of Mathematics, Nanjing University, and a Research Fellow with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. He is currently a Professor with the Institute of Intelligence Science and Technology, Hohai University, Nanjing, China. He has published over 30 research papers. Prof. Han is an Invited Reviewer for several renowned journals and has been a Program Committee Member of many international conferences. He is listed in Marquis’ Who’s Who in the World and Marquis’ Who’s Who in Science and Engineering.
