
A new class of multi-stable neural networks: stability analysis and learning process

E. Bavafaye Haghighi 1,2,*, G. Palm 1, M. Rahmati 2 and M. J. Yazdanpanah 3

1 Institute of Neural Information Processing, Ulm University, Ulm, Germany
2 Computer Engineering & Information Technology Department, Amirkabir University of Technology, Tehran, Iran
3 Control & Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
* Corresponding author. E-mail: [email protected], [email protected]

Abstract: Recently, multi-stable Neural Networks (NN) with an exponential number of attractors have been presented and analyzed theoretically; however, the learning process of the parameters of these systems under stability conditions and the specifications of real world problems has not been studied. In this paper, a new class of multi-stable NNs with sinusoidal dynamics and an exponential number of attractors is introduced. Sufficient conditions for the multi-stability of the proposed system are derived using the Lyapunov theorem. In contrast to the other methods in this class of multi-stable NNs, the proposed method can be used as a classifier by applying a learning process which respects the topological information of the data and the conditions of Lyapunov multi-stability. The proposed NN is applied to both synthetic and real world datasets with an accuracy comparable to classical classifiers.

Keywords: multi-stable neural network; exponential number of attractors; Lyapunov stability; classification; sinusoidal dynamics.

1. Introduction
Dynamical Neural Networks (NNs) are powerful methods which are applied to system identification and the modeling of nonlinear dynamical systems (Nørgaard, 2000; Janczak, 2005; Liu, 2001), the classification of time series (Ao, 2010; Hu & Hwang, 2010) and pattern recognition using memories (Zurada, 1992; Perfetti & Ricci, 2008; Wang et al., 1990; Shen & Cruz, 2005; Chartier & Boukadoum, 2006; Sudo et al., 2009). From a neurobiological point of view, these methods are important for emulating and explaining different biological behaviors, including information storage and recall (Zeng & Zheng, 2012). In real world recognition problems solved by means of neural networks in which the number of classes is high (e.g., in data mining (Lin et al., 2008), object recognition (Sonka et al., 2014), etc.), a new framework may be useful or even necessary.

Recently, multi-stable neural networks with an exponential number of attractors have received considerable attention. By applying a one step piecewise linear activation function in an n-neuron dynamical NN, the coexistence of locally exponentially stable equilibrium points under the studied conditions is shown (Zeng et al., 2004; Zeng & Wang, 2006). In (Lili et al., 2010), a class of r-level piecewise linear nondecreasing activation functions is used to increase the storage capacity of multi-stable NNs. The storage capacity of dynamical NNs is increased further in (Zeng & Zheng, 2012) by applying time-varying delays as well as activation functions with concave-convex characteristics. After the concept of μ-stability, which concerns NNs with unbounded time-varying delays, was proposed (Chen & Wang, 2007a; Chen & Wang, 2007b), multiple μ-stable NNs with unbounded time-varying delays were presented in (Wang & Chen, 2014). Other studies on the subject of multi-stable NNs include, but are not limited to, bidirectional associative memories with an exponential number of attractors (Du & Xu, 2014), analyses of the effect of distributed delays (Nie & Cao, 2009), conditions for the existence of limit cycles (Chenga et al., 2007), and the application of different types of activation functions, including Mexican-hat-type (Nie et al., 2014) and real-imaginary-type (Huang et al., 2014) functions.


Although these NNs have been proposed for multi-objective optimal control and for memories with exponential storage capacity, they have not been applied to such problems in practice. In order to use these methods, it is necessary to learn their parameters by considering the stability conditions and the specification of the problem. In the case of NNs with time-varying delays, the proper adjustment of the delays to avoid chaotic behavior makes the learning process even more challenging. In this paper, a new framework for multi-stable dynamical NNs is presented which guarantees an exponential number of attractors as well as the learning ability required to apply the proposed NN to pattern recognition tasks. Since the sine function provides the dynamics of this NN, it is called the Sinusoidal Dynamic Neural Network (SDNN). The stability of this system is studied using the Lyapunov theorem. In order to apply SDNN to pattern recognition, the topological information of the data and the stability conditions of the proposed framework are considered within the two steps of the learning process. In the first step of learning SDNN, the attractors and domains of attraction are estimated for state vectors with independent variables. The advantage of this step is that samples are mapped to hyper-cubic domains of attraction such that similar patterns are mapped to the same domain or to adjacent ones. However, these domains of attraction are not flexible enough to capture the details of the distribution of the samples. By applying interdependent variables and nonlinear borders of the domains of attraction, which is the next step of the learning algorithm, the flexibility of the domains of attraction and the accuracy of classification using SDNN increase. In this step, the parameters of SDNN are adjusted finely with respect to the correlation analysis of the initial states of the samples. The parameter tuning guarantees asymptotical multi-stability of the system using the Lyapunov theorem (Slotine & Li, 1991; Khalil, 1996). The proposed framework of SDNN thus provides an exponential number of attractors as well as learning ability.

The arrangement of the remaining sections is as follows: In Section 2, multi-stable NNs with an exponential number of attractors are reviewed. Our proposed framework of SDNN is presented in Section 3. The stability analysis using the Lyapunov theorem is discussed in Section 4. The learning process, which considers the topological information of the data and the stability conditions, is presented in Section 5. Sections 6 and 7 include the experimental results and the conclusion and future works, respectively.

2. Multi-stable neural networks with exponential number of attractors
"Multi-stability" of NNs, which refers to the coexistence of multiple equilibrium points together with their local stability (Wang & Chen, 2014), has different advantages in multi-objective optimal control and pattern recognition (Zeng & Zheng, 2012). Recently, multi-stable NNs with an exponential number of attractors have been studied extensively. In (Zeng et al., 2004; Zeng & Wang, 2006), by applying a one step piecewise linear activation function in an n-neuron dynamical NN, the coexistence of locally exponentially stable equilibrium points is derived and the state space is partitioned into subspaces. In order to increase the storage capacity of multi-stable NNs, Wang, Lu and Chen introduced a class of r-level piecewise linear nondecreasing activation functions (Lili et al., 2010). The applied dynamical system is a simplified version of the Cohen-Grossberg NN, which is presented as follows:

\dot{x}_i(t) = -d_i x_i(t) + \sum_{j=1}^{n} a_{ij} g_j(x_j(t)) + I_i,   i = 1, ..., n,   (1)

where x_i represents the ith element of the n dimensional state vector (i.e., the ith neuron); d_i determines the exponential rate with which x_i resets when it is isolated from the network and from the corresponding external input; a_{ij} is the connection weight between the jth and ith neurons; g_j(.) stands for the activation function and I_i is the external input. Fig. 1 illustrates the r-level piecewise linear nondecreasing activation function schematically (Lili et al., 2010). It is shown in (Lili et al., 2010) that by applying the r-level activation function, a dynamical NN with n neurons has a number of equilibria that grows exponentially with n. The conditions of stability are analyzed using inequality techniques, which guarantee that an exponential number of these equilibrium points are locally exponentially stable.

In (Chen & Wang, 2007a; Chen & Wang, 2007b), the concept of μ-stability, which concerns NNs with unbounded time-varying delays, is presented. The existence, uniqueness and global stability of the equilibrium point under the analyzed conditions are studied in this version of μ-stable NNs. It is shown in (Wang & Chen, 2014) that a μ-stable NN has more than one equilibrium point under special conditions. The general framework of multiple μ-stable NNs is as follows:

\dot{x}_i(t) = -d_i x_i(t) + \sum_{j=1}^{n} a_{ij} g_j(x_j(t)) + \sum_{j=1}^{n} b_{ij} g_j(x_j(t - \tau_{ij}(t))) + I_i,   i = 1, ..., n,   (2)

where x_i, d_i, g_j(.) and I_i are defined as in (1), a_{ij} and b_{ij} are the connection weights between the jth and ith neurons for the undelayed and delayed terms, respectively, and \tau_{ij}(t) is the time-varying delay.

In (Zeng & Zheng, 2012), multi-stable NNs are presented which apply time-varying delays as well as activation functions with concave-convex characteristics. By considering r and p as arbitrary natural numbers, this NN potentially has a large number of locally stable equilibrium points or locally attractive periodic states. Therefore, the storage capacity of this NN increases significantly in comparison to the previous ones. The stability conditions of this network are studied using inequality techniques. Multi-objective control and memories are suggested applications of the multi-stable NN with time-varying delays and concave-convex characteristics (Zeng & Zheng, 2012).

Fig. 1. The r-level piecewise linear nondecreasing activation function applied in (Lili et al., 2010), illustrated schematically.
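For concreteness, the following minimal sketch (an illustration only, not code from (Lili et al., 2010)) builds one common realization of an r-level piecewise linear nondecreasing activation: a staircase with r rising segments and r + 1 saturation levels. The breakpoints and slopes used here are assumptions.

```python
import numpy as np

def r_level_activation(x, r=2):
    """A nondecreasing piecewise linear activation with r unit-slope rising
    segments separated by flat plateaus (one common way to realize an r-level
    activation; the breakpoints in Lili et al. (2010) may be parameterized
    differently)."""
    x = np.asarray(x, dtype=float)
    # Each term contributes one rising segment of unit length starting at 2*i,
    # so the output is a staircase taking the r+1 saturation levels 0, 1, ..., r.
    return sum(np.clip(x - 2 * i, 0.0, 1.0) for i in range(r))

# Quick look at the staircase shape for r = 2.
for v in [-1.0, 0.5, 1.5, 2.5, 3.5, 5.0]:
    print(f"g({v:4.1f}) = {r_level_activation(v, r=2):.2f}")
```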

Multiple μ-stable NNs with unbounded time-varying delays are studied in (Wang & Chen, 2014), where it is shown that an n-neuron NN can have multiple μ-stable equilibrium points. Direct applications of the presented theorems on μ-stable NNs are summarized as follows: obtaining criteria for multiple exponential stability, multiple power stability, multiple log-stability, multiple log-log-stability, etc. Additionally, these advantages can be extended to r-level activation functions (Wang & Chen, 2014). Some of the other studies on multi-stable NNs include the analysis of the effect of distributed delays (Nie & Cao, 2009), conditions for the existence of limit cycles (Chenga et al., 2007), the application of Mexican-hat-type (Nie et al., 2014) and real-imaginary-type activation functions (Huang et al., 2014), Bidirectional Associative Memories with an exponential number of attractors (Du & Xu, 2014), etc.

The studies on NNs with an exponential number of attractors (e.g., Du & Xu, 2014; Nie et al., 2014; Wang & Chen, 2014; Zeng & Zheng, 2012; Lili et al., 2010) reveal that these networks are well analyzed from different theoretical aspects; however, they have not been applied in real world applications such as pattern recognition or multi-objective optimal control. Learning the parameters of these systems by considering the stability conditions as well as the specifications of real world problems has not been studied. In the case of dynamical NNs with time-varying delays, the proper adjustment of the delays to avoid chaotic behavior makes the learning process even more challenging. In this paper, by proposing a new framework for NNs with an exponential number of attractors, it becomes possible to learn the parameters of the system by considering the stability conditions and the topological information of the data, in order to apply it to pattern recognition tasks.

3. Problem Formulation and Preliminaries
A new framework for dynamical neural networks which supports an exponential number of attractors is presented in this section. The first order system of the proposed framework is presented as follows:

\dot{x}(t) = \sin(\omega x(t)),   (3)

where ω > 0 is the frequency of the dynamics and x(t) presents the state of the system (the important notations are summarized in Table 1). By solving the equation sin(ωx) = 0, it is clear that the system has an unlimited number¹ of equilibrium points, which are represented using (4) (Slotine & Li, 1991; Khalil, 1996; Strogatz, 1994; Kuznetsov, 1998):

x* = qπ/ω,   q = 0, ±1, ±2, ...   (4)

Table 1. Important notations
K: Number of classes.
n: Dimensionality of the dataset.
m: Order of SDNN (m ≤ n).
y_i: The ith training sample (i ≤ l) or test sample (i ≥ l + 1); its class label is denoted label(y_i).
x: The state vector of SDNN.
x*, x^a: An equilibrium point and a stable equilibrium point (attractor) of SDNN, respectively.
ω_k: Frequency of the dynamics of the kth variable (ω_k > 0).
Ω: Diagonal matrix of the frequencies of the dynamics (Ω = diag(ω_1, ..., ω_m)).
R: Orthonormal rotation matrix of SDNN.
W: The coefficient matrix of SDNN (W = RΩR^T).
w_k: kth row of the matrix W.
V: Lyapunov function.
f: ℝ^n → ℝ^m: Mapping that determines the initial state of the samples in the state space.
α_k: Binary vector that determines the selected features applied in f_k.
n_k: The number of variables applied in f_k.
y^{α_k}: Representation of y using the features determined by α_k.
Y^{α_k}: Matrix of training samples whose ith row is y_i^{α_k}.
c_{ki}, θ_{ki}: The assigned primary code and the attractor of y_i in the kth sub-system, respectively.
c_k, Θ_k: Vectors whose ith elements are c_{ki} and θ_{ki}, respectively.
β_k, β̂_k: Vector defining f_k, which maps the samples to the attractors, and its estimation.
γ̂_k: Estimated vector defining g_k, which maps the samples to the primary codes.
E_{f_k}, E_{g_k}: Errors of the mappings f_k and g_k, respectively.
δ_k: Half of the optimal width of the domain of attraction of the kth sub-system of SDNN-H.
ρ: Free parameter which is applied for the fine tuning of δ_k.
NI: Number of distinct primary codes or attractors defined for each sub-system of SDNN-H.

Fig. 2 illustrates the phase trajectory (Slotine & Li, 1991; Khalil, 1996; Strogatz, 1994; Kuznetsov, 1998) of (3) for ω = π. A box, drawn by dashed lines, presents the domain of attraction of x* = 1. Because sin(πx) > 0 on (0, 1), x increases in time in (0, 1). Therefore, there is a stable flow from 0 to 1. In contrast, x decreases in time in (1, 2) because sin(πx) < 0 there. According to the stable flows around x* = 1, it is one of the stable equilibrium points of the demonstrated system. There are similar arguments for the stability/instability of the other equilibrium points using the stable/unstable flows around them. In Fig. 2, stable and unstable equilibrium points are shown using solid black circles and open circles, respectively. They are associated with odd and even values of q, respectively (see (4)). In this model, beside each unstable equilibrium point there are two attractors. As a result, the proposed framework guarantees reaching an attractor regardless of the initial state of the system. Since the width of the domain of attraction in (3) equals 2π/ω, low frequencies enlarge the domains of attraction.

¹ Theoretically, there is no limit on the number of equilibrium points. However, the representation of real numbers in digital computers uses a predefined number of bits, which bounds the number of equilibrium points in practice.

Fig. 2. Phase trajectory of (3) for ω = π. Stable and unstable equilibrium points are shown using solid black circles and open circles, respectively. A box, drawn by dashed lines, presents the domain of attraction around one of the attractors (x* = 1). For all initial states in this domain, there are stable trajectories which converge to x* = 1.
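As a sanity check of the behavior described above, the following sketch integrates the reconstructed scalar dynamics of (3) with a simple forward Euler scheme from several initial states; each trajectory settles on the nearest attractor qπ/ω with odd q. The choices ω = π, the step size and the horizon are illustrative.

```python
import numpy as np

def simulate_scalar_sdnn(x0, omega=np.pi, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = sin(omega * x)."""
    x = float(x0)
    for _ in range(steps):
        x += dt * np.sin(omega * x)
    return x

for x0 in [0.1, 0.9, 1.5, 2.2, 3.7]:
    xf = simulate_scalar_sdnn(x0)
    print(f"x(0) = {x0:4.1f}  ->  x(T) = {xf:6.3f}")
    # With omega = pi the attractors are the odd integers 1, 3, 5, ...
```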

The multidimensional state equation of (3) with independent variables is presented as follows:

\dot{x}_k(t) = \sin(\omega_k x_k(t)),   k = 1, ..., m.   (5)

In (5), ω_k is the frequency by which the dynamics of the kth variable of the state vector is determined. Because the variables of (5) are independent, the system consists of m independent sub-systems. The stability of each sub-system can be analyzed separately from the others, and the phase trajectory of each x_k (k = 1, ..., m) is similar to (3) (Fig. 2). As a result, the stability of an equilibrium point x* is related to the stability of the sub-systems. An equilibrium point x* is stable (or unstable) when all x_k* (k = 1, ..., m) are stable (or unstable) or, equivalently, when the solutions of all sub-systems are associated with odd (or even) values of q_k (see (4)). In the case of saddle points, there are stable flows as well as unstable ones (i.e., x* includes both stable and unstable x_k*) (Slotine & Li, 1991; Khalil, 1996; Strogatz, 1994; Kuznetsov, 1998). In Fig. 3.a, stable, unstable and saddle points of a two dimensional system are represented using solid black, open and black-white circles, respectively. The stable flows of the saddle points are indicated using their dark parts. Fig. 3.b presents the domain of attraction and the trajectories around the stable equilibrium point [4, 2]. For initial states on the upper border of this domain, the corresponding variable starts at an unstable equilibrium of its sub-system, its time derivative is zero, and the remaining variable follows a stable trajectory; as a result, the upper border of the domain of attraction is a straight line. In general, the independence of the variables of (5) defines hyper-cubic domains of attraction and, consequently, this model is referred to as SDNN-H.
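Because (5) decouples into m scalar sub-systems, each coordinate of the state can be integrated independently. The sketch below uses illustrative frequencies to show that a two dimensional SDNN-H sends each coordinate to the nearest odd multiple of π/ω_k, which is what produces the hyper-cubic domains of attraction.

```python
import numpy as np

def simulate_sdnn_h(x0, omegas, dt=0.01, steps=5000):
    """Forward-Euler integration of the decoupled system dx_k/dt = sin(omega_k * x_k)."""
    x = np.array(x0, dtype=float)
    w = np.asarray(omegas, dtype=float)
    for _ in range(steps):
        x += dt * np.sin(w * x)
    return x

omegas = [np.pi, np.pi / 2]                    # illustrative choice of frequencies
for x0 in [[0.4, 0.5], [1.8, 3.1], [2.3, 3.9]]:
    print(x0, "->", np.round(simulate_sdnn_h(x0, omegas), 3))
# Each coordinate converges to q_k * pi / omega_k with q_k odd,
# i.e. to odd integers for omega = pi and to 2, 6, 10, ... for omega = pi/2.
```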

Applying independent variables in (5) limits the domains of attraction to a hyper-cubic form. An enhanced and more realistic version of (5), in which the dependency of the variables is considered, is presented as follows:

\dot{x}_k(t) = \sin\Big(\sum_{j=1}^{m} w_{kj} x_j(t)\Big),   k = 1, ..., m.   (6)

In (6), the w_{kj} (j = 1, ..., m) are coefficients by which the dynamics of the kth variable of the state vector is related to the dynamics of the other variables. The borders of the domains of attraction of (6) are curves. In Fig. 4, the phase portrait of a two dimensional SDNN with interdependent variables is demonstrated. The curved border of the domain of attraction is recognizable in this figure.

Fig. 3. (a) Stable, unstable and saddle points of a two dimensional SDNN-H are represented using solid black, open and black-white circles, respectively. (b) Domain of attraction and trajectories around the stable equilibrium point [4, 2]. The borders of the domain of attraction are straight lines in SDNN-H.

Fig. 4. Phase portrait of a two dimensional SDNN with interdependent variables. The border of the domain of attraction is clearly curved under (6).

To determine the coefficient matrix W in this paper, the orthonormal rotation matrix R and the diagonal matrix Ω are applied, where

W = R Ω R^T.   (7)

The elements on the main diagonal of Ω are the frequencies ω_k (see (5)), and the rotation matrix R is calculated by a correlation analysis of the patterns in the state space. Details about the parameter tuning of SDNN are given in Section 5. The rotated equation of (6) is presented in compact form within the following definition.

Definition 1: The nonlinear dynamical system of SDNN, which provides an exponential number of attractors, is defined as

\dot{x}_k(t) = \sin(w_k x(t)),   k = 1, ..., m,   (8)

where w_k is the kth row of W.

By setting R as the identity matrix in (7) and (8), it is obvious that SDNN-H is a special case of SDNN. The stability conditions of (8) are analyzed in Section 4.
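The compact form (8) can be simulated directly once R and Ω are fixed. The sketch below builds a small W = RΩR^T from an arbitrary rotation angle (all numerical values are illustrative, not taken from the paper) and integrates the dynamics; with a non-identity R the domains of attraction are rotated and their borders curved, as in Fig. 4.

```python
import numpy as np

def simulate_sdnn(x0, W, dt=0.01, steps=8000):
    """Forward-Euler integration of dx/dt = sin(W x) (elementwise sine)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += dt * np.sin(W @ x)
    return x

theta = 0.3                                      # illustrative rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthonormal rotation matrix
Omega = np.diag([np.pi, np.pi / 2])              # illustrative positive frequencies
W = R @ Omega @ R.T                              # coefficient matrix of Definition 1

x_final = simulate_sdnn([0.7, 2.4], W)
print("attractor reached:", np.round(x_final, 3))
print("W x* / pi (close to odd integers at an attractor):",
      np.round(W @ x_final / np.pi, 3))
```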

4. Stability Analysis
The stability analysis of SDNN using the Lyapunov theorem (Slotine & Li, 1991; Khalil, 1996) is another advantage of the proposed system. The local qualitative behavior of (8) near an equilibrium point x* is studied using the linearization technique (Slotine & Li, 1991; Khalil, 1996; Strogatz, 1994; Kuznetsov, 1998; Terman, 2005). Equilibrium points are determined by solving sin(Wx*) = 0, where 0 is the m dimensional vector of zeros, or, equivalently, by solving Wx* = πq with integer q_k. Since R is assumed to be orthonormal, and consequently W is a full rank matrix, this is equivalent to solving the following equations for all k:

w_k x* = q_k π,   q_k ∈ ℤ,   k = 1, ..., m.   (9)

For each set of values of the q_k, x* is computed by solving (9). Since R and Ω are full rank matrices, (9) is a set of independent linear equations and each set of q_k is related to a unique solution (Meyer, 2000). After determining x* for a particular set of q_k, the neighboring equilibrium points around x* are determined by modifying some of the q_k to q_k ± 1.

Lemma 1: Equilibrium points which satisfy (10), (11) or (12) are stable, unstable or saddle points, respectively, if the matrix Ω is Positive Definite (PD). Here condition (12) is meant to exclude (10) and (11).

q_k is odd for all k = 1, ..., m;   (10)
q_k is even for all k = 1, ..., m;   (11)
q_k is odd for some k and even for the others.   (12)

Proof: Using the linearization technique with respect to x*, the Jacobian matrix (Slotine & Li, 1991; Khalil, 1996) at x* is presented as follows:

J(x*) = diag(cos(w_1 x*), ..., cos(w_m x*)) W.   (13)

By applying (10), cos(w_k x*) = cos(q_k π) = -1 for all k, and J(x*) is given as:

J(x*) = -W = -R Ω R^T.   (14)

Since R is an orthonormal matrix, by assuming Ω to be PD, the Jacobian matrix of (14) is Negative Definite (ND) and x* (i.e., an attractor in this case) is stable in a continuous and limited local domain around it. In contrast, when (11) is satisfied, cos(w_k x*) = 1 for all k and the Jacobian matrix equals W, which is PD; therefore x* is an unstable equilibrium point. When (12) is satisfied, the Jacobian matrix is indefinite and x* is a saddle point. ■

By using Lemma 1, the qualitative local stability of the equilibrium points is analyzed (Slotine & Li, 1991; Khalil, 1996). In the next step, the stability of the system is proved within extended domains around the attractors. It is clear that (10) is satisfied by applying odd values for all of the q_k (k = 1, ..., m) in (9). For a stable x* with a set of odd q_k, the corresponding unstable neighbors are determined by modifying all of the q_k to q_k ± 1. As a result, there are 2^m unstable points in the neighborhood of x*. In the case of neighboring saddle points, j (1 ≤ j ≤ m - 1) of the q_k can be modified to q_k ± 1. As a result, there are 3^m - 2^m - 1 adjacent saddle points around x*. By determining all of the equilibrium points in the neighborhood of x*, it is concluded that each stable point is surrounded only by saddle and unstable points². Additionally, with respect to Lemma 1, there are stable trajectories in a continuous domain around the attractor x*. These trajectories, in their complete form, are pushed towards or away from the adjacent saddle and unstable points before converging to the attractor. As a result, they form a continuous neighborhood around x* which is surrounded by, but does not include, a set of saddle and unstable points.

² It is worth noting that the neighboring attractors of x* are determined by modifying all q_k to q_k ± 2. Therefore, the neighboring attractors lie outside the surrounding unstable neighbors. Additionally, since Ω is strictly PD, no strange attractor which behaves chaotically and unpredictably is possible (Strogatz, 1994; Kuznetsov, 1998; Terman, 2005).

Definition 2: The continuous neighborhood D around a stable x* is the set which includes all trajectories that converge to x* or, equivalently, it is the domain of attraction of x* (Khalil, 1996). When the initial state of (8) is an equilibrium point, the state of the system has no further changes. Therefore, saddle and unstable points are not included in the trajectories converging to the stable x*.

Theorem 1: By considering x* as a stable point of (8), there is a Lyapunov function V(x) which guarantees local asymptotical stability of x* in the neighborhood D.

Proof: Let V(x) be a continuously differentiable function defined as follows:

V(x) = \sum_{k=1}^{m} (1 + \cos(w_k x)).   (15)

To prove Lyapunov stability using V(x), it is necessary to show that:

i) V(x*) = 0 and V(x) > 0 in D \ {x*};
ii) \dot{V}(x) < 0 in D \ {x*}.

The latter condition is necessary to prove asymptotical stability. In the case of condition (i), the positivity of V(x) is supported by the following inequalities:

-1 ≤ cos(w_k x) ≤ 1;   0 ≤ 1 + cos(w_k x) ≤ 2;   V(x) ≥ 0.   (16)

For no x in D \ {x*} is it possible that V(x) = 0. If some x̃ ≠ x* existed in D such that V(x̃) = 0, then cos(w_k x̃) = -1 for all k and x̃ would have to be another attractor in D; however, as explained before, D is defined using the trajectories converging to x* and contains no other equilibrium point. In order to complete the arguments for condition (i), it is known that cos(w_k x*) = cos(q_k π) = -1 for all k (q_k odd). Therefore, V(x*) = 0 and V(x) > 0 in D \ {x*}.

In order to fulfill condition (ii), it is clear that:

\dot{V}(x) = \sum_{k=1}^{m} \frac{\partial V}{\partial x_k} \dot{x}_k,   (17)

where ∂V/∂x_k is the partial derivative of V with respect to the kth state variable:

\frac{\partial V}{\partial x_k} = -\sum_{j=1}^{m} w_{jk} \sin(w_j x).   (18)

By considering Definition 1, (7), (17) and (18), \dot{V}(x) is represented as:

\dot{V}(x) = -\sin(Wx)^T W \sin(Wx) = -\sin(Wx)^T R Ω R^T \sin(Wx).   (19)

By assuming Ω to be a PD matrix, \dot{V}(x) < 0 is satisfied for all x in D \ {x*}, since such x are not equilibrium points and therefore sin(Wx) ≠ 0. Therefore, V(x) is a Lyapunov function which guarantees asymptotical stability of x* in D. ■
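Under the reconstruction of (15) given above, the Lyapunov function can also be checked numerically: along any trajectory of (8) that does not start at an equilibrium, V(x(t)) should decrease monotonically toward zero. The sketch below performs this check with an illustrative W (all numbers are assumptions, reusing the Euler integration idea from Section 3).

```python
import numpy as np

def lyapunov_V(x, W):
    """Candidate Lyapunov function V(x) = sum_k (1 + cos(w_k . x)),
    following the reconstruction of (15) above."""
    return float(np.sum(1.0 + np.cos(W @ x)))

# Illustrative coefficient matrix W = R * Omega * R^T, as in (7).
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W = R @ np.diag([np.pi, np.pi / 2]) @ R.T

x = np.array([0.7, 2.4])               # arbitrary non-equilibrium initial state
dt, prev_V, increases = 0.01, np.inf, 0
for _ in range(8000):
    V = lyapunov_V(x, W)
    if V > prev_V + 1e-9:              # count any (unexpected) increase of V
        increases += 1
    prev_V = V
    x += dt * np.sin(W @ x)            # one Euler step of the SDNN dynamics (8)

print("increases of V observed:", increases)
print("final V (close to 0 at an attractor):", round(lyapunov_V(x, W), 6))
```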

Corollary 1: It is clear that for all initial states, except the saddle and unstable points, conditions (i) and (ii) of Theorem 1 are satisfied within the corresponding domain of attraction. As a result, reaching a stable x* is feasible for almost all initial states. Additionally, when the initial state of the system is an unstable or saddle point, a small disturbance moves that state onto a stable trajectory. By considering the stability conditions and the topological information of the dataset, the parameters of SDNN are estimated so that it can be applied as a classifier (Section 5).

5. Learning parameters of SDNN
The learning process of SDNN is presented in Algorithm 1. In the first step of the algorithm, SDNN-H, which has hyper-cubic domains of attraction and independent variables, is estimated. In the second step, fine parameter tuning is performed to learn SDNN (Definition 1).

The first step of Algorithm 1 includes learning a topology preserving mapping from the feature space to the state space. An effective representation of the data in the state space increases the accuracy of SDNN. Therefore, the mapping function f: ℝ^n → ℝ^m (Bavafaye Haghighi & Rahmati, 2013; Bavafaye Haghighi & Rahmati, 2012) is estimated, which determines the initial state of the samples in the state space (i.e., x(0) = f(y)).

Definition 3: The mapping to the m-dimensional domains of attraction is defined using f = (f_1, ..., f_m) such that (Bavafaye Haghighi & Rahmati, 2013; Bavafaye Haghighi & Rahmati, 2012):

f(y_i) = [ <β_1, y_i^{α_1}>, ..., <β_m, y_i^{α_m}> ]^T,   y_i ∈ ℝ^n,   i = 1, ..., L.   (20)

In (20), <.,.> represents the inner product of two vectors, and n and L are the dimensionality and the total number of samples, respectively. When α_{kj} = 1, it indicates that the jth element of the sample is selected to be applied in the sub-mapping f_k. The notation y^{α_k} refers to the representation of y with respect to α_k. The total number of variables of y applied in f_k is denoted by n_k. The condition \sum_{k=1}^{m} α_{kj} ≥ 1 for all j ensures that all variables are applied in f.

In order to determine the α_k, a feature selection process is accomplished. By considering (5), each sub-system of SDNN-H is estimated independently from the others. As a result, step 1 can take advantage of parallel computing (El-Rewini & Abd-El-Barr, 2005). This step includes determining a set of primary codes by which the mapping is enriched using the topological information of the data. The width of the domain of attraction of a sub-system is estimated using a multi-objective cost function (Bavafaye Haghighi & Rahmati, 2013; Bavafaye Haghighi & Rahmati, 2012) which considers theorems of numerical analysis (Stoer & Bulirsch, 2002; Heath, 1997) and the generalization ability of f_k (Xu, 2007). By using the estimated domain of attraction and the primary codes of the kth sub-system, the attractors of SDNN-H are determined. In the next step, β_k is estimated to map the samples to the state space.

In step 2, the borders of the hyper-cubic domains of attraction of SDNN-H are rotated and curved by learning the matrices R and W. The matrix R is determined using a correlation analysis of the samples in the state space, and W is estimated with respect to the frequencies of the dynamics of SDNN-H and the matrix R. Each attractor of SDNN is labelled with respect to the labels of the samples it accepts (Step 3). Since similar samples are mapped to the same domain of attraction or to adjacent ones, unlabelled attractors are labelled using their labelled adjacencies (Step 4). Details of Algorithm 1 are presented throughout this section.

Algorithm 1: Learning SDNN
Input: training samples y_i with their labels (i = 1, ..., l); m; NI.
Output: α_k, β̂_k (k = 1, ..., m); δ_k; Ω; R; W; labelled attractors of SDNN.
1. Learn SDNN-H
   1.1. Apply Algorithm 2 to select features.
   1.2. For the kth sub-system of SDNN-H do (k = 1, ..., m):
        1.2.1. Assign primary codes.
        1.2.2. Estimate the domain of attraction.
        1.2.3. Determine attractors.
        1.2.4. Estimate β̂_k.
2. Determine the fine parameters of SDNN.
   2.1. Represent the samples in the m-dimensional state space using f̂.
   2.2. Decompose the correlation matrix of the m-dimensional samples.
   2.3. Determine the coefficient matrix W.
3. With respect to the accepted samples of each attractor, assign the most frequent label to it.
4. Assign labels to the unlabelled attractors with respect to their labelled adjacencies.

5.1. Feature Selection
In order to determine the binary vectors α_k, a feature selection algorithm (Algorithm 2) is applied using Principal Component Analysis (PCA) (Izenman, 2008; Jolliffe, 2002). In Algorithm 2, the influence of the variables on the principal components is determined by sorting the elements of each eigenvector; the first n_k variables, which include more details, are selected from the major eigenvectors. Because variable scales affect the principal components, applying PCA on the centered-normalized version of the variables (Izenman, 2008) is more efficient.

Algorithm 2: Feature Selection Algorithm
Input: m; centered-normalized training data.
Output: α_k (k = 1, ..., m).
1. Apply PCA on the centered-normalized training data.
2. Select the first m major eigenvectors.
3. For k = 1, ..., m:
   3.1. Select the n_k most influential variables of the kth eigenvector for α_k.
   3.2. Set the corresponding elements of α_k to 1 for the selected variables.
4. Assign the residual features so that every variable is applied in at least one α_k.
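A minimal sketch of this step is given below, assuming the reading of Algorithm 2 above: PCA is run on centered-normalized training data, the m leading eigenvectors are taken, and for each of them the indices with the largest absolute loadings are marked in a binary vector α_k. The number of variables selected per eigenvector (n_sel) and the assignment of residual features to the last sub-mapping are assumptions.

```python
import numpy as np

def select_features(Y, m, n_sel):
    """Return an (m, n) binary matrix alpha; alpha[k, j] = 1 selects feature j
    for the k-th sub-mapping, based on the loadings of the k-th principal axis."""
    Yc = (Y - Y.mean(axis=0)) / (Y.std(axis=0) + 1e-12)   # centre and normalize
    cov = np.cov(Yc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues ascending
    order = np.argsort(eigvals)[::-1]                     # major components first
    alpha = np.zeros((m, Y.shape[1]), dtype=int)
    for k in range(m):
        loadings = np.abs(eigvecs[:, order[k]])
        top = np.argsort(loadings)[::-1][:n_sel]          # most influential variables
        alpha[k, top] = 1
    # Make sure every variable is used by at least one sub-mapping (step 4).
    unused = np.where(alpha.sum(axis=0) == 0)[0]
    if unused.size:
        alpha[-1, unused] = 1
    return alpha

rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 10))        # toy training data (illustrative)
print(select_features(Y, m=4, n_sel=3))
```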

5.2. Primary Code Assignment
By applying a topology preserving mapping, it is expected that close samples have the same attractor or adjacent ones. In the case of SDNN, the topological information of a dataset is learned with respect to the most informative directions of the data. In this way, a set of primary codes is defined on the major principal component of the samples y_i^{α_k} (i = 1, ..., l) (Bavafaye Haghighi & Rahmati, 2012). By applying the width of the domain of attraction and the primary codes, the attractors of the kth sub-system are determined and β̂_k is estimated by considering the topological information of the dataset (Algorithm 1, steps 1.2.3 and 1.2.4).

In order to determine a proper set of primary codes for the kth sub-system, the major informative direction of the samples y_i^{α_k} is applied. The probability of overlapping projections on the major principal component for samples which belong to different classes is reduced (Izenman, 2008; Jolliffe, 2002). As a result, in order to find the primary codes for a sub-system using PCA, the projection domain on the major principal component is divided into NI equal parts. The centres of these intervals are numbered sequentially to determine the set of primary codes (see Fig. 5 with NI = 3). Samples which are projected onto the same interval are assigned the same primary code. Consequently, the vector c_k is defined such that its ith element c_{ki} is the primary code of y_i.

Fig. 5. By projecting samples onto the major informative direction of a dataset, the probability of overlapping projections for different classes is reduced. The set of primary codes is determined by dividing the projection domain into NI equal parts (Bavafaye Haghighi & Rahmati, 2012).
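The primary code assignment of Fig. 5 can be sketched as follows: the samples selected for a sub-system are projected onto their leading principal direction, the projection range is split into NI equal intervals, and each sample receives the index of its interval as its primary code. Details such as boundary handling are assumptions.

```python
import numpy as np

def assign_primary_codes(Yk, NI):
    """Project samples (rows of Yk) on the major principal direction and return
    primary codes in {1, ..., NI} by splitting the projection range into NI
    equal intervals."""
    Yc = Yk - Yk.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    proj = Yc @ Vt[0]                              # scores on the major direction
    edges = np.linspace(proj.min(), proj.max(), NI + 1)
    codes = np.clip(np.searchsorted(edges, proj, side="right"), 1, NI)
    return codes

rng = np.random.default_rng(1)
Yk = rng.normal(size=(12, 5))                      # toy sub-system data
print(assign_primary_codes(Yk, NI=3))
```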

Primary codes are transferred to the attractors of the kth sub-system of SDNN-H, which makes the proposed method a topology preserving dynamical neural network. In order to transfer the primary codes to the attractors, the domain of attraction of each sub-system is estimated, as explained in the next subsection.

5.3. Optimal Domain of Attraction
Referring to the explanations presented in Section 3, the width of the domain of attraction in the kth sub-system (k = 1, ..., m) is equal to 2π/ω_k, and half of this value is denoted as δ_k. It is shown in (Bavafaye Haghighi & Rahmati, 2013) that estimating δ_k amounts to minimizing a multi-objective cost function. One of the objective terms of this function is increasing the value of δ_k, which leads to an extended domain of attraction (Stoer & Bulirsch, 2002; Heath, 1997). Therefore, greater values of δ_k are advantageous. On the other side, smaller values of δ_k guarantee the generalization ability of f_k (Xu, 2007). In addition, the error of the mapping is related to δ_k (Bavafaye Haghighi & Rahmati, 2013; Bavafaye Haghighi & Rahmati, 2012). The error E_{g_k}, which is the error of the mapping to the primary codes, is given as follows:

E_{g_k} = \sum_{i=1}^{l} ( g_k(y_i^{α_k}) - c_{ki} )^2,   (21)

where g_k is the mapping to the primary codes and is defined as:

g_k(y^{α_k}) = <γ̂_k, y^{α_k}>,   γ̂_k = (Y^{α_k})† c_k.   (22)

In (22), the ith row of the matrix Y^{α_k} is y_i^{α_k} and † is the Moore-Penrose pseudo inverse operator. The value of δ_k is determined using a multi-objective minimization cost function which is formulated as:

δ_k = \arg\min_{δ} \; λ_1 J_1(δ) + λ_2 J_2(δ),   (23)

where the first objective term J_1 deals with increasing the value of δ_k to extend the domain of attraction and the second term J_2 minimizes the error of the mapping, which results in increasing its generalization ability (Xu, 2007). By considering the challenge of properly adjusting the weights of the objective terms in a multi-objective cost function (Xu, 2007; Vapnik, 2000), it is shown in (Bavafaye Haghighi & Rahmati, 2013; Bavafaye Haghighi & Rahmati, 2012) that δ_k correlates inversely with E_{g_k}, and the general representation of δ_k is as follows:

δ_k = ρ / (E_{g_k} + ε).   (24)

The term ε, which is set to a fixed constant, appears in (24) to avoid the appearance of infinite values for δ_k. ρ is a free parameter which is set to 50, a value that showed better performance in our experiments.
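Assuming the reconstruction of (22) and (24) above, the half-width δ_k of a domain of attraction can be computed from the least-squares fit of the primary codes: γ̂_k is obtained with the Moore-Penrose pseudo-inverse, the residual error E_{g_k} is measured, and δ_k = ρ/(E_{g_k} + ε). The value ρ = 50 is taken from the text; ε and all the toy data are assumptions.

```python
import numpy as np

def half_width(Y_alpha_k, c_k, rho=50.0, eps=1e-3):
    """Estimate delta_k = rho / (E_gk + eps), where E_gk is the squared error of
    the pseudo-inverse mapping of samples to their primary codes (cf. (22), (24))."""
    gamma_k = np.linalg.pinv(Y_alpha_k) @ c_k        # Moore-Penrose estimate
    E_gk = float(np.sum((Y_alpha_k @ gamma_k - c_k) ** 2))
    return rho / (E_gk + eps)

rng = np.random.default_rng(2)
Y_alpha_k = rng.normal(size=(30, 3))                 # toy selected-feature matrix
c_k = rng.integers(1, 4, size=30).astype(float)      # toy primary codes in {1,2,3}
print("delta_k =", round(half_width(Y_alpha_k, c_k), 3))
```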

5.4. Learning Attractors and the Mapping f(.)
It is noted in Section 3 that the attractors of SDNN-H are associated with odd values of q_k. Therefore, the attractors of the kth sub-system are determined by multiplying δ_k by the odd natural numbers in [1, 2NI - 1]. When the attractors and the primary codes are arranged in increasing order, a bijection between them is determined. For example, for the set of primary codes {1, 2, 3} and δ_k = 2, the set of attractors is {2, 6, 10} and the bijection between them is 1 -> 2, 2 -> 6, 3 -> 10. Since NI is the number of attractors defined for each sub-system, the total number of attractors of the system in the m-dimensional state space is NI^m. As a result, the storage capacity of the network grows exponentially by increasing the value of m.

In the next step, each training sample y_i (i = 1, ..., l) is assigned to the attractor θ_{ki} which is in correspondence with its primary code c_{ki}. The vector Θ_k represents the attractors of the training samples for the kth sub-system of SDNN-H. Once Θ_k is determined, the mapping from the input space to the attractors is estimated by:

β̂_k = (Y^{α_k})† Θ_k,   f̂_k(y^{α_k}) = <β̂_k, y^{α_k}>.   (25)

In (25), β̂_k is used to denote the estimation of β_k. This shows, theoretically, how the mapping to the attractors is defined. In practice, however, due to the distribution of the samples, they are mapped to the vicinity of the attractors inside the domains of attraction. For example, even if a sample y_i is mapped exactly onto its attractor, i.e., f̂_k(y_i^{α_k}) = θ_{ki}, a disturbed pattern deviates from θ_{ki} by an amount which is nonzero for almost all disturbances. As a result, according to the distribution of the samples, it is impossible to map all of them exactly onto the defined attractors. However, since the attractors are determined by applying primary codes with respect to the topological information of the dataset, f̂ is a topology preserving mapping. Therefore, close samples have close initial states. By applying fine parameter adjustment, more accurate domains of attraction are determined so that samples converge to the correct attractors.

5.5. Fine Parameter Tuning of SDNN
By performing the fine parameter adjustment of SDNN, the borders of the domains of attraction are rotated and curved. In this line, minor adjustments are applied by considering the frequencies of the dynamics of the variables and the correlation of the initial states of the samples. By applying f̂ to the training samples, their initial states are determined. The correlation matrix of these states is then decomposed using Singular Value Decomposition (SVD) (Meyer, 2000), which yields a diagonal matrix and a rotation matrix; the latter is taken as R. In order to determine Ω, the values of ω_k, which are equal to π/δ_k (Subsection 5.3), are arranged on its main diagonal. Finally, the coefficient matrix W is computed using (7). Since ω_k > 0 (k = 1, ..., m), Ω is strictly PD. As a result, the stability condition of Theorem 1 is satisfied. Consequently, based on its initial state, each sample converges to an attractor of SDNN following its dynamical evolution³. By determining the attractor of each sample, each attractor is assigned the most frequent label of its accepted samples. Since SDNN is a topology preserving method, unlabelled attractors are labelled with respect to their neighbors.

³ In the case of initial states which coincide with equilibrium points, a small random disturbance is added to the initial state; each element of the disturbance is kept small relative to the corresponding domain of attraction. By adding a small disturbance to unstable/saddle equilibrium points, they become part of a stable trajectory.
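The fine-tuning step of Section 5.5 and the subsequent use of SDNN as a classifier can be sketched end to end as below: the correlation matrix of the initial states is decomposed with SVD to obtain R, the frequencies ω_k = π/δ_k are placed on the diagonal of Ω, W = RΩR^T is formed, and a test sample is classified by integrating (8) from its initial state and reading the label of the attractor it reaches. The attractor labelling, the toy data and the nearest-attractor lookup are illustrative assumptions.

```python
import numpy as np

def fine_tune_W(X_init, deltas):
    """Build W = R * Omega * R^T (cf. (7)) from the correlation matrix of the
    initial states (rows of X_init) and the half-widths delta_k of Section 5.3."""
    C = np.corrcoef(X_init, rowvar=False)        # correlation of initial states
    _, _, Vt = np.linalg.svd(C)
    R = Vt.T                                     # orthonormal rotation matrix
    Omega = np.diag(np.pi / np.asarray(deltas))  # omega_k = pi / delta_k > 0
    return R @ Omega @ R.T

def classify(x0, W, labelled_attractors, dt=0.01, steps=8000):
    """Integrate dx/dt = sin(W x) from the initial state x0 and return the
    label of the closest labelled attractor (an illustrative lookup)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += dt * np.sin(W @ x)
    points = np.array([p for p, _ in labelled_attractors])
    labels = [lab for _, lab in labelled_attractors]
    return labels[int(np.argmin(np.linalg.norm(points - x, axis=1)))]

# Toy usage with made-up initial states, half-widths and attractor labels.
rng = np.random.default_rng(3)
X_init = rng.normal(size=(100, 2))               # stand-in for f(y_i) values
W = fine_tune_W(X_init, deltas=[2.0, 1.0])
labelled = [(np.array([2.0, 1.0]), "class A"),   # made-up labelled attractors
            (np.array([6.0, 3.0]), "class B")]
print(classify([1.5, 0.8], W, labelled))
```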

6. Experimental Results
In order to analyze SDNN, artificial and real world datasets are applied. In Fig. 6, representations of the 8 dimensional artificial datasets in ℝ² are illustrated. In order to produce each dataset in ℝ⁸, the coordinates of the circles, triangles or arcs in ℝ² are repeated 4 times. SDNN-H and SDNN are both applied to these datasets. Trajectories toward the attractors of SDNN-H and SDNN are shown in Figs. 7.a-7.f. In the case of the circles dataset, the error rates of SDNN and SDNN-H are 1.11 % and 4.02 %, respectively. For the horizontal and vertical parts of the data, the accuracy of SDNN-H, which has hyper-cubic domains of attraction (Fig. 7.a), is better than that of SDNN (Fig. 7.b). In contrast to SDNN-H, SDNN fits better on slopes, which is the result of considering the correlation of the patterns in the state space. The error rates of SDNN-H for the triangles and arcs are 8.46 % (Fig. 7.c) and 27.5 % (Fig. 7.e), respectively. The hyper-cubic domains of attraction (squares in ℝ²) of SDNN-H can be seen in these figures. Since these squares are not flexible enough to learn the distribution of the data, the corners of the triangles and about 1/3 of the samples from the arcs are misclassified. By considering the correlation of the data using SDNN, the error rates for classifying the triangles and arcs are 0.0 % and 9.16 %, respectively. As a result, although SDNN performs better than the hyper-cubic version on the applied artificial datasets, SDNN-H may sometimes be advantageous depending on the distribution of the data.

Fig. 6. Two dimensional representation of the (a) circles, (b) triangles and (c) arcs datasets.

Fig. 7. Figures (a), (c) and (e) present the phase trajectories of SDNN-H for the datasets of Fig. 6; figures (b), (d) and (f) illustrate the corresponding trajectories of SDNN. Although SDNN-H fits the horizontal and vertical parts of the data better than SDNN (compare (a) and (b)), the fixed shape of its hyper-cubic domains of attraction prevents it from matching different distributions of samples (figures (c)-(f)).

6.1. Real World Datasets
Table 2 presents the dimensionality, the number of classes and samples of the datasets and, also, the parameter settings of SDNN and SDNN-H. Forest Cover Type (Forest), Wall Following Robot (Robot) and Segmentation (Segment) are datasets downloaded from the UCI repository (UCI_Repository, 2014). Forest is known as a challenging classification task which includes binary variables as well as measured ones. No feature extraction is applied on the UCI datasets. COIL100 (Nene et al., 1996) includes color images of 100 different objects which are rotated in steps of 5°; therefore, there are 72 images from different views for each object. In COIL-A, 18 images of each object (rotated in steps of 20°) are applied for training and the 54 remaining images are used for testing. In COIL-B, 36 images (rotated in steps of 10°) are considered for training and the remaining 36 images are used for testing. Similar to (Kietzmann et al., 2008), 292 dimensional features are extracted from each image of COIL100. Each feature vector includes 64 dimensional histograms of the Lab channels, a 64 dimensional histogram of the Discrete Cosine Transformation (DCT), 8 dimensions for the Hu moments and 10 dimensions for shape information, in addition to the logarithms of their absolute values.

Table 2. Specifications of the applied datasets and parameter settings.
       Forest   COIL-A   COIL-B   Robot   Segment
n      54       292      292      24      19
K      7        100      100      4       7
l      290321   1800     3600     4911    210
L      580642   7200     7200     5457    2310
m      12       8        8        8       8
NI     8        14       14       12      6
ρ      50       50       50       50      50

The parameters of the proposed method are chosen as given in Table 2. It is shown in Section 5.3 that the value of δ_k correlates inversely with E_{g_k}. As a result, only a fine adjustment of δ_k is applied using ρ. Consequently, the sensitivity of the method to ρ, which is about 1 %, is lower than for the other parameters; the settings of the parameters NI and m, however, play an important role in the performance of SDNN. Figure 8 presents the error rate of SDNN and SDNN-H using different values of NI, m and Storage Capacity (S.C. = NI^m) for the Segment dataset. In the cases of S.C. = 16, S.C. in {125, 128} and S.C. = 65536, in which low values for NI and m are applied, the capacity of the network and its accuracy decrease significantly. The accuracy of these low capacity networks shows that S.C. is an important factor for the performance of SDNN and SDNN-H. For networks with similarly low capacities, the performance increases by selecting a higher value of m, since the effect of losing information by mapping to a lower dimensional space (Lee & Verleysen, 2007; Jolliffe, 2002; Izenman, 2008) is reduced in these cases. In other words, by applying more of the mappings f_k, the state space includes more details about the topological information of the data.

Comparing the error rates of SDNN and SDNN-H for each parameter setting in the low capacity networks shows that the performance is enhanced by considering the correlation of the samples in the state space, as SDNN does. As a result, although the loss of information is one of the disadvantages of applying f, adding the correlation of the samples in the state space in SDNN enhances the accuracy in comparison to SDNN-H when the storage capacity is not high. For networks with high capacity (e.g., S.C. = 4782969, S.C. = 60466176 and S.C. = 64000000), the latter solution is not always successful. Increasing the storage capacity of SDNN and SDNN-H causes the problem of over-fitting (Vapnik, 2000). This problem increases by adding more information about the data correlation in SDNN. As a result, the error rate of SDNN can be higher than that of SDNN-H in high capacity networks. Similarly, applying high values for m does not necessarily enhance the results, since details are then mapped more precisely onto an already over-fitted state space. By comparing the error rates of SDNN-H for "m = 14, NI = 3" and "m = 10, NI = 6", one can see that although the storage capacity of the former setting is less than that of the latter, its accuracy also decreases. Therefore, a trade-off point between S.C. and m should be found. Another important note about high capacity networks is that, when the capacity of the network grows extremely high in comparison to the number of samples, most of the attractors remain unlabelled. In the case of the Segment dataset, the setting "m = 8, NI = 6" is a trade-off point which gives better control over the storage capacity of the network, the unlabelled attractors and over-fitting. The value of m is set neither low nor high, which supports enough details for the network.

Fig. 8. Error percentages of SDNN and SDNN-H on the Segment dataset using different values of NI, m and Storage Capacity (S.C.).

Table 3 presents the error rate of SDNN on the real world datasets in comparison to SDNN-H, which has hyper-cubic domains of attraction. By applying the fine parameter tuning of SDNN, the error rate of the proposed method is reduced. However, in the case of Robot, there is a small increase in the error rate of SDNN, which might be a matter of the special topology of the data (similar to Fig. 7.a), for which hyper-cubic domains of attraction are preferable. Although this paper focuses on the learning ability of the proposed framework as a dynamical NN with an exponential number of attractors, classical methods which have shown acceptable performance on real world data are included in our comparisons. Using these comparisons, the advantages and disadvantages of the proposed method can be studied better. The classical methods include the Multi-Layer Perceptron (MLP) (Oza, 2005), the Linear Support Vector Machine (LSVM) (Yang et al., 2002), the Non-Linear Support Vector Machine (NLSVM) (Bala & Agrawal, 2009) and a Combination of Classifiers (C_C) (Sen & Erdogan, 2011), applied under similar test conditions (i.e., similar numbers of training and test samples). Since the proposed method supports an exponential number of attractors, it outperforms the other methods in the case of Forest, which has the highest number of samples. In the cases of COIL-A, COIL-B, Robot and Segment, the accuracy of SDNN is not better than, but comparable to, that of the classical classifiers.

Table 3. Error percent of SDNN in comparison to SDNN-H and classical methods with similar test conditions.
          SDNN-H   SDNN    Other Methods
Forest    22.5     17.48   24.27  MLP (Oza, 2005)
COIL-A    14.6     12.03   8.7    LSVM (Yang et al., 2002)
COIL-B    10.13    9.75    3.96   LSVM (Yang et al., 2002)
Robot     3.9      4.1     2.5    C_C (Sen & Erdogan, 2011)
Segment   19.76    15.61   10.48  NLSVM (Bala & Agrawal, 2009)

7. Conclusion and Future Works
In this paper, the Sinusoidal Dynamic Neural Network (SDNN), a new class of multi-stable neural networks with an exponential number of attractors, was proposed. The conditions of multi-stability of this system were analyzed using the Lyapunov theorem. The learning process and the determination of the attractors are accomplished by considering the topological information of the data and the stability conditions. Consequently, the new framework of dynamical NNs can be applied to real world datasets, which is an important advantage in comparison to the other NNs in the same class. In order to increase the performance of SDNN, a range of enhancements is proposed, including: considering other alternatives for the dynamics of the system, such as applying a polynomial function instead of the sine; enhancing the learning algorithm of SDNN in order to better preserve the topological information of the data; and a theoretical analysis of the method from the viewpoint of information theory (Palm, 2013). The latter subject is also important for estimating a trade-off point between the capacity of the network and the number of mappings, in order to achieve better control over the effective storage capacity and over-fitting. Additionally, extensive application of SDNN to real world challenges with a high number of classes, such as multi-objective intelligent control, data mining and object recognition, is recommended. Combining the advantages of the proposed framework with memristors (Thomas, 2013) may be considered for future work.

References Ao, S.L. (2010). Applied Time Series Analysis and Innovative Computing. Springer. Bala, M., Agrawal, R.K. (2009). Evaluation of Decision Tree SVM Framework Using Different Statistical Measures. International Conference on Advances in Recent Technologies in Communication and Computing, 341-345. Bavafaye Haghighi, E., Rahmati, M., (2012). Enhancing the Accuracy of Mapping to Multidimensional Optimal Regions Using PCA. International Joint Conference on Computational Intelligence, 536-546.


Bavafaye Haghighi, E., Rahmati, M. (2013). Theoretical Aspects of Mapping to Multidimensional Optimal Regions as a MultiClassifier. Intelligent Data Analysis, 17, 981–999. Chartier, S., Boukadoum, M. (2006). A Bidirectional Heteroassociative Memory for Binary and Grey-Level Patterns. IEEE Transactions on Neural Network, 17, 385-396. Chen, T., Wang, L. (2007a). Power-rate global stability of dynamical systems with unbounded time-varying delays. IEEE Transactions on Circuits and Systems II: Express Briefs, 54, 705–709. Chen, T., Wang, L. (2007b). Global μ-stability of delayed neural networks with unbounded time-varying delays. IEEE Transactions on Neural Networks, 18, 1836–1840. Chenga, Ch.Y., Linb, K.H., Shih, Ch.W. (2007). Multistability and convergence in delayed neural networks. Physica D, 225, 61–74. Du, Y., Xu, R. (2014). Multistability and Multiperiodicity for a Class of Cohen–Grossberg BAM Neural Networks with Discontinuous Activation Functions and Time Delays. Neural Processing Letters, springer. El-Rewini, H., Abd-El-Barr, M. (2005). Advanced Computer Architechture and Parallel Processing. John Willey and Sons. Heath, M. T. (1997). Scientific Computing: An Introductory Survey, Mc Graw Hill. Hu, Y.H., Hwang, J.N. (2010). Handbook of Neural Network Signal Processing. CRC press. Huang, Y., Zhang, H., Wang, Z. (2014). Multistability of complex-valued recurrent neural networks with real-imaginary-type activation functions. Applied Mathematics and Computation, 229, 187–200. Izenman, A. J. (2008). Modern Multivariate Statistical Technics. Springer. Janczak, A. (2005). Identification of Nonlinear systems using Neural Networks and Polynomial Models, A Block Oriented Approach. Springer. Jolliffe, I.T. (2002). Principle Component Analysis. (2nd ed.). Springer. Khalil, H.K. (1996). Nonlinear Systems. (2nd ed.). Prentice Hall. Kietzmann, T.C., Lange, S., Riedmiller, M. (2008). Incremental GRLVQ: Learning Relevant Features for 3D Object Recognition. Neurocomputing, 71, 2868-2879. Kuznetsov, Y.A. (1998). Elements of Applied Bifurcation Theory. (2nd ed.). Springer. Lee, J.A., Verleysen, M. (2007). Nonlinear Dimensionality Reduction. Springer, New York. Lili, W., Wenlian, L., Tianping, Ch. (2010). Coexistence and local stability of multiple equilibria in neural networks with piecewise linear nondecreasing activation functions, Neural Networks. 23, 189_200. Lin, T.Y., Xie, Y., Wasilewska, A., Liau, C.J., (2008). Data mining: foundations and practice, Springer. Liu, G.P. (2001). Nonlinear Identification and Control, A neural Network Approach. Springer. Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra, SIAM. Nene, S.A., Nayar, Sh.K., Murase, H. (1996). Columbia Object Image Library (COIL 100). Technical Report No. CUCS-00696, Department of Computer Science, Columbia University. Nie, X., Cao, J. (2009). Multistability of competitive neural networks with time-varying and distributed delays. Nonlinear Analysis: Real World Applications, 10, 928–942. Nie, X., Cao, J., Fei, S. (2014). Multistability and Instability of Competitive Neural Networks with Mexican-Hat-Type Activation Functions. Abstract and Applied Analysis, 1-20. Nørgaard, M. (2000). Neural Networks for Modelling and Control of Dynamic Systems: A Practitioner's Handbook. Springer. Oza, N.C. (2005). Online Bagging and Boosting, IEEE Intenational Conference on Systems, man and cybernetics, 2340-2345. Palm, G. (2013). Neural associative memories and sparse coding. Neural Networks, 37, 165-171. Perfetti, R., Ricci, E. (2008). 
Recurrent Correlation Associative Memories: A Feature Space Perspective. IEEE Transactions on Neural Networks, 19, 333-345. Sen, M.U., Erdogan, H. (2011). Max-Margin Stacking and Sparse Regularization for Linear Classifier Combination and Selection. Cornell University Library, arXiv:1106.1684v1 [cs.LG]. Shen, D., Cruz J.B., J.R., (2005). Encoding Strategy for Maximum Noise Tolerance Bidirectional Associative Memory. IEEE Transactions on Neural Networks, 16, 293-300. Slotine, J.J.E., Li, W.A. (1991). Applied Nonlinear Control. Prentice Hall. Sonka, M., Hlavac, V., Boyle, R. (2014). Image Processing, Analysis, and Machine Vision. (4th ed). Cengage Learning. Stoer, J., Bulirsch, R. (2002). Introduction to numerical analysis, Springer. Strogatz, S.H. (1994). Nonlinear Dynamics and Chaos, with Application to Physics, Biology, Chemistry and Engineering. Perseus Books Publishing.


Sudo, A., Sato, A., Hasegawa, O. (2009). Associative Memory for Online Learning in Noisy Environments Using SelfOrganizing Incremental Neural Network, IEEE Transactions on Neural Networks, 20, 964-972. Terman, D. (2005). An introduction to dynamical systems and neuronal dynamics. In A. Borisyuk, A. Friedman, B. Ermentrout, D. Terman (Eds.), Tutorials in Mathematical Biosciences I, Mathematical Neuroscience. Springer. Thomas, A. (2013). Memristor-based neural networks. Journal of Physics D: Applied Physics, 46, 093001. UCI_Repository (2014). http://archive.ics.uci.edu/ml/. Vapnik, V. N. (2000). The Nature of Statistical Learning Theory. (2nd ed.). Springer. Wang, L., Chen, T. (2014). Multiple μ-stability of neural networks with unbounded time-varying delays. Neural Networks, 53, 109_118. Wang, Y.F., Cruz, J.B., J.R., Mulligan, J.H. J.R. (1990). Two Coding Strategies for Bidirectional Associative Memory. IEEE transactions on neural networks, 1, 81-92. Xu, L. (2007). A Trend on Regularization and Model Selection in Statistical Learning: A Bayesian Ying Yang Learning Perspective. In W. Duch, J. Mandziuk (Eds.), Challenges for Computational Intelligence (pp. 343-406). Springer. Yang, M.H., Roth, D., Ahuja, N. (2002). Learning to Recognize 3D Objects with SNoW. Neural Computation, 14, 1071-1104. Zeng, Z., Zheng, W.X. (2012). Multistability of Neural Networks with Time-Varying Delays and Concave-Convex Characteristics, IEEE Transactions On Neural Networks and Learning Systems, 23, 293-305. Zeng, Z. G., Wang, J., Liao, X.X. (2004). Stability analysis of delayed cellular neural networks described using cloning templates, IEEE Trans.Circuits Syst. I, 51, 2313–2324. Zeng, Z. G., Wang, J. (2006). Multiperiodicity and exponential attractivity evoked by periodic external inputs in delayed cellular neural networks, Neural Comput., 18, 848–870. Zurada, J. (1992). Introduction to Artificial Neural Systems, West Publishing Company

