IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 37, NO. 3, MARCH 1990


Stability in Contractive Nonlinear Neural Networks

DOUGLAS G. KELLY

Abstract—We consider models of the form μẋ = −x + p + WF(x), where x = x(t) is a vector whose entries represent the electrical activities in the units of a neural network, W is a matrix of synaptic weights, F is a nonlinear function, and p is a vector (constant or slowly varying over time) of inputs to the units. If the map WF(x) is a contraction, then the system has a unique equilibrium which is globally asymptotically stable; consequently the network acts as a stable encoder in that its steady-state response to an input is independent of the initial state of the network. We consider some relatively mild restrictions on W and F(x), involving the eigenvalues of W and the derivative of F, that are sufficient to ensure that WF(x) is a contraction. We show that in the linear case with spatially-homogeneous synaptic weights, the eigenvalues of W are simply related to the Fourier transform of the connection pattern. This relation makes it possible, given cortical activity patterns as measured by autoradiographic labeling, to construct a pattern of synaptic weights which produces steady-state patterns showing similar frequency characteristics. Finally, we consider the relationships, in the spatial and frequency domains, between the equilibrium of the model and that of the linear approximation μẋ = −x + p + Wx; this latter equilibrium can be computed easily from p in the homogeneous case using discrete Fourier transforms.

INTRODUCTION

THERE is considerable interest, coming from several directions, in neural networks which act as stable encoders of input signals, in the sense that when delivered constant or slowly-varying input, they quickly reach a steady state (equilibrium) that depends only on the input and not on the initial state of the system. First, such systems are of interest in the modeling of biological neural processes. Recent work (see [17], [23], [24]) on the encoding of input stimuli by mammalian primary sensory cortex indicates that to a first approximation it behaves in this manner, in that the cortical response to repeated stimulation, as measured by autoradiographic labeling, is stimulus-specific and constant within species. Second, there is much current interest in neural-network architectures for computing and/or cognitive machines. In particular, work of Hopfield and Tank [14]-[16], [22] and others (see Feldman and Ballard [9] for a survey) makes it clear that the dynamics of model neural networks (or, at least, "connectionist machines" that have many similarities to neural networks) can be exploited for accomplishing various types of cognitive and computational tasks. Recent papers of Sejnowski,

Manuscript received September 23, 1988; revised March 22, 1989. This work was supported by NIDR Program Grant DE-07509 (B. L. Whitsel, Principal Investigator). The author is with the Departments of Mathematics and Statistics, University of North Carolina, Chapel Hill, NC 27599. IEEE Log Number 8933060.

Churchland, and Koch [6], [19], [20] and the essay of Crick [8] contain valuable surveys and comments on the burgeoning area of neural-architecture computing and its relations to the modeling of real biological neural networks. The paper by Lippmann [18] is a useful survey of the taxonomy of computing architectures based on neural or connectionist principles. Cohen and Grossberg [7] argue persuasively to justify an interest in stable encoding or, as they call it, global pattern formation. Their paper deals with a class of neural networks (symmetric, purely-inhibitory networks) which can be proved to respond to any constant or slowly-varying stimulus by moving to a steady state that depends on, and hence encodes, the input. The symmetry and negativity of the synaptic weights allow the construction of a Liapounov function whose existence guarantees the stability. They point out that several kinds of models of cognitive processes and other systems fit into their class (although, because of the required lack of excitatory synapses, biological nets do not fit directly). Of great interest is the fact that "Hopfield nets" are of this sort; the work of Hopfield and Tank referred to above exploits the form of the Liapounov function to construct neural-architecture machines that find local minima for various classical optimization problems. In this paper we consider another class of neural nets that act as stable encoders in the same sense. Our nets require neither symmetry nor negativity of synaptic weights; instead, we assume contractiveness; that is, that the dynamical system determined by the differential equations of the net be contractive (details are in Section IV). This condition turns out to be easy to verify, and to be a relatively mild restriction on the construction of models of living biological neural networks. Another assumption that can be imposed on a neural network of the type we consider is that of spatial homogeneity.
This is the requirement that the synaptic connections between pairs of units be the same for pairs that are in the same relative position to each other (details are in Section III). This condition, not the same as symmetry, is appropriate for modeling a real network in which the synaptic-weight pattern is approximately uniform. This assumption provides a great deal of tractability; as we note in Section III, a linearized version of the spatially-homogeneous model can be analyzed completely in the spatial-frequency domain, and correspondences with biological data can be shown. We indicate in Section III how to construct a stably encoding model that reproduces certain filtering properties inferred from spectral analysis of autoradiographic labeling in the forelimb area of S-I in the cat.

0018-9294/90/0300-0231$01.00 © 1990 IEEE

The general model, on which we impose the condition of contractiveness, is essentially like those considered by Cowan and others [25], [10]. It consists of units which represent aggregates of neurons, although they could as well represent single neurons. For the purposes of comparison with the experimental data we will view the units as cortical "minicolumns," which are approximately 30 μm (one neuron) in diameter and extend radially through the cortex from upper to lower surface. Associated with each unit, say unit j, is a time-varying quantity x_j = x_j(t) representing the average electrical activity (as measured, say, by membrane potential above or below resting level) of the neurons in the unit over a short time interval containing time t. With every pair (j, k) of units is a synaptic weight W_jk that does not change with time; it represents an average strength of synaptic transmission from neurons in unit k to neurons in unit j. These weights may be positive, negative, or zero, and W_jk need not equal W_kj. In realizations of the model the units are arranged on a two-dimensional grid or in a one-dimensional array, and the synaptic strengths may be (but need not be) zero or near zero for units that are distant from each other. Synaptic transmission is nonlinear: activity in unit k has an effect on that in unit j proportional to W_jk f_k(x_k), where f_k is a smooth sigmoid function. Input is constant over time (or slowly varying on the time scale of the equations of the system) but otherwise arbitrary: each unit, say unit j, receives input from outside the system that tends to change its potential by a fixed amount p_j per unit time, and the quantities p_j can form any pattern whatever, with no restrictions as to magnitude. These assumptions result in the system (1.1) of differential equations (below) governing the behavior of the network.
The results we obtain for such networks are as follows:

1) If the model is contractive (see Sections II and IV; this assumption involves the eigenvalues of the matrix W = {W_jk} and the derivatives of the functions f_k), then the activities x_1(t), x_2(t), …, x_N(t) converge at an exponential rate to steady-state activities X̄_1, X̄_2, …, X̄_N that depend only on the inputs p_j and not on the initial conditions x_j(0).

2) Under the assumption of spatial homogeneity of synaptic weights (that is, that W_jk is the same as W_rs if the direction and distance of unit k from unit j are the same as those of unit s from unit r), the conditions required in 1) above can be easily checked, and the relation of the steady state to the input can be computed (exactly in a linearized approximation to the model, approximately in some cases for the nonlinear model) using discrete Fourier analysis. In fact, the network under these assumptions acts as a filter, enhancing certain characteristic spatial frequencies present in the input pattern and suppressing others.

3) Discrete Fourier analysis of 2-deoxyglucose autoradiographic data shows spatial frequencies other than those predicted by known anatomical structure [23]. Appropriate choice of synaptic weights in the model can cause it to produce steady states with comparable spatial frequencies.

4) Simulation studies indicate that the correspondence between the steady states of the nonlinear model and of the linearized approximation to it is good, at least in the spatial-frequency domain, for certain nontrivial ranges of input.

I. THE GENERAL MODEL

In this paper we consider the following neural network model and some special cases of it: Units 1, 2, 3, …, N represent single neurons or aggregates of neurons behaving as processing units. For each unit j, x_j(t) represents the average membrane potential in unit j in a small time interval ending at time t. The quantities x_j(t) change with time according to the differential equations

    μ ẋ_j(t) = −x_j(t) + p_j + Σ_{k=1}^{N} W_jk f_k(x_k(t))    (j = 1, 2, …, N).    (1.1)

Here f_k(x_k) is a function which converts average membrane potential into average firing rate for unit k; different units may have different conversion functions. W_jk is a constant representing the synaptic strength from unit k to unit j; multiplying by W_jk converts a firing rate in unit k to the resulting rate of increase or decrease in average membrane potential in unit j. Positive values of W_jk reflect excitatory synapses (or, more generally, aggregates k whose net average effect on aggregate j is excitatory), and negative W_jk correspond to inhibition. W_jk need not equal W_kj. p_j represents input to unit j from outside the network, in the form of a rate of increase in average membrane potential. μ is a time constant governing the rate of change. The p_j are assumed to vary slowly with time as compared to the evolution of the potentials x_j, so that we may consider them constant in system (1.1). In one possible realization of this general model, the units are viewed as aggregates of neurons, say the 350 μm cortical columns widely thought to act as units of cortical information processing (see Szentagothai [21]). In such a realization an appropriate choice of f_k(x_k) is a "smooth sigmoid," and it is no serious restriction to assume it is nondecreasing with a bounded derivative. In another possible realization, units represent single neurons; it may be desirable in such a case to introduce hard thresholds by using the "high-gain limit" of the functions f_k. See [1], [2], [5], [10]-[16], [22], [25] for examples and discussions concerning various kinds of sigmoids in this and related models. In an intermediate case the units represent "minicolumns"; this produces a model appropriate to represent a small cylindrical patch of cortical tissue, say 400 μm in diameter. The use of smooth sigmoids


KELLY: STABILITY IN NEURAL NETWORKS

for the f_k is again appropriate in this case. In Section III-C below we investigate synaptic-weight patterns for a minicolumn realization suggested by comparison with autoradiographic labeling studies in such a patch of tissue. Because the quantities p_j are constant over time, the results we derive can be viewed as applicable to the short-term processing (encoding) of input stimuli by primary sensory cortical regions, or to the processing of repeated input stimuli, such as that encountered in radiographic labeling experiments. In vector-matrix notation the system (1.1) reads

    μ ẋ = −x + p + WF(x),    (1.2)

where x = x(t) = (x_1(t), …, x_N(t))^T, p = (p_1, …, p_N)^T, and F(x) = (f_1(x_1), …, f_N(x_N))^T.
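As a concrete illustration, system (1.2) can be integrated by forward Euler steps. The sketch below uses invented weights and inputs, and tanh as a stand-in for the smooth sigmoids f_k; none of these specific values come from the paper. The weight matrix is scaled small enough that the map p + WF(x) is a contraction (the situation studied in Section IV), so two different initial states are driven to the same steady state.

```python
import numpy as np

# A minimal numerical sketch of system (1.2), mu * dx/dt = -x + p + W F(x).
# Size, weights, input, and the tanh sigmoid are invented for illustration.
rng = np.random.default_rng(0)
N = 10
mu = 1.0
W = rng.standard_normal((N, N))
W *= 0.5 / np.linalg.norm(W, 2)      # scale so the operator norm |W| = 0.5 < 1
p = rng.standard_normal(N)

def F(x):
    return np.tanh(x)                # smooth, nondecreasing, derivative <= 1

def simulate(x0, dt=0.01, steps=20000):
    x = x0.copy()
    for _ in range(steps):
        x = x + (dt / mu) * (-x + p + W @ F(x))   # forward Euler step
    return x

# Different initial states reach the same steady state.
xa = simulate(np.zeros(N))
xb = simulate(rng.standard_normal(N))
print(np.max(np.abs(xa - xb)))       # effectively zero
```

The limiting state also satisfies the fixed-point equation x = p + WF(x), which is the characterization of equilibria used in Section IV.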

When we need suffixes in vector names we will use superscripts, reserving subscripts for indicating the components of a vector. Thus, for example, we will refer to a collection of activity vectors x as x^1, x^2, etc., and the kth component of x^j will be denoted x^j_k.

II. THE LINEAR CASE

Here we consider the special case of the model described above in which f_j(x_j) = x_j for each unit j, so that (1.2) becomes

    μ ẋ = −x + p + Wx.    (2.1)

This model is no less general than the one in which f_j(x_j) = a_j + b_j x_j, because the constants a_j and b_j can be absorbed into the components of p and W, respectively. Despite the simplicity and the theoretical tractability of this version, and despite the general agreement that nonlinearities are essential features of realistic brain models, the linear case can be used as an illuminating first-approximation model of the processing of input by primary sensory cortex [24]. It follows from the elementary theory of linear systems that if no eigenvalue of W equals 1, then for initial conditions x(0) = x^0, the solution of (2.1) is

    x(t) = e^{(W−I)t/μ} x^0 + (W − I)^{−1} e^{(W−I)t/μ} p + (I − W)^{−1} p.    (2.2)

In this case the system has a unique equilibrium, given by

    X̄ = (I − W)^{−1} p.    (2.3)

If in addition the real parts of all the eigenvalues of W are less than 1, then this equilibrium is the limiting state of the system for any initial state; that is, the equilibrium (2.3) is globally asymptotically stable. The linear system (2.1) in the stable case can be viewed as follows: given a matrix W of synaptic weights whose eigenvalues all have real parts less than 1, the system acts as a linear function (2.3) which maps any input vector p in R^N to a steady state X̄ in R^N that is independent of the initial state x^0. In the next section we will consider cases in which the system can be regarded as a bandpass filter. In Section IV below we will show that a certain subclass of nonlinear models (1.1) also act approximately as filters in a similar fashion. In case W has the additional property of being diagonable, that is, if W has N linearly independent eigenvectors x^1, x^2, …, x^N, the linear system can be viewed as enhancing certain components of the input vector and suppressing others. For it is then possible to write the input vector p as a linear combination of the eigenvectors, say

    p = Σ_{j=1}^{N} π_j x^j;    (2.4)

and if λ_1, λ_2, …, λ_N are the eigenvalues associated with x^1, x^2, …, x^N, then the equilibrium can be computed as

    X̄ = Σ_{j=1}^{N} (π_j / (1 − λ_j)) x^j.    (2.5)
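The agreement between the direct equilibrium (2.3) and the eigen-expansion (2.4)-(2.5) is easy to check numerically. The matrix and input below are invented for the sketch; any diagonable W whose eigenvalues have real parts less than 1 would do.

```python
import numpy as np

# Sketch of the linear case with invented numbers: the equilibrium (2.3),
# X = (I - W)^{-1} p, recomputed through the eigen-decomposition of W.
rng = np.random.default_rng(1)
N = 6
W = rng.standard_normal((N, N))
W *= 0.5 / np.linalg.norm(W, 2)            # eigenvalues safely inside Re < 1
p = rng.standard_normal(N)

X = np.linalg.solve(np.eye(N) - W, p)      # equilibrium (2.3)

lam, V = np.linalg.eig(W)                  # columns of V are eigenvectors x^j
pi = np.linalg.solve(V, p.astype(complex)) # coefficients pi_j of (2.4)
X_modes = (V @ (pi / (1.0 - lam))).real    # equilibrium via (2.5)

print(np.allclose(X, X_modes))             # True: the two computations agree
```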

In the next section we examine a class of models included in (2.1) in which the eigenvectors x^j are spatial sinusoids and the computation of the π_j and λ_j amounts to discrete Fourier analysis.

III. SPATIALLY-HOMOGENEOUS LINEAR SYSTEMS

In this section we further specialize the linear system to the spatially homogeneous case; that is, we require that the synaptic strength between two units depend only on their relative positions. This assumption is appropriate in modeling the encoding of input stimuli by a network with a uniform synaptic-weight pattern. More importantly, subsystems of real biological networks may approach spatial homogeneity without attaining it, in which case approximations like those in Section IV-C provide that this section's results still apply. We consider first a one-dimensional network; this model exhibits the essential features of the two-dimensional network and is notationally simpler. Moreover, even the one-dimensional model can be used (as in Subsection C below) to mimic the spectral behavior of real networks.

A. One Dimension

Here we view the units of the linear model (2.1) as arranged in a one-dimensional array, with equal spacing between adjacent units. We then impose the additional assumption of spatial homogeneity: that W_jk depends only on the relative positions of the units j and k and not on their values. It is convenient for this purpose to number the units 0, 1, …, N − 1 instead of 1, 2, …, N, and to consider that unit N − 1 is adjacent to unit 0. This is conceptually the simplest way of taking care of "end effects"; the artificiality of having the units at either end adjacent to each other is preferable to the alternatives: considering infinitely many units, clamping the activity levels of the end units, or adding additional units with activities tapering to zero. If the number N of units is large, any of these ways of dealing with end effects diminishes negligibly the usefulness of the model in reflecting real processes, especially if the input patterns p have little activity near the ends of the array. It is worthwhile to note that spatial homogeneity is not the same as symmetry, nor as isotropy. Symmetry requires that W_jk = W_kj for all j and k, while isotropy requires that W_jk = W_jl for any units k and l at the same distance from j, regardless of their direction. We will assume symmetry in addition to spatial homogeneity in some of what follows; we will not have occasion to consider isotropy. With the two end units adjacent, we are essentially viewing the units as though they were arranged on a circle. Spatial homogeneity then amounts to the assumption that there are constants w_0, w_1, …, w_{N−1} such that W_jk = w_{k−j}, where k − j is to be interpreted as the number of counterclockwise steps from unit j to unit k. (Symmetry would require that w_{k−j} = w_{j−k}.) The matrix W in the spatially-homogeneous case is a circulant, a special kind of Toeplitz matrix: it is of the form

    W = [ w_0      w_1      w_2    …  w_{N−1}
          w_{N−1}  w_0      w_1    …  w_{N−2}
          …
          w_1      w_2      w_3    …  w_0     ].

Proposition 1: In this case, for each j = 0, 1, 2, …, N − 1, the vector x^j whose kth entry is

    x^j_k = e^{2πijk/N}    (k = 0, 1, 2, …, N − 1)    (3.1)

is an eigenvector of W, and

    λ_j = Σ_{k=0}^{N−1} w_k e^{2πijk/N}    (3.2)

is the eigenvalue associated with x^j.

Proof: The kth entry of Wx^j is

    Σ_{l=0}^{N−1} w_{l−k} e^{2πijl/N} = e^{2πijk/N} Σ_{m=0}^{N−1} w_m e^{2πijm/N} = λ_j e^{2πijk/N},

the kth entry of λ_j x^j. □

Proposition 2: The coefficients π_j in the expansion (2.4) are given by

    π_j = (1/N) Σ_{k=0}^{N−1} p_k e^{−2πijk/N}    (j = 0, 1, 2, …, N − 1).    (3.3)

Proof: The assertion is that (3.3) implies

    p_k = Σ_{j=0}^{N−1} π_j e^{+2πijk/N}    (k = 0, 1, 2, …, N − 1).    (3.4)

But (3.3) and (3.4) are equivalent assertions: the former is the definition of the discrete Fourier transform and the latter is the discrete Fourier inversion formula. (See [4].) □

We note that the eigenvectors x^j of W do not depend on the values of the w_k, but only on the fact that W is a circulant. They are always complex spatial sinusoids; the real part of x^j is

    (1, cos(2πj/N), cos(2 · 2πj/N), …, cos((N − 1) · 2πj/N))^T.

This sinusoid has wave number j; it completes j cycles as we move from unit 0 to unit N − 1 and thus has a frequency of j/N cycles per unit and a period of N/j units. Equation (3.2) shows that the λ_j's constitute the inverse discrete Fourier transform of the w_j's. They are not real in general (although they are real if W is additionally assumed to be symmetric). From the above discussion we see that given any set of weights w_0, w_1, …, w_{N−1} one can check stability, and then find the equilibrium potential levels X̄_0, X̄_1, …, X̄_{N−1} for any set of input activity levels p_0, p_1, …, p_{N−1} by the following sequence of steps:
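Both propositions, and the three-step prescription that follows in the text, can be checked numerically. In the sketch below the symmetric weight pattern is invented; with the sign conventions above, the eigenvalues (3.2) are N times NumPy's inverse FFT of the w_k, the coefficients (3.3) are the forward FFT of p divided by N, and the steady state (3.5) agrees with directly solving (I − W)X̄ = p.

```python
import numpy as np

N = 81
k = np.arange(N)
# hypothetical symmetric weight pattern w_0, ..., w_{N-1} (not from the paper)
w = 0.15 * np.exp(-np.minimum(k, N - k) ** 2 / 8.0)
W = np.stack([np.roll(w, j) for j in range(N)])   # circulant: W[j, c] = w_{(c-j) mod N}

lam = N * np.fft.ifft(w)                  # eigenvalues (3.2)
for m in (0, 3, 10):
    x_m = np.exp(2j * np.pi * m * k / N)  # spatial sinusoid (3.1), wave number m
    assert np.allclose(W @ x_m, lam[m] * x_m)     # Proposition 1

assert np.max(lam.real) < 1               # step 1: Re(lambda_j) < 1, so stable

p = np.sin(2 * np.pi * 3 * k / N) + 0.3 * np.random.default_rng(2).standard_normal(N)
pi = np.fft.fft(p) / N                    # step 2: coefficients (3.3)
X = (N * np.fft.ifft(pi / (1 - lam))).real        # step 3: steady state (3.5)

print(np.allclose(X, np.linalg.solve(np.eye(N) - W, p)))   # True
```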


Fig. 1. Two spatially-homogeneous one-dimensional synaptic weight patterns [(a) and (c)] and their (squared) eigenvalues [(b) and (d)], as computed from (3.2), for a system of N = 81 units. All eigenvalues are real because the patterns are symmetric, and are less than 1 in absolute value. λ_3 is the largest of pattern 2's eigenvalues; steady states will show enhanced spatial frequencies with periods of around 81/3 units. Those from pattern 3 will show enhanced frequencies with periods of around 81/10 units.

1) Compute λ_0, λ_1, …, λ_{N−1} using (3.2). The system is globally stable if and only if Re(λ_j) < 1 for all j.

2) Compute π_0, π_1, …, π_{N−1} using the discrete Fourier transform (3.3).

3) Compute X̄_0, X̄_1, …, X̄_{N−1} using

    X̄_k = Σ_{j=0}^{N−1} (π_j / (1 − λ_j)) e^{+2πijk/N}    (k = 0, 1, 2, …, N − 1).    (3.5)

The above discussion, involving the two propositions of this section along with (2.4) and (2.5) in the previous section, amounts to implementing the observation that in the stable, spatially-homogeneous linear case the model (insofar as we consider only the steady state X̄ as a function of the input p) acts by selectively enhancing certain spatial frequencies (those with λ_j near 1) and suppressing others. Figs. 1 and 2 show examples of such systems; the enhancement of spatial frequencies is especially apparent when the input p contains random noise.

B. Two Dimensions

Except for notational inconvenience the two-dimensional spatially-homogeneous linear model is not different in principle from its one-dimensional counterpart: the eigenvectors ("eigenpatterns") of W correspond to two-dimensional spatial sinusoids; the eigenvalues are the two-dimensional discrete inverse Fourier transform of the pattern of synaptic weights [in analogy with (3.2)]; an input pattern p can be expressed as a linear combination of the

Fig. 3. Choice of synaptic weights in the one-dimensional linear version of the model to show correspondence with observed frequencies in autoradiographic data. (a) Average of periodograms of eight serial sections taken from within a single "patch" (area of high labeling intensity roughly 400 μm in medio-lateral extent) in the forelimb area of cat S-I. (Data courtesy of Dr. B. L. Whitsel.) (b) Synaptic weight pattern for a symmetric, spatially-homogeneous one-dimensional network model whose response to white-noise input shows the observed frequency structure at periods corresponding to multiples of the width of a minicolumn.

Fig. 2. An input pattern (a) for a one-dimensional network of N = 81 units, consisting of a smooth curve with white noise added, and the steady states [(b) and (c)] produced from it by the two networks depicted in Fig. 1. A change in the spread of excitatory and inhibitory connections produces a marked change in the network's encoding of an input stimulus; pattern 2, with its wider spread of connections, enhances lower spatial frequencies than does pattern 3.

eigenvectors by the two-dimensional discrete Fourier transform [in analogy with (3.3)]; and the equilibrium pattern can be computed by dividing the coefficients in this expansion by the numbers 1 − λ and applying the inverse Fourier transform. The units are now arranged on a two-dimensional grid, and we give them double labels to indicate their positions; thus unit (j, k) is in the jth row and the kth column of the grid, for j = 0, 1, 2, …, N − 1 and k = 0, 1, 2, …, M − 1. For convenience in imposing spatial homogeneity we suppose that for each j, the unit (j, M − 1) at the end of row j is adjacent to (j, 0), and for each k, unit (N − 1, k) at the bottom of column k is adjacent to (0, k). This amounts to arranging the units on a torus. The synaptic weights in the general, inhomogeneous case are now quantities with four indices: W_{j,k,l,m} represents the synaptic strength from unit (l, m) to unit (j, k). But the assumption of homogeneity is that this number depends only on the numbers l − j and m − k, which again are to be interpreted modulo N or M as being numbers of


steps from j to l or from k to m. That is, we assume that there are numbers w_{rs} (r = 0, 1, 2, …, N − 1; s = 0, 1, 2, …, M − 1) such that

    W_{j,k,l,m} = w_{l−j, m−k}.    (3.6)

The vectors x(t) of potential levels and p of inputs are now vectors in R^{MN} whose entries correspond to units on a grid, and it is convenient to represent them as two-dimensional arrays or patterns x(t) = [x_{jk}(t)] and p = [p_{jk}], with j = 0, 1, …, N − 1 and k = 0, 1, …, M − 1. The matrix W is now an MN × MN matrix, whose eigenvectors are vectors of length MN which, like x(t) and p, can best be viewed as two-dimensional patterns (the "characteristic patterns" or "characteristic states" of the system). It can be shown, in analogy with Proposition 1 above, that for each pair (j, k) the array x^{jk} whose (r, s) entry is

    x^{jk}_{rs} = e^{+2πi(jr/N + ks/M)}    (3.7)

is an eigenvector, and the associated eigenvalue is

    λ_{jk} = Σ_{r=0}^{N−1} Σ_{s=0}^{M−1} w_{rs} e^{2πi(jr/N + ks/M)}.    (3.8)

Moreover, to represent p as a linear combination of the x^{jk} with coefficients π_{jk}, that is, to find π_{jk} such that

    p = Σ_{j=0}^{N−1} Σ_{k=0}^{M−1} π_{jk} x^{jk},    (3.9)

one uses

    π_{jk} = (1/MN) Σ_{r=0}^{N−1} Σ_{s=0}^{M−1} p_{rs} e^{−2πi(jr/N + ks/M)}.    (3.10)

Equation (3.10) is the definition of the discrete two-dimensional Fourier transform, and (3.9) is two-dimensional discrete Fourier inversion. Thus the steady state X̄ can be computed from the weights w_{rs} and the input pattern values p_{rs} using a prescription like that given at the end of subsection A above, but with (3.2) and (3.3) replaced by (3.8) and (3.10), respectively, and with the (j, k) entry of the steady-state pattern X̄ computed as

    X̄_{jk} = Σ_{r=0}^{N−1} Σ_{s=0}^{M−1} (π_{rs} / (1 − λ_{rs})) e^{2πi(rj/N + sk/M)}.    (3.11)

C. Estimation of Model Parameters that Reproduce Observed Spatial Frequencies

Recent work [23] on the analysis of autoradiographic labeling data points to the existence, at various resolutions in such data, of certain spatial frequencies that are not predicted by anatomical considerations alone. These findings suggest features of network behavior such as the existence of excitatory-inhibitory processes like those of the models considered here. The results obtained above in this section make it easy to construct stable encoding models whose steady-state responses will show enhancement of desired frequencies. Here we present an example of how such models can be constructed; the process essentially inverts the prescriptions given in the previous two sections, and is similar to deconvolution in signal analysis.

Fig. 3(a) shows the spatial frequencies present in one direction in a narrow strip within the upper layers of a single "patch" in the autoradiographic (2-deoxyglucose) labeling pattern evoked by repeated electrocutaneous stimulation to the forelimb in cat. These data were made available to the author by Dr. B. L. Whitsel; see [23]. Each bar in this average periodogram represents the relative power in a frequency range of width 2.5 cycles per millimeter. Thus, for example, the fourteenth bar shows the power at frequencies between 33.75 and 36.25 cycles per millimeter, that is, at periods between 27.59 and 29.63 μm.

The relatively high power at frequencies corresponding to the twelfth through the seventeenth periodogram ordinates, in both metabolic and anatomic (Nissl) labeling, has been interpreted [17], [23] as showing the presence of minicolumns within the patch; likewise the power in the fourth through the ninth ordinates corresponds to periods of two minicolumns' extent, and has been interpreted as consistent with the operation of an opponent process involving lateral inhibition.

The narrow strip analyzed comprised approximately 15 minicolumns. It is therefore of interest to know what symmetric synaptic-weight pattern {w_s} in a spatially-homogeneous fifteen-unit model would produce a steady-state frequency structure of the kind indicated by the first few observed periodogram ordinates. To this end we assume that the input to the model has a flat periodogram with constant power c at all frequencies (this is consistent with the observed pattern in the input layer), and that the steady state has a symmetric periodogram |a_j|² (j = 0, 1, …, 14) such that |a_1|², |a_2|², …, |a_7|² have the same relative heights as the observed periodogram ordinates in Fig. 3(a). It follows from the results of Section III-A above that

    |a_j|² = c / (1 − λ_j)²,

where the λ_j are given by (3.2). Because of the symmetry, all the numbers in the above equation are real, and we can estimate the λ_j (j = 1, 2, …, 7) by

    λ̂_j = 1 − √c / |a_j|.
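The estimation step can be sketched numerically. The periodogram ordinates below are invented stand-ins for the observed values of Fig. 3(a), which are not tabulated in the text; assuming the steady-state power at wave number j is c/(1 − λ_j)², as discussed above, the code estimates the λ_j, extends them by symmetry, recovers weights by the deconvolution relation (3.12), and checks that the implied powers reproduce the target ordinates.

```python
import numpy as np

N = 15
P = np.array([3.0, 6.0, 4.5, 2.0, 1.5, 1.2, 0.9])  # invented powers |a_j|^2, j = 1..7
c = P.mean()                     # flat input power, per the choice made in the text

lam = np.zeros(N)
lam[1:8] = 1.0 - np.sqrt(c / P)  # estimate lambda_j = 1 - sqrt(c)/|a_j|
lam[8:] = lam[1:8][::-1]         # symmetry: lambda_{N-j} = lambda_j; lambda_0 unknown

# deconvolution (3.12): the weights are the DFT of the eigenvalues, known only
# up to the additive constant lambda_0 / N
w_hat = np.fft.fft(lam).real / N

# check: the weights reproduce the eigenvalues, hence the target power shape
lam_back = (N * np.fft.ifft(w_hat)).real
print(np.allclose(lam_back, lam), np.allclose(c / (1.0 - lam[1:8]) ** 2, P))
```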


We choose c equal to the average power per ordinate in the first seven ordinates of the observed periodogram. Finally, we use the inverse of (3.2) (deconvolution), namely

rium X, and that for any x o E R N , the sequence { x n } defined by x n f ' = G ( x " ) ( n = 0 , l , 2, * * . ) converges to X at an exponential rate:

( x nto estimate the synaptic weight pattern. Here N is 15 and A,, * , XI4 are taken equal to A,, * * . , X I respectively. It should be noted that we have no estimate of Xo from the data, and consequently that the j = 0 term in (3.12) is missing; however, this term is h o / N for all s, and so we still know the synaptic weights up to an additive constant. These weights are shown in Fig. 3 (b). They suggest that the low frequencies (with periods greater than the width of two minicolumns) present in the observed data are consistent with a homogeneous linear network model in which a given unit (minicolumn) has strong self-excitation, relatively strong inhibition of units at distance 4, and weak interactions as shown with nearer or more distant units. IV. NONLINEAR CONTRACTIVE SYSTEMS In the previous section we showed that in the spatially homogeneous linear case the global asymptotic stability of the model's encoding of the input stimuli could be ensured via the discrete Fourier transform of the model's synaptic weight pattern. Here we consider the general case, in which the model is neither spatially homogeneous nor linear, and we show a wide class of models (contractive models) in which such stable encoding nevertheless obtains. In addition, we investigate the relations between the equilibrium of the general system and that of a corresponding linear system; these can be of use in predicting the appearance of spatial frequencies in the output of nonlinear systems. A. Unique Stable Equilibria in Contractive Networks An equilibrium of the system (1.2) is a fixed point of the map G :R N R N defined by -+

G(x) = p + WF(x).   (4.1)

Because contractions, which are maps G for which there is a real constant α < 1 such that

|G(x¹) − G(x²)| ≤ α|x¹ − x²| for all x¹ and x² ∈ R^N   (4.2)

(where |·| denotes the usual Euclidean (L²) norm), have unique fixed points, it is natural to consider systems satisfying this assumption. We call the system (1.2) contractive if (4.2) holds for the map G defined by (4.1). In this subsection we note that the Contraction Mapping Theorem and Liapounov's Theorem guarantee that a contractive network has a unique globally asymptotically stable equilibrium which can be approximated closely by iteration. To be precise, it follows directly from the Contraction Mapping Theorem (see, for example, [3], p. 161) that if (1.2) is a contractive system, then it has a unique equilibrium x̄, and the iterates of G satisfy |Gⁿ(x⁰) − x̄| ≤ αⁿ|x⁰ − x̄| for every starting point x⁰. That is, the unique equilibrium of a contractive system can be approximated exponentially by iterating G from any starting value whatever. We shall see presently that this equilibrium must have all of R^N as its domain of attraction, so that the neural network will converge to it regardless of the initial state.

It is of interest to note the connection between the Euler approximations to the solution of (1.2) from a given initial state x⁰ and the iteration of the map G starting from x⁰. It is not difficult to check that if G(x) = p + WF(x) is a contraction, then for any step size Δt ≤ μ and any starting value x⁰, the Euler approximation to the system (1.2) converges as t → ∞ to its unique equilibrium, which is the fixed point of G. If the step size Δt equals μ, the steps of the Euler approximation are exactly the iterates of G. Curiously, in the general (noncontractive) case it does not follow directly from the convergence to x̄ of all Euler approximations with sufficiently small step size from arbitrary initial points that the network itself converges to x̄. However, in the contractive case an application of Liapounov's direct method (see [12], p. 291) provides this convergence, as we shall now show. If we write h for x − x̄, then (1.2) becomes

ḣ = (1/μ)(−h + C(h))   (4.3)

where

C(h) = p − x̄ + WF(x̄ + h)   (4.4)

is a contraction. It is easy to check that V(h) = |h|² = h·h is a global Liapounov function for the system (4.3) on R^N; that is, V is a continuous real-valued function with V(0) = V̇(0) = 0, and V(h) > 0 and V̇(h) < 0 for all nonzero h. Here V̇ denotes (d/dt)V(h(t)). The Liapounov Theorem then provides that h(t) → 0 for any h(0); that is, x(t) → x̄ for any x(0), as t → ∞.
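The guaranteed convergence of the iterates of G, independent of the starting state, can be illustrated with a small numerical sketch. The weight matrix, input vector, and tanh sigmoid below are assumptions chosen only so that the contraction condition holds; they are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10

# Assumed ingredients: a random weight matrix rescaled so that
# beta * |W| = 0.9 < 1 (beta = 1 bounds the tanh derivative), a
# constant input p, and F = tanh applied componentwise.
W = rng.normal(size=(N, N))
W *= 0.9 / np.linalg.norm(W, 2)   # spectral norm is now 0.9
p = rng.normal(size=N)
G = lambda x: p + W @ np.tanh(x)  # G(x) = p + W F(x)

# Iterating G from two unrelated initial states converges to the
# same fixed point, as the Contraction Mapping Theorem guarantees.
x1, x2 = rng.normal(size=N), 100 * rng.normal(size=N)
for _ in range(400):
    x1, x2 = G(x1), G(x2)
print(np.max(np.abs(x1 - x2)))    # essentially zero: a unique equilibrium
```

The distance between the two trajectories shrinks by at least the factor α = 0.9 per iterate, which is the exponential approximation described above.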

B. Sufficient Conditions for a System to be Contractive

The norm of a matrix W is the nonnegative real number |W| defined by

|W| = max_{x ≠ 0} |Wx| / |x|;

thus |Wx| ≤ |W| |x| for all x ∈ R^N, with equality for at least one nonzero x. If W is a symmetric matrix, then |W| = max { |λ| : λ is an eigenvalue of W }, but in the absence of symmetry there is no simple general relationship between the norm and the eigenvalues of a matrix.

Now suppose the derivatives of the functions f_j(x) are all bounded above by some number β: that is, |f_j′(x)| < β for all x ∈ R and j = 1, 2, …, N. It then follows immediately from the Mean Value Theorem that for any vectors x¹ and x² in R^N,

|F(x¹) − F(x²)| ≤ β|x¹ − x²|.

Therefore,

|G(x¹) − G(x²)| ≤ |W| |F(x¹) − F(x²)| ≤ β|W| |x¹ − x²|.

Consequently:

If β|W| < 1, where β is a bound on the |f_j′(x)|, then G(x) = p + WF(x) is a contraction, and therefore (1.2) has a unique equilibrium, which is globally asymptotically stable and can be approximated exponentially by iterating G from any starting value.

The simplicity of the condition β|W| < 1 makes it an easy matter to check that a nonlinear network of the form (1.2) is contractive and hence globally asymptotically stable, and shows that the class of such networks allows considerable freedom in the choice of synaptic weight patterns and the sigmoids f_j.
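The remark that the norm and the eigenvalues can disagree for non-symmetric W matters in practice: the condition β|W| < 1 must be checked with the norm (the largest singular value), not with the spectral radius. A small illustration with made-up matrices, not drawn from the paper:

```python
import numpy as np

# For a symmetric matrix the norm equals the largest |eigenvalue| ...
S = np.array([[0.5, 0.2],
              [0.2, 0.5]])                 # eigenvalues 0.7 and 0.3
norm_S = np.linalg.norm(S, 2)              # spectral norm = 0.7
assert np.isclose(norm_S, max(abs(np.linalg.eigvals(S))))

# ... but for a non-symmetric matrix the norm can be far larger than
# any eigenvalue, so the eigenvalues alone say nothing about |W|.
A = np.array([[0.0, 10.0],
              [0.0,  0.0]])                # nilpotent: both eigenvalues 0
norm_A = np.linalg.norm(A, 2)              # 10.0
rho_A = max(abs(np.linalg.eigvals(A)))     # 0.0
print(norm_A, rho_A)
```

Here β·ρ(A) < 1 would hold trivially, yet β|A| = 10β fails the contraction test for any reasonable β: the norm, not the spectral radius, is the relevant quantity.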

C. Estimating the Equilibrium of a Nonlinear System

Finally, we consider the possibility of using the equilibrium x̄ of the linear system (2.1) as an estimate of the equilibrium ȳ of the nonlinear system. The purpose is not the approximate computation of the equilibrium of a nonlinear contractive system; this, after all, can be done efficiently by iterating G, because of the exponential convergence guaranteed by the Contraction Mapping Theorem. Rather it is of interest to know when the equilibrium of the nonlinear system (1.2) is well approximated by that of the linear system (2.1), so that it can be expected to exhibit similar characteristics, such as the spatial frequencies discussed in Section III above. In particular it would be of interest to know conditions, and ranges of input p, under which the spectral properties of the encoding p ↦ ȳ are predictable from the eigenvalues of W and relatively unaffected by the nonlinearity introduced by the sigmoids f_j. Indeed, as we shall see, it is unreasonable to expect that x̄ and ȳ will be near each other, in the sense that |x̄ − ȳ| is small, unless the linear region of the functions f_j(x) is so large as to include both equilibria. However, simulations show that in some ranges of parameter settings, x̄ and ȳ can have very similar periodograms (that are different from the periodogram of the input p) even though they are not near each other as vectors.

Accordingly, let x̄ and ȳ be the unique equilibria of the systems (2.1) and (1.2), respectively; that is,

x̄ = p + Wx̄  and  ȳ = p + WF(ȳ).

Spatial-Domain Approximation: If we are interested in bounding |x̄ − ȳ|, we can iterate G(y) = p + WF(y) starting from y⁰ = x̄; it follows from the Contraction Mapping Theorem that

|ȳ − x̄| ≤ |yⁿ − x̄| + ((β|W|)ⁿ / (1 − β|W|)) |y¹ − y⁰|.   (4.5)

So we can bound |ȳ − x̄| by making a small number of iterations of G starting from x̄ and applying (4.5). For n = 1, for example, (4.5) simplifies to

|ȳ − x̄| ≤ |W(F(x̄) − x̄)| / (1 − β|W|).   (4.6)

This bound can be simplified further by making some regularity assumptions on the form of the sigmoids f_j(x). In particular, if

f_j(0) = 0,  f_j′(0) = 1,  and  |f_j″(x)| ≤ γ   (4.7)

for all j and x, where γ is some positive constant, then it is not difficult to check (by Taylor's theorem) that

|F(x̄) − x̄| ≤ (γ/2)(Σ_j x̄_j⁴)^{1/2}

and therefore that

|ȳ − x̄| ≤ (γ/2)|W|(Σ_j x̄_j⁴)^{1/2} / (1 − β|W|).   (4.8)
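The one-iteration bound can be evaluated directly in a numerical sketch. The weights, input, and tanh sigmoid here are assumptions chosen only so the system is contractive; the bound used is the standard a posteriori contraction estimate |ȳ − x̄| ≤ |G(x̄) − x̄|/(1 − α) with α = β|W|:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10

# Assumed setup: tanh sigmoids (beta = 1) and a weight matrix scaled
# so that alpha = beta * |W| = 0.5 < 1.
W = rng.normal(size=(N, N))
W *= 0.5 / np.linalg.norm(W, 2)
alpha = np.linalg.norm(W, 2)
p = 0.3 * rng.normal(size=N)
G = lambda y: p + W @ np.tanh(y)

# Equilibrium of the linear system x = p + Wx, used as the start y0.
x_bar = np.linalg.solve(np.eye(N) - W, p)

# One iteration of G from x_bar gives the a posteriori bound
# |y_bar - x_bar| <= |G(x_bar) - x_bar| / (1 - alpha).
bound = np.linalg.norm(G(x_bar) - x_bar) / (1 - alpha)

# Compare against y_bar computed by iterating G to convergence.
y_bar = x_bar.copy()
for _ in range(500):
    y_bar = G(y_bar)
err = np.linalg.norm(y_bar - x_bar)
print(err, "<=", bound)
```

With a small input the linear equilibrium sits in the near-linear region of tanh and the bound is tight; scaling p up pushes x̄ into the saturated region and the bound, while still valid, becomes loose, which is the limitation discussed in the text.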

Bounds like (4.6) and (4.8) reflect and quantify the simple observation that if the equilibrium of the linear system lies within a region in which F(x) is well approximated by x, then the equilibrium of the nonlinear system will be nearby. This observation is of limited use, however, as the assumption is simply that the system is essentially linear for the range of input it receives. The problem of obtaining effective spatial-domain linear approximations to ȳ in the case of a contractive nonlinear network remains open.

Frequency-Domain Approximation: Simulation studies have shown that over large ranges of weight settings and input values, the spectrum of ȳ resembles that of p much more closely than that of x̄, but that in some ranges (including but not limited to those in which x̄ is in the linear region of F), the periodogram of ȳ is more like that of x̄ than that of p. Fig. 4 shows the results of one such simulation. In this case the components of x̄ range well outside the linear region of F, and to a lesser extent so do those of ȳ, and their magnitudes as vectors in R^N (N = 40) are markedly different from each other and from that of p. Yet their similarity to each other and their difference from p are clearly seen in the frequency domain.

Other simulations reveal that for the same network, that is, the same choice of W and F, and for others, this is not always the case; there are ranges of input vectors p for which ȳ and x̄ are dissimilar in both the spatial and frequency domains. Typically in such simulations the spectrum of ȳ resembles that of p more closely than that of x̄. The questions of identifying the ranges of input over which x̄ and ȳ have similar power spectra, and of the effects of contractiveness on this behavior, await further research.

Fig. 4. (a), (b) A spatially homogeneous one-dimensional synaptic weight pattern for a network of 40 units, and the eigenvalues of the pattern. (c), (d) Input pattern p consisting of sinusoidal components plus white noise, and the periodogram of p, showing power at spatial frequencies having periods around 10, 4, and 2 units. (e)-(h) Steady-state outputs x̄ and ȳ of the linear and nonlinear networks μẋ = −x + p + Wx and μẏ = −y + p + WF(y), and their periodograms. Here the sigmoids f_j used for F are all the same: f_j(x_j) = c(1/2 + (1/π) arctan(a x_j)), with c and a chosen so that f′(0) = 1 and z_j = f(x_j) saturates at z_j = 0 and 3. This sigmoid is approximately linear in the range −1 ≤ x_j ≤ 1. The linear and nonlinear outputs x̄ and ȳ are markedly different in magnitude from each other and from the input p; yet their power spectra are closer to each other's than to that of p.

V. DISCUSSION

In this paper we have noted that a certain class of neural network models, the contractive ones, are globally asymptotically stable, and thus act as stable encoders of input signals. We use the contraction mapping principle, as well as Liapounov functions, to show the stability. This paper thus parallels that of Cohen and Grossberg [7], which obtains a similar result for a different class of networks. In the Cohen-Grossberg class the synaptic weights are assumed to be negative and symmetric; in the contractive class these assumptions are unnecessary, but are replaced by bounds on the norm of the synaptic-weight matrix and on the derivatives of the sigmoid functions.

The particular form of the Liapounov function for networks like the Cohen-Grossberg class leads to the observation of Hopfield and Tank [14]-[16], [22] that neural nets can (by appropriate passage to the "high-gain" limit) be used to find local minima of the objective functions of various classical optimization problems. The Liapounov function applicable to contractive neural nets is not of this form, but instead is simply the squared Euclidean distance from the state of the system to the equilibrium state. Thus, it is conceivable that contractive nets could be regarded as optimizers for least-squares problems, given appropriate formulations, although accomplishing such formulation awaits further work.

In this paper we have also considered another assumption on synaptic weights, namely spatial homogeneity, which resembles, but in fact is neither stronger nor weaker than, the assumption of symmetry used by Cohen and Grossberg, Hopfield and Tank, and others. With spatial homogeneity the linear version of the model becomes especially tractable in the spatial-frequency domain.
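This tractability can be sketched numerically: for a spatially homogeneous (circulant) weight pattern, the eigenvalues of W are the DFT of the pattern, and the linear equilibrium x̄ = p + Wx̄ can be computed with FFTs alone. The weight pattern below is an assumed toy example, not the estimated cortical weights:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40

# Assumed homogeneous pattern: self-excitation plus inhibition of the
# units at distance 4 on either side (a toy version of Fig. 3(b)).
w = np.zeros(N)
w[0], w[4], w[-4] = 0.4, -0.2, -0.2

# Circulant W: unit k receives weight w[(k - l) % N] from unit l.
W = np.array([[w[(k - l) % N] for l in range(N)] for k in range(N)])

lam = np.fft.fft(w)                 # eigenvalues of W = DFT of the pattern
assert np.max(np.abs(lam)) < 1      # all |lambda_j| < 1: stable encoding

# Frequency-domain solve of (I - W) x = p versus a direct solve.
p = rng.normal(size=N)
x_fft = np.fft.ifft(np.fft.fft(p) / (1 - lam)).real
x_dir = np.linalg.solve(np.eye(N) - W, p)
print(np.max(np.abs(x_fft - x_dir)))
```

Because Wx is a circular convolution of w with x, the DFT diagonalizes the system: each Fourier coefficient of the equilibrium is the corresponding coefficient of p divided by (1 − λ_j), which is the filtering interpretation used in Section III.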
The well-known theory of digital filtering can be used to construct patterns of synaptic weights that are consistent with the spatial frequencies observed in autoradiographic-labeling studies. It remains an open question whether the spectral behavior of spatially homogeneous nonlinear models can be successfully analyzed or approximated if contractiveness is assumed. Simulations like the one reported in Section IV-C show promise of success for some ranges of inputs that are nontrivial (in the sense that input and output are not confined to the region in which the sigmoid is approximately linear).

ACKNOWLEDGMENT

The author is greatly indebted to Dr. B. Whitsel and his co-workers, especially Dr. O. Favorov and M. Tommerdahl, for generously providing data for analysis, and also for numerous illuminating and helpful conversations.

REFERENCES

[1] S. Amari, "Field theory of self-organizing neural nets," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 741-748, 1983.
[2] S. Amari and M. A. Arbib, Eds., Competition and Cooperation in Neural Nets, Lecture Notes in Biomathematics. Berlin: Springer, 1982.
[3] R. G. Bartle, The Elements of Real Analysis, 2nd ed. New York: Wiley, 1976.
[4] P. Bloomfield, Fourier Analysis of Time Series: An Introduction. New York: Wiley, 1976.
[5] G. A. Carpenter and S. Grossberg, "Neural dynamics of category learning and recognition: Attention, memory consolidation, and amnesia," in Brain Structure, Learning, and Memory, J. Davis, R. Newburgh, and E. Wegman, Eds. AAAS Symposium Series, 1986.
[6] P. S. Churchland and T. J. Sejnowski, "Perspectives on cognitive neuroscience," Science, vol. 242, pp. 741-745, Nov. 4, 1988.
[7] M. A. Cohen and S. Grossberg, "Absolute stability of global pattern formation and parallel memory storage by competitive neural networks," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 815-826, 1983.
[8] F. Crick, "The recent excitement about neural networks," Nature, vol. 337, pp. 129-132, Jan. 12, 1989.
[9] J. A. Feldman and D. H. Ballard, "Connectionist models and their properties," Cognitive Sci., vol. 6, pp. 205-254, 1982.
[10] J. L. Feldman and J. D. Cowan, "Large-scale activity in neural nets, I and II," Biol. Cybern., vol. 17, pp. 29-51, 1975.
[11] S. Grossberg, The Adaptive Brain, vols. I and II. Amsterdam: Elsevier/North-Holland, 1986.
[12] J. K. Hale, Ordinary Differential Equations. New York: Wiley, 1969.
[13] A. V. Holden, Models of the Stochastic Activity of Neurones, Lecture Notes in Biomathematics, vol. 12. Berlin: Springer, 1976.


[14] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," in Proc. Natl. Acad. Sci., 1982, pp. 2554-2558.
[15] J. J. Hopfield and D. W. Tank, "Neurons with graded response have collective computational properties like those of two-state neurons," in Proc. Natl. Acad. Sci., 1984, pp. 3088-3092.
[16] J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems," Biol. Cybern., vol. 52, pp. 141-152, 1985.
[17] S. L. Juliano, P. J. Hand, and B. L. Whitsel, "Patterns of increased metabolic activity in somatosensory cortex of monkeys, Macaca fascicularis, subjected to controlled cutaneous stimulation: A 2-deoxyglucose study," J. Neurophysiol., vol. 46, pp. 1260-1284, 1981.
[18] R. P. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag., Apr. 1987.
[19] T. J. Sejnowski and C. R. Rosenberg, "Learning and representation in connectionist models," in Perspectives in Memory Research, M. S. Gazzaniga, Ed. Cambridge, MA: M.I.T. Press, 1988.
[20] T. J. Sejnowski, C. Koch, and P. S. Churchland, "Computational neuroscience," Science, vol. 241, pp. 1299-1306, 1988.
[21] J. Szentagothai, "The neurone network of the cerebral cortex: A functional interpretation," Proc. Roy. Soc. Lond., vol. B201, pp. 219-249, 1978.
[22] D. W. Tank and J. J. Hopfield, "Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circ. Syst., vol. CAS-33, pp. 533-541, 1986.
[23] B. L. Whitsel, O. Favorov, M. Tommerdahl, M. Diamond, S. Juliano, and D. Kelly, "Dynamic processes govern the somatosensory cortical response to natural stimulation," in Organizational Principles of Sensory Processing: The Brain's View of the World, J. S. Lund, Ed. New York: Oxford Univ., 1987.
[24] B. L. Whitsel and D. G. Kelly, "Knowledge acquisition (learning) by the somatosensory cortex," in Brain Structure, Learning, and Memory, J. Davis, R. Newburgh, and E. Wegman, Eds. AAAS Neuroscience Symposium, 1986.
[25] H. R. Wilson and J. D. Cowan, "A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue," Kybernetik, vol. 13, pp. 55-80, 1973.

Douglas G. Kelly received the Ph.D. degree in mathematics from Indiana University, Bloomington. He has a joint appointment as Professor in the Departments of Mathematics and Statistics of the University of North Carolina, Chapel Hill. He has been there, except for leaves of absence, since 1968. Prior to that he spent a year at Jet Propulsion Laboratory, California Institute of Technology. He has worked for several years on mathematical modeling and statistical data analysis in connection with neurophysiological phenomena. His research interests also include probability and combinatorial mathematics, and in particular the applications of stochastic methods to problems in combinatorics and optimization.
