BioSystems, 27 (1992) 97-113


Elsevier Scientific Publishers Ireland Ltd.

Enzymes as molecular automata: a reflection on some numerical and philosophical aspects of the hypothesis

Pedro C. Marijuán (a) and John Westley (b)

(a) Dept. Genetica Molecular, Investigación y Desarrollo CSIC, 08034 Barcelona (Spain) and (b) Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637 (USA)

(Received December 2nd, 1991) (Revision received March 2nd, 1992)

Enzymes, by means of their properties of specific recognition and allosteric modulation, are able to integrate many separate processes into systemic units with coherent functions; in a sense, they have to be considered as the true organizers of the cytoplasmic processes. In this respect, the present article describes a simple model, based on binary variables and automata theory, which simulates the basic regulatory performance of the modulated enzyme. The model admits a variety of modifications and improvements; it also suggests some original lines of thought on which to reflect about the organization and collective phenomena of the networks of enzymes. In discussing the connection of this 'molecular automata' hypothesis with other areas of present-day theoretical biology, a fertile panorama of initiatives appears. A special partnership between Information Science (computation) and Biology is developing.

Keywords: Molecular automata; Enzyme simulation; Boolean networks.

1. Introduction

1.1 Introductory remarks

Exploration of 'system effects', those aspects of kinetic behavior that are dictated by the formal structure of a reaction system, is a time-honored scientific enterprise. A substantial body of biochemical and mathematical literature has been built up in pursuit of this purpose. Though the reflections have reached a high degree of mathematical sophistication, very often it is only the analytical side of the biochemical problem that receives attention; the possible systemic consequences for the organization of the cellular functions, and the relationship of the biochemical subjects with other speculative areas of theoretical biology, are less emphatically developed (crossing borders between disciplines always acts as a call to extreme prudence). To a very limited extent, the present article tries both to explore the simplest numerical consequences of what can be called the 'molecular automata hypothesis' and to speculate about the nature of the collective functions emerging from massive enzyme networks responding to such dynamics.

Starting with the consideration of a common building block, the steady-state kinetics of individual bisubstrate enzymes, the characteristic behaviors of different formal mechanisms can in this case be approached analytically (e.g., at the level of initial velocity patterns) and a variety of useful distinctions can be made. With more complex systems, however, or when the behavior to be observed varies non-linearly over the experimental time, numerical approaches to direct simulation of the behavior have been more practical. For instance, Franco and Canela (1984) have provided a program for microcomputers that proceeds by numerical integration to trace the detailed time courses of all concentrations in an unregulated one-substrate, one-intermediate enzyme mechanism. When the system becomes more complicated than this, however, the numerical integration procedure rapidly becomes more difficult, requiring either larger, faster computers or serious restrictions on the values of rate constants (e.g., in the form of rapid-equilibrium assumptions).

In this respect, the first part of the present article provides a simple illustration of the different approach to direct simulation of kinetic behavior in complex reaction systems that was recently recommended by Marijuán (1991): a stochastic approach utilizing 'molecular automata' to represent enzymes. The results of simulation by this approach have been tested against those generated by the numerical integration method (both employing the same microcomputer) for the simplest possible regulated enzymic reaction. At this level of the individual regulated enzyme, the automata-based model has shown promise for developing a simulation comparable to that obtained by the more familiar continuous approach. The way to relatively simple simulation of the behavior of metabolism-like networks of such elements therefore appears to be open.

More speculatively, the second part of the article deals with the theoretical implications that this hypothesis may have in relation to the role(s) that enzyme networks play in the organization of the cell. Among the variety of quantitative and qualitative factors that intervene in the dynamics of enzyme networks, the relationship with gene expression deserves special consideration. Recent efforts in very different areas close to theoretical biology (boolean functions, computational metabolism, artificial life, molecular computing) are opening a genuinely new dimension for this kind of study.

Correspondence to: P.C. Marijuán.

0303-2647/92/$05.00 © 1992 Elsevier Scientific Publishers Ireland Ltd. Printed and Published in Ireland
1.2 The continuous approach

It is widely appreciated in enzyme kinetics that every reaction system (formal mechanism) uniquely specifies a particular pattern of velocity behavior. This pattern is accessible through a set of differential equations, each of which specifies, in terms of rate constants and concentrations, the rates of change in the concentration of one of the chemical species in the system. Taken together with appropriate conservation equations, such a set of differential equations can be integrated to achieve a mathematical model that faithfully reproduces the pattern of behavior (variations in all concentrations over time) of the reaction system.

It has also been recognized that the reciprocal relationship is not true: an observed pattern of behavior does not uniquely specify a corresponding formal mechanism. Enzyme kineticists have long been aware of this, as questions of the uniqueness of formal mechanism assignment must arise (e.g., Volini and Westley, 1966; Westley, 1969). Rosen (1979, 1985a) has commented at some depth on this situation in the context of activation and inhibition: even though every formal mechanism, via its corresponding differential equations, implies one particular pattern of activation or inhibition, that pattern cannot be thought to imply uniquely that formal mechanism. Quite generally, observation of a particular pattern of kinetic behavior usually can be applied to eliminate some formal mechanisms from consideration, but there always remain alternative possibilities. These must be evaluated on some other basis (often, in enzymic catalysis, some direct chemical evidence such as the isolation of a kinetically competent covalent intermediate; or simply Occam's razor). This exercise is an instructive one, with the lesson (inter alia) that any behavior pattern, which is what is significant biologically, may always be produced by alternative means.

The problem when we approach the simulation through integration of a set of differential equations is, ultimately, its cumbersomeness.
As the system considered grows more complex, taking into account an extended reaction scheme in homogeneous solution (Kacser, 1983) and then going on to consider enzyme interactions with other enzymes as well as with membranes and the cytoskeleton (Srere, 1987; Welch, 1987; see also the debate in TIBS: Crabtree and Newsholme, 1987), large, fast, sophisticated computation facilities are required. In these circumstances, the simulation tends to lose its intuitive connection with the reaction system, and the purpose of the exercise (to gain increased insight into the causes of the behavior pattern) suffers.

1.3 Exploring an alternative possibility

When we undertook to apply the Franco and Canela (1984) numerical integration program to the simplest conceivable regulated enzyme mechanism, a one-substrate, one-product form with a simple activator (see below), we found that microcomputer limitations forced us to assign rapid equilibrium between the enzyme-substrate and enzyme-product complexes. With this condition fulfilled, we were able to achieve a satisfactory simulation that traced the time courses of the concentrations of substrate, product and the enzyme forms (but only the sum of the concentrations of the rapid-equilibrium complexes). Accepting this case as a kind of upper-limit situation for the continuous approach on simple microcomputers, we further undertook to determine whether a less constrained simulation of this same system might be carried out with quite a different approach, as suggested previously (Marijuán, 1991): that of enzymes as molecular automata.

The idea can also be found in previous literature. M. Sugita (1961, 1963) pioneered these views, together with R. Rosen (1967, 1979). After them, diverse authors have followed logical-circuit and electric-circuit criteria to study chemical reactions and enzyme networking (mainly Okamoto et al., 1980, 1983, 1989). In the field of molecular computing, M. Conrad (1972, 1985) has also discussed similar views. On the relationship between enzyme networks and parallel-processing neural networks, R. Bray (1990) must be cited.
However, consideration of a full set of reactions in comparison with the corresponding system of equations, and elucidation of the kinetic behavior of the system, is not always the main goal of these approaches. In this respect, though our application of the molecular automata hypothesis can be considered less sophisticated mathematically than most of the previous ones, its simplicity and directness may constitute a certain advantage. Pedagogically, it provides a visual picture that helps to capture very easily the individual and collective dynamics of enzymes; and the aspects of stochasticity that it contains may find application in some particular enzyme kinetics problems. Moreover, automata theory is a classical theoretical tool that was central to the development of computer science and that has also had relevant applications in machine theory; its elegance and simplicity in 'constructivist' applications are remarkable. It would not be surprising if some of its interpretations were also relevant for enzymology.

There have been many enzyme kinetics findings in the past decades (as Sols, 1981, comments: 'Allosteric regulation of enzyme activity is the central and more basic mechanism of physiological modulation of enzyme activity. Its discovery and conceptualization opened a third dimension in physiological enzymology, dramatically recognized by Monod with the confession: "j'ai decouvert le deuxieme secret de la vie!"'); but, very possibly, the theoretical elaboration of these findings has not yet reached complete maturity. As we will discuss in the last section of the article, a complex interaction between different disciplines somehow related to the informational view of the cell is advancing, and enzymes occupy an important place in it. In this context, the possibilities of the molecular automata approach and of classical automata theory to contribute to capturing the informational-mechanical capabilities of enzymes and proteins may deserve a second thought.
1.4 The model system considered

Just as an isomerase-catalyzed reaction with a single substrate and a single product is ordinarily considered to be the simplest realistic formal enzyme mechanism, so also might the form given in Fig. 1 be considered the simplest regulated isomerase mechanism. Inactive enzyme is converted to active enzyme by binding b reversibly. Active enzyme can form an enzyme-substrate complex with a or an enzyme-product complex with a*. These complexes can either interconvert or discharge a or a*. Net directionality around the cycle is determined by the relative concentrations of a and a* and the values of the rate constants in reactions 2-4. The velocity of net isomerization (a → a* or a* → a) is determined by these same values and also by the concentration of b and the values of the rate constants for the association and dissociation directions of reaction 1.

Fig. 1. Formal mechanism of an isomerase regulated by the activator b. a and a*, the substrate and product; (Ea) and (Ea*), the corresponding complexes with the activated enzyme.

The following set of differential equations constitutes a mathematical description of this system that can be used by a simple computer to display its behavior when a and b are maintained at steady state concentrations and a* is allowed to accumulate:

$$\frac{dE_{inact}}{dt} = k_{-1}E_{act} - k_{+1}\,b\,E_{inact}$$

$$\frac{dE_{act}}{dt} = (k_{-2} + k_{+4})\,\frac{(Ea) + (Ea^*)}{2} + k_{+1}\,b\,E_{inact} - (k_{+2}\,a + k_{-4}\,a^* + k_{-1})\,E_{act}$$

$$\frac{d[(Ea) + (Ea^*)]}{dt} = (k_{+2}\,a + k_{-4}\,a^*)\,E_{act} - (k_{-2} + k_{+4})\,\frac{(Ea) + (Ea^*)}{2}$$

$$\frac{da^*}{dt} = k_{+4}\,\frac{(Ea) + (Ea^*)}{2} - k_{-4}\,a^*\,E_{act}$$

The Franco and Canela program, running on an Apple IIe microcomputer, can produce a suitable simulation of the behavior of this model when the complexes (Ea) and (Ea*) are considered collectively, each at a concentration equal to half of the total (i.e., when these species are considered to be in rapid equilibrium undisturbed by the fluxes involving a and a*). Figure 2 displays such a simulation, which was used as a standard for the other simulation, based on automata.
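As an illustration of the continuous approach, the four rate equations of this section can be integrated with a few lines of code. The following is a minimal sketch, not the Franco and Canela program: it uses a classical fourth-order Runge-Kutta step, takes the rate constants and concentrations listed in section 2.3, and assumes that all enzyme starts in the inactive form. The function names, the step size h and the starting condition are assumptions introduced here.

```python
# Rate constants and concentrations as listed in section 2.3 of the text.
K = {"+1": 2.0, "-1": 4.0, "+2": 10.0, "-2": 10.0, "+4": 6.0, "-4": 6.0}
S = 0.1   # total enzyme concentration
A = 1.0   # substrate a, held constant
B = 2.0   # activator b, held constant


def derivatives(y, a=A, b=B, k=K):
    """Right-hand sides of the rate equations.

    y = (Einact, Eact, C, a*), with C = (Ea) + (Ea*); under the
    rapid-equilibrium assumption each complex is taken as C / 2.
    """
    e_inact, e_act, c, a_star = y
    half = c / 2.0
    d_inact = k["-1"] * e_act - k["+1"] * b * e_inact
    d_act = ((k["-2"] + k["+4"]) * half + k["+1"] * b * e_inact
             - (k["+2"] * a + k["-4"] * a_star + k["-1"]) * e_act)
    d_c = (k["+2"] * a + k["-4"] * a_star) * e_act - (k["-2"] + k["+4"]) * half
    d_a_star = k["+4"] * half - k["-4"] * a_star * e_act
    return (d_inact, d_act, d_c, d_a_star)


def rk4_step(y, h, b=B):
    """Advance the state y by one Runge-Kutta step of size h."""
    def shifted(base, slope, factor):
        return tuple(u + factor * s for u, s in zip(base, slope))

    k1 = derivatives(y, b=b)
    k2 = derivatives(shifted(y, k1, h / 2.0), b=b)
    k3 = derivatives(shifted(y, k2, h / 2.0), b=b)
    k4 = derivatives(shifted(y, k3, h), b=b)
    return tuple(u + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                 for u, p, q, r, s in zip(y, k1, k2, k3, k4))


def integrate(t_end, h=0.01, b=B):
    """Return (Einact, Eact, C, a*) at time t_end."""
    y = (S, 0.0, 0.0, 0.0)  # assumption: all enzyme inactive, a* = 0 at t = 0
    for _ in range(int(round(t_end / h))):
        y = rk4_step(y, h, b=b)
    return y
```

Calling `integrate(30.0)` and `integrate(30.0, b=0.5)` gives the product concentration at 30 time units for the two activator levels plotted in Fig. 2; because the forward and reverse rate constants listed in section 2.3 are pairwise balanced around the cycle, a* tends towards the concentration of a at long times.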

2. The molecular automata hypothesis: a numerical approach

2.1. The diagram and the logical table of the automaton

An automaton is a computing entity which can adopt a variety of internal states, shifting from state to state depending on the values of the variables of state, and performing in each state a particular function or computation. Mathematically, it can be expressed by means of functions of transition of state and logical tables, or by formal languages and generational grammars (Sampson, 1976). In this case, we have to transform the previous system of equations into an equivalent automaton whose changes of state and variables of command reproduce a dynamics similar to that of the equations of the system. The problem, as R. Rosen (1979) points out, is a solvable one. In general, one can create an automaton regulated by activation and inhibition rules which can parallel the behavior of any system of rate equations. Moreover, as this author states, there is a sense in which automata formulations of activation-inhibition patterns have more generality than rate equations: very often in biological systems we are in a position to infer patterns of activation and inhibition between observable quantities, but not directly to write down a system of rate equations.

Fig. 2. Time courses of a* accumulation in the enzyme mechanism shown in Fig. 1 with a at unit steady-state concentration in the presence of twice the unit concentration of b (upper line) or half the unit concentration of b (lower line). Both curves approach 100 on the vertical scale at equilibrium. Superimposed segments of automata-simulated behavior are also shown on both curves (11.5-16.5 time units on the upper curve and 8.5-11.5 time units on the lower curve).

The solution envisioned in this approach looks both for a resemblance to the biochemical transformations which occur to an individual molecule of enzyme and for a parallel with the changes indicated by the equations of the system. Concretely, four states have been considered in the automata model: I, A, X, T. The state I represents the inactive conformation of the enzyme, the state A the activated conformation, X the enzyme-substrate complex and T the enzyme-product complex. They are interrelated
by the series of transitions of state shown in Fig. 3. The variables of state are a, b, a*, k-1, k-2, k+3, k-3, k+4. They are all binary. In the case of a, b and a* (substrate, activator and product; they include the constants k+2, k+1, k-4), the value 1 represents literally the impact and binding of the respective substance upon the enzyme, and the value 0 means the absence of binding during the time considered. The k-1, k-2 and k+4 represent the spontaneous processes of dissociation (first order reactions); the k+1, k+2 and k-4 refer to the reactions that go in the opposite direction (second order reactions); the k+3 and k-3 are isomerization reactions. In all the cases, the 1 and 0 values represent the occurrence or non-occurrence of the phenomenon.

Fig. 3. Automata-based model of the regulated isomerase reaction illustrated in Figs. 1 and 2. I, inactive enzyme; A, active enzyme; X, complex of active enzyme with substrate a; T, complex of active enzyme with product a*.

With respect to the transitions between states, there appear the following possibilities:

• In I, the inactivated state of the enzyme, the possibilities are two. The molecule can remain in the same state ('loop' case, b = 0), or can go to A (activated state, b = 1).
• In A, the active state, there are four options. Returning to I (inactive state, a = 0, b = 0, a* = 0, k-1 = 1). Going to T (enzyme-product complex, a = 0, a* = 1, k-1 = 0). Going to X (enzyme-substrate complex, a = 1, a* = 0, k-1 = 0). And finally, remaining in A (loop, a = 0, b = 0, a* = 0, k-1 = 0).
• In X, the enzyme-substrate complex state, the options are three. Returning to A (active state, k-2 = 1, k+3 = 0, output a = 1). Going to T (enzyme-product complex, k-2 = 0, k+3 = 1). And remaining in X (loop, k-2 = 0, k+3 = 0).
• In T, the enzyme-product complex state, there are three options. Returning to X (enzyme-substrate complex, k-3 = 1, k+4 = 0). Going to A (active state, k-3 = 0, k+4 = 1, output a* = 1). And remaining in T (loop, k-3 = 0, k+4 = 0).

All these cases are summarized in the 'logical table' of Table 1. The solidus sign, /, in some of the variables of state means that their particular values do not count for the transition considered. All the other variables intervening in the transition bear the corresponding 1 or 0.

Table 1. Logical table of the automaton

States             Inputs                                          Outputs
Previous  Next     k+2a  k+1b  k-4a*  k-1  k-2  k+3  k-3  k+4      a    a*
I         I        /     0     /      /    /    /    /    /        0    0
I         A        /     1     /      /    /    /    /    /        0    0
A         I        0     /     0      1    /    /    /    /        0    0
A         A        0     /     0      0    /    /    /    /        0    0
A         X        1     /     0      0    /    /    /    /        0    0
A         T        0     /     1      0    /    /    /    /        0    0
X         A        /     /     /      /    1    0    /    /        1    0
X         T        /     /     /      /    0    1    /    /        0    0
X         X        /     /     /      /    0    0    /    /        0    0
T         A        /     /     /      /    /    /    0    1        0    1
T         X        /     /     /      /    /    /    1    0        0    0
T         T        /     /     /      /    /    /    0    0        0    0

It
has to be remarked that in the transitions from X to A and from T to A there appear respectively the outputs a and a*. They imply the net increase of the concentration of a and a* by one unit; for a net balance we have to take into account simultaneously the corresponding inputs of the table. In all the other transitions the value of the outputs is 0.

In the arrangement of the table, it is also notable that the intersection of the three first columns and the three first rows contains the whole of the external command variables of the automaton. In a sense they constitute the initial conditions of this system, and they can be controlled and manipulated either experimentally or by the cell. The other ki variables are out of immediate control and correspond to the internal properties of the system; however, we can see in the table that it is these internal variables which give the outputs a and a*. They constitute the hidden part that contains the 'internal wisdom' of the system.

With the development of all the transitions, outputs and whole cycles performed by the automaton under the influence of both its internal and external variables, a set of final conditions is achieved. Computationally, it represents the answer that the system gives to the initial conditions, the 'solution' produced by the automaton to the biochemical problem introduced by the initial variables.
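The logical table lends itself to a direct encoding as data. The sketch below is one possible transcription of Table 1, given only as an illustration: the names `LOGICAL_TABLE` and `step` are introduced here, the inputs are keyed by the short labels used in the text (a, b, a*, k-1, ...), a missing key stands for the solidus sign, and treating an unspecified input as 0 is an assumption of the sketch.

```python
# Each row: (previous state, next state, required input values, outputs).
# Inputs that do not appear in a row correspond to "/" (value irrelevant).
LOGICAL_TABLE = [
    ("I", "I", {"b": 0}, {}),
    ("I", "A", {"b": 1}, {}),
    ("A", "I", {"a": 0, "a*": 0, "k-1": 1}, {}),
    ("A", "A", {"a": 0, "a*": 0, "k-1": 0}, {}),
    ("A", "X", {"a": 1, "a*": 0, "k-1": 0}, {}),
    ("A", "T", {"a": 0, "a*": 1, "k-1": 0}, {}),
    ("X", "A", {"k-2": 1, "k+3": 0}, {"a": 1}),
    ("X", "T", {"k-2": 0, "k+3": 1}, {}),
    ("X", "X", {"k-2": 0, "k+3": 0}, {}),
    ("T", "A", {"k-3": 0, "k+4": 1}, {"a*": 1}),
    ("T", "X", {"k-3": 1, "k+4": 0}, {}),
    ("T", "T", {"k-3": 0, "k+4": 0}, {}),
]


def step(state, inputs):
    """Return (next state, outputs) for one transition of the automaton.

    `inputs` maps variable names to 0 or 1; names that are not supplied
    are read as 0 (an assumption of this sketch).
    """
    for previous, nxt, required, outputs in LOGICAL_TABLE:
        if previous != state:
            continue
        if all(inputs.get(name, 0) == value for name, value in required.items()):
            return nxt, {"a": outputs.get("a", 0), "a*": outputs.get("a*", 0)}
    raise ValueError(f"no row of the table matches state {state!r} with {inputs!r}")
```

For example, the input sequence b = 1, a = 1, k+3 = 1, k+4 = 1 takes the automaton through I, A, X, T and back to A, with the output a* = 1 on the last transition, which is one direct net cycle. Input combinations that the table does not list (such as a = 1 together with a* = 1 in state A) raise an error rather than being resolved silently.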

2.2 Probabilities of transition

In a simplified view of the action of the automaton, we can say that several trains of binary inputs, with a density of 0 and 1 proportional to the concentration of each substance, are continuously entering into the automaton (substrate plus modulator plus internal variables) and that the output is also a train of binary impulses (the product). What we are introducing now as probabilities of transition is just the density or relative proportions of 0 and 1 that appear in each stochastic variable of the logical table: the efficacious 'impacts' per unit of time that each substance is causing upon the enzyme.

For the calculation of these probabilities, the fundamental parameters are the rate constants of the reactions (k+i, k-i) and the concentrations of the substances; there are also two additional parameters that have to be determined beforehand, Δt and Δa*. Δt is the unit of time, the duration attributed to each transition of the automaton. In principle, all the transitions have the same theoretical value Δt; afterwards, the corresponding probability is what gives the average real time invested in each transition.
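The notion of a binary train whose density of 1's equals a probability of transition can be made concrete with a short sketch. The code below is illustrative only: the function names are introduced here, and the dwell-time helper relies on the standard result that a transition tried once per step with probability p takes on average 1/p steps to occur.

```python
import random


def binary_train(p, n, rng):
    """Return n binary impulses, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def mean_dwell_time(p, dt):
    """Average real time before a transition of per-step probability p occurs,
    when every step of the automaton lasts dt."""
    return dt / p
```

With `rng = random.Random(1)`, the call `binary_train(0.2, 50000, rng)` produces a string of 0's and 1's in which roughly one value in five is a 1; with Δt = 0.0001 the corresponding transition would take about 0.0005 time units on average.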


Δa* is the unitary increment or decrement of product concentration which is associated with each net cycle performed by the automaton, either direct or in reverse; its value is the concentration S of the enzyme multiplied by an arbitrary scaling factor. Both Δt and Δa* can be adjusted in advance, but they must respect some conditions that we will specify below.

Each probability of transition has to be obtained from the system of equations by the integration of one of the partial terms in the corresponding equation of the system (each equation corresponds to one of the states). For example, in the state A the equation has six terms: three with the sign − (outputs) and three with the sign + (inputs). The equation has to be integrated with respect to time, between 0 and Δt, independently for all the three terms. On each occasion only one of the terms counts; all the other ones are nil. Thus, we obtain three expressions of the type:

$$P_i = 1 - \exp(-k_i\,\Delta t / \Delta a^*)$$

The expressions corresponding to the state A have two cases in which the concentration of an external variable intervenes (a and a*). The ki of the formula has to be multiplied, then, by the corresponding value of the concentration. In general, the conditions that these probabilities must fulfil (conditions that strongly limit the range of possible values for Δt and Δa*) are:

• All the Pi must be ≥ 0.
• The sum of the Pi for each state must be ≤ 1.
• The 'loop' probability is the difference between 1 and the sum; like the other probabilities, it must always be ≥ 0.
• The probabilities of the set must be big enough to allow for a meaningful dynamics of the automaton (very low values would give almost no transitions).

Another condition related to the value of Δt is that, in order to obtain a minimum of convergence in the iterations, it must be reasonably small in comparison with the values of the k+i: the product of Δt and k+i has to be ≪ 1. Something similar happens with Δa*; it has to be reasonably smaller than the concentration of the enzyme. Therefore, the determination of Δt, Δa* and the corresponding sets of probabilities implies some preliminary steps. Diverse families of values can be tried until a reasonable set of probabilities is achieved. The search can be done rapidly because of the quotient Δt/Δa* that appears in the expressions of the probabilities; if both values are modified in the same proportion, the probabilities remain constant.

After an appropriate initial set of probabilities has been established, the probabilities related to some concentrations will change depending on the cycles completed by the automaton in one or the other direction; they must be recalculated systematically. The modification of the probabilities of transition as a function of the actions performed by the automaton is one of the aspects that gives to this stochastic model, and in general to the action of the enzyme, its flavor of 'intelligent machine'. But perhaps the crucial aspect of the enzyme action is the influence of the modulator b in the development of the cycle; it determines directly the proportion of time that the automaton will have to remain in an active or inactive state. And by the sharing of substrates, products and modulators between different enzymes of different circuits, it makes possible an integrated collective functioning. The relative presence or absence of the regulatory substances 'informs' each enzyme, each molecular automaton, about the results of the operations performed by the other automata. The reversibility of the operations in many metabolic enzymes, and also in our model, is an additional trait that contributes to the flexibility of the functions performed by each enzyme and by the whole metabolic network.
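The expression for the probabilities and the conditions attached to it can be checked mechanically. The sketch below is illustrative: the function names and the `floor` threshold (a stand-in for 'big enough') are assumptions introduced here, and the values of Δt and Δa* in the usage note are hypothetical ones chosen only so that the conditions hold.

```python
import math


def transition_probability(k, dt, da, conc=1.0):
    """P = 1 - exp(-k * conc * dt / da); conc is 1 for first-order steps."""
    return 1.0 - math.exp(-k * conc * dt / da)


def check_state(probabilities, floor=1e-3):
    """Test the conditions on the outgoing probabilities of one state.

    Returns (ok, loop), where loop = 1 - sum(probabilities). `floor` is a
    hypothetical threshold below which a set is judged too small to give
    a meaningful dynamics.
    """
    total = sum(probabilities)
    loop = 1.0 - total
    ok = (all(p >= 0.0 for p in probabilities)
          and total <= 1.0
          and max(probabilities, default=0.0) >= floor)
    return ok, loop
```

For state A with k-1 = 4, k+2 = 10 (a = 1) and k-4 = 6 (a* = 0.5), the hypothetical choice Δt = 0.00001 and Δa* = 0.001 gives three probabilities of about 0.039, 0.095 and 0.030, leaving a loop probability of about 0.84. Doubling both Δt and Δa* leaves every probability unchanged, which is the property that makes the preliminary search quick.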
Let us comment, finally, that the introduction of probabilities of transition in our automata model as an auxiliary concept to produce the strings of binary inputs and outputs has a clear phenomenological basis. The expression we have found for the probabilities is congruent with the common integrated expression [1 - exp(-kt)] for product formation in any (pseudo-) first order reaction; moreover, it echoes the Boltzmann probability formula for the formation of a product from its components (Fox, 1988). Based precisely on these probabilities of transition, we will introduce later the notion of 'informational entropy'.
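The congruence with the first-order expression can be written out in one line. This is only a worked step using the standard first-order rate law; the symbol x, the fraction of reactant not yet converted, is introduced here for the purpose.

$$\frac{dx}{dt} = -k\,x \;\Rightarrow\; x(t) = e^{-kt}, \qquad 1 - x(\Delta t) = 1 - e^{-k\,\Delta t} \approx k\,\Delta t \quad (k\,\Delta t \ll 1)$$

The fraction converted during one interval therefore has the same form as the probability of transition, and for small exponents each probability is nearly proportional to its rate constant.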

2.3 Numerical simulation

As was said in the Introduction (1.3), the automata model has been tested against the performance of a conventional procedure (Franco and Canela, 1984). Some of the assumptions adopted for the latter are unnecessary for the automata¹ (in particular, forcing the values of k+3 and k-3 to be at least 10 times larger than the other ki has been clearly unfavorable for establishing the set of probabilities of transition). For simplicity, in the simulation presented we have held both a and b in the steady state, as they might be physiologically (in the short term), allowing only a* to accumulate. The values of the variables in both models are:

a = 1 (constant); b = 2 (constant); a* = 0 (initial value)
k+1 = 2; k-1 = 4
k+2 = 10; k-2 = 10
k+3 = 100; k-3 = 100
k+4 = 6; k-4 = 6
S = 0.1 (concentration of the enzyme)
Δt = 0.0001; Δa* = 0.001

The automata simulation² starts by asking for the initial values of the constants and concentrations and also for the parameters Δt and Δa*. With these values a set of probabilities of transition is calculated. The program tests whether the probabilities respect the conditions listed above (2.2); it also shows on the screen the values of the probabilities and their partial accumulated sums. New values can be tested until a meaningful set of probabilities is achieved. Obviously, a trade-off between precision and time consumption has to be established in the values assigned to Δt and Δa*.

¹ We are of course aware that some of the advantage of the automata approach in the present work derives from the use of a microcomputer. As the systems considered grow more complex, however, it is inevitable that computational limitations will become a factor no matter what equipment is used.
² This program in BASIC is available from the authors on request.

The rest of the program is devoted to the automaton itself. It starts from an initial state of the automaton, which has to be chosen, and immediately generates a random number (four digits) to determine the transition of state that will be performed by the automaton. The random number will fall, depending on its value, in one or another region of the partial sum of probabilities mentioned before; this implies the selection of the corresponding variable of state and the associated transition. By this simple means the program generates a dynamics that is similar to a binary string of random 1's and 0's in a proportion determined by the probability of transition (there is little doubt that this procedure could be simplified by using more sophisticated software). The concentration of a* changes in the transitions from A to T (input a*) and from T to A (output a*); in both cases the program has to recalculate the corresponding probabilities of A (only Pa* and Ploop change; Pk-1 and Pa do not change in A). At regular intervals, the program shows on the screen, and lists on the printer, the successive transitions of state and the accumulated value of a*; finally, it lists the whole set of parameters and variables of the system, including the initial and final distribution of probabilities.

With respect to the results obtained, the automaton behaves very appropriately in most of the regions of the curve shown in Fig. 2 (the difference is practically indistinguishable at that scale). The accumulation of a* closely follows the values of the reference model in all the cases.
In many of the simulations performed, the difference was less than 4 or 5 cycles (in advance or delay); this seems remarkable after some tens of thousands of transitions and some hundreds of net cycles performed.
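The procedure described in this section can be sketched in a few lines. The code below is not the BASIC program mentioned in the footnote; it is a minimal reading of the description, with several assumptions that should be kept in mind. The exponent of each probability is written as k·c·S·Δt/Δa*; the factor S does not appear in the expression of section 2.2 and is adopted here only because, with it, the values listed above satisfy the conditions of section 2.2 and one increment Δa* per net cycle matches the flux of the rate equations. A floating-point random number replaces the four-digit one, the automaton starts in state I, and the product is tracked as an integer count of net cycles. All names are hypothetical.

```python
import math
import random

# Rate constants as listed in the table of values above.
K = {"+1": 2.0, "-1": 4.0, "+2": 10.0, "-2": 10.0,
     "+3": 100.0, "-3": 100.0, "+4": 6.0, "-4": 6.0}


def simulate(n_steps, a=1.0, b=2.0, S=0.1, dt=1e-4, da=1e-3, seed=0):
    """Run one automaton for n_steps transitions; return (final state, a*)."""
    rng = random.Random(seed)
    scale = S * dt / da  # ASSUMPTION: the factor S is not in the text's formula

    def p(k, conc=1.0):
        return 1.0 - math.exp(-k * conc * scale)

    # Probabilities that stay fixed because a and b are held constant.
    fixed = {
        "I": [("A", p(K["+1"], b))],
        "X": [("A", p(K["-2"])), ("T", p(K["+3"]))],
        "T": [("X", p(K["-3"])), ("A", p(K["+4"]))],
    }
    p_a, p_k1 = p(K["+2"], a), p(K["-1"])

    state, cycles = "I", 0  # cycles: net number of a* units released so far
    for _ in range(n_steps):
        if state == "A":
            # Pa* is recalculated from the current product concentration.
            options = [("I", p_k1), ("X", p_a), ("T", p(K["-4"], cycles * da))]
        else:
            options = fixed[state]
        # Place the random number in the partial sums of probabilities;
        # if it falls beyond them the automaton loops in the same state.
        r, cumulative, nxt = rng.random(), 0.0, state
        for target, prob in options:
            cumulative += prob
            if r < cumulative:
                nxt = target
                break
        if state == "T" and nxt == "A":
            cycles += 1  # output a*: one unit of product released
        elif state == "A" and nxt == "T":
            cycles -= 1  # input a*: one unit of product bound again
        state = nxt
    return state, cycles * da
```

Under these assumptions, `simulate(300_000)` corresponds to 30 time units with b = 2, and the returned a* can be compared with the upper curve of Fig. 2; changing `seed` gives a different stochastic trajectory around the same trend.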

Fig. 4. Simulation during the transient phase. □, numerical integration model; •, automata model. Note the fiftyfold difference in scales relative to Fig. 2.

But in the starting part of the curve there appears a systematic difference. It appears to be caused by two simplifications. The first one is due to the assumptions necessary to run the reference model (the assumption of rapid equilibrium between X and T). The second one is in the automaton: it avoids the cumbersome effects of the 'transient phase' (which otherwise would necessitate a systematic recalculation of all the probabilities of transition in all the states in response to the changes of concentration of the species of the enzyme). For the sake of a compact nucleus of the automaton, the option chosen has been to simplify this phase and to adopt a steady-state dynamics; besides, this problem affects only a short period of time, within the first time unit in the example (see Fig. 4), and produces a slight bias of between 5 and 15 cycles, first in advance and afterwards in delay.

In the middle part of the curve the agreement is remarkably good; on many occasions the discrepancy stays below 5 cycles, with rare larger deviations that soon recover. However, the last part of the curve, entering the equilibrium area (where the probabilities Pa and Pa* for the transitions of A are closer), shows some cases of relative 'bursts' of overproduction or underproduction. It is possible to accumulate differences of 10-20 cycles during a certain interval of time; but in the long term all the discrepancies between the automata and the reference model disappear. In general, we have recognized a certain sensitivity in the amplitude of the bursts and random oscillations with respect to the value of the modulator b. Certain values of this variable seem to produce bigger and more frequent oscillations. We are in doubt whether these slight differences are only an artifact of the model or have to be considered of biochemical significance. Perhaps, with certain concentrations of activator, the pace of the reaction is more susceptible to irregularities.

2.4 Other possibilities of the model

Though the exploration we have done of the numerical performance of this automata model is just a brief initial one, in our opinion it justifies attempts with more ambitious theoretical and experimental goals. The third part of this article will deal with some of the theoretical possibilities to which this hypothesis is related: its conceptual background in the informational-biological field. Now, from a more pragmatic point of view, we would like to suggest some areas of enzyme kinetics where this stochastic approach can be introduced, perhaps yielding interesting results. All of these applications have a certain informational or computational flavor. As we will discuss in the third part of the article, perhaps the main virtue of the molecular automata hypothesis, apart from its numerical possibilities in complementing the conventional procedures, is that it presents the network of cytoplasmic processes in a different perspective, the informational one.

Enzyme networks. This is the main direct application of the present model. Connecting systems of equations is awkward; connecting automata tables seems easier. Metabolic networks, and in general enzymic networks, would be the natural field for testing the possibilities of the automata (see Okamoto et al., 1983, 1989, 1990; Bray, 1990). Insofar as whole metabolic networks of the cell could be put into a parallel supercomputer (probably expressed in binary and stochastic terms similar to ours), a new way to explore the characteristics of the collective functions of the cell would be open. Among the topics that can be influenced by these views are: the energetics of the cell, the stability and flexibility of the circuits, network modification, the relationships with gene expression, cellular differentiation, etc.

Channels and transport proteins. Multimodulation. 'Memory-enzymes'. The automata expressions are easily modifiable and can be accommodated to formulate empirically the dynamics of different enzymes and proteins whose equations are too complex to be established rigorously, as usually happens with the dynamics of channels and transport proteins. In multimodulation, enzymes of an 'impossible' character (like phosphofructokinase, for which more than 20 modulators have been demonstrated; see Sols, 1981) might receive a certain formalization by means of logical tables. In the case of enzymes with internal 'memory' (Jarabak and Westley, 1974; Stock, 1987), the states of the automata seem the most accurate way to express the correlation between internal conformations and external variables. Also, the informational entropy of the automaton can give a more realistic appreciation of how much information such memory enzymes contain (Stock, 1987).

Excess of enzyme sites relative to free ligands. In certain circumstances of the cell metabolism (particularly in the reactions of the mitochondria), different kinds of enzymes have to compete for scarce free ligands, less numerous than the concentration of active sites waiting for them. In such situations, the continuous approach has serious difficulties; the alternative stochastic approach seems better equipped to deal with it.
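The situation can be pictured with a small discrete sketch in which a handful of ligand molecules are distributed one by one among the free sites of competing enzymes. The function name `allocate`, the relative affinities and the site counts are hypothetical and serve only to show that occupancies are whole numbers that fluctuate from run to run, which a continuous concentration cannot express.

```python
import random


def allocate(ligands, sites, affinity, rng):
    """Bind ligands one at a time to free sites of competing enzymes.

    sites:    dict enzyme name -> number of free active sites
    affinity: dict enzyme name -> relative binding weight (assumed values)
    Returns a dict enzyme name -> number of occupied sites.
    """
    free = dict(sites)
    bound = {name: 0 for name in sites}
    for _ in range(ligands):
        names = [n for n in free if free[n] > 0]
        if not names:  # every site is occupied, remaining ligands stay free
            break
        # Chance of capture grows with free sites and with affinity.
        weights = [free[n] * affinity[n] for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        free[pick] -= 1
        bound[pick] += 1
    return bound


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        # 5 ligands competing for 20 sites: occupancy varies between runs
        print(allocate(5, {"A": 10, "B": 10}, {"A": 2.0, "B": 1.0}, rng))
```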

'House-keeping' functions and 'meta-logical' functions. These terms (the first one applied to enzyme functioning by H. Padh, pers. commun.) refer to collective functions that appear, usually not in a well defined way, due to the general interconnection of phenomena that occurs between the networks. They correspond to necessities of the system that are beyond the competence of particular circuits and they have to be solved either by creating new enzymes and new circuits with wholistic functions ('house-keeping'), or by canalizing in an adequate way other properties of non-logical nature ('meta-logical', that are outside the range that the logical table shows). A good example of the first case is the role of ascorbic acid (Padh, 1990); to the second case correspond phenomena such as clustering, channeling, synergistic effects, membrane and cytoskeleton interactions, etc. The study of the collective properties that emerge from the networks of automata can help to establish a logical background to define and to contrast these diffuse functions.

Informational entropy. Information and informational entropy are difficult concepts that have received many biased interpretations in biological systems (Brooks and Wiley, 1986; see Rosen, 1985b for a critical view). Only as a suggestion, we want to point out that, related to the actions of the automaton, it seems we can define an entropy function taking account of the changes that appear in the probabilities of transition. The expression that may be more adequate is the sum of the partial entropy of each state ($-\sum P_i \cdot \log 1/P_i$) multiplied in each case by the corresponding probability of transition. What must count here is the difference between the entropies of the system at different times: how much net information has been processed by the automata. The expression has to avoid attributing any change in informational entropy to an inactive enzyme, or to an equilibrium state; a given change in informational entropy in a given time has to correspond to the activities performed by an enzyme that effectively change the concentrations -- the probabilities of transition. Other aspects related to the definition of this entropy function are its energetic equivalence and the differences introduced by changing the interconnection procedures between enzymes.
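One possible reading of this suggestion can be written down as a short sketch. It assumes the standard non-negative Shannon form, $\sum_i P_i \log_2 (1/P_i)$, for the partial entropies, weights each state's term by a transition probability, and takes the difference between two times as the information processed. The function names and the numerical values are hypothetical; the text itself leaves the exact definition open.

```python
import math


def shannon_entropy(probs):
    """Standard Shannon entropy in bits: sum of p * log2(1/p)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def weighted_entropy(state_probs, transition_probs):
    """Partial entropy of each state times its transition probability.

    This weighting is one interpretation of the verbal description,
    not a formula given explicitly in the text.
    """
    return sum(
        q * p * math.log2(1.0 / p)
        for p, q in zip(state_probs, transition_probs)
        if p > 0
    )


def information_processed(before, after):
    """Entropy difference between two times; each argument is a pair
    (state_probs, transition_probs)."""
    return weighted_entropy(*after) - weighted_entropy(*before)


if __name__ == "__main__":
    resting = ([0.5, 0.5], [0.1, 0.1])  # hypothetical values
    active = ([0.5, 0.5], [0.9, 0.7])   # modulator raises transition rates
    print(information_processed(resting, active))
    print(information_processed(resting, resting))  # equilibrium gives 0.0
```

Under this reading an enzyme whose state and transition probabilities do not change between the two times contributes exactly zero, which matches the requirement that no change be attributed to an inactive enzyme or to an equilibrium state.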

III. The automata hypothesis inside the framework of theoretical biology: suggestions for cell organization

3.1 Collective phenomena in the cellular system. Theoretical and experimental approaches

The emergence of collective functions in the networks of enzymes is related to other theoretical and applied areas of research which are actively contributing to the renovation of present-day theoretical biology. They are: (1) boolean networks and 'relational biology', (2) artificial biochemistry, molecular computing and metabolic engineering. We may say that they all share a pioneer character and an openly speculative style; this will allow us to introduce some other suggestions in relation with the automata hypothesis.

3.1.1 Boolean networks in the cell have been studied especially by S.A. Kauffman (1974, 1985, 1987, 1991). He has explored the functional properties of different kinds of massive networks built from boolean functions which are interconnected in a random way. The most interesting collective properties emerge precisely in networks that are very close to the basic characteristics of enzymes. Either their individual functions must have 'canalizing' variables (such as inhibitors), or they must have only 2 or 3 inputs (the normal number of variables that enzymes recognize). Networks built from such boolean functions self-organize their dynamics in stable cyclic patterns, forming a determined number of big cycles of activation (approximately √n different ones, n being the total number of individual functions). The whole space of activations of the functions becomes, then, divided into something very similar to 'basins of attraction', separated by a certain distance of perturbations between them. Kauffman has interpreted the emergence of these robust statistical patterns of cyclic behavior as a model for the appearance of differentiated cell types in multicellulars (taking the whole genome as a network, and the genes as individual switches, with on-off states arranged following a boolean table). It is to be noted that, when arguing about the biological verisimilitude of the boolean gene-functions, he refers explicitly to the properties of enzymes and proteins as material bases for building and interconnecting such functions (Kauffman, 1987). Let us emphasize this last point. It seems that we can legitimately connect the Kauffman idea with our automata hypothesis.
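The kind of network described here can be explored with a few lines of code. The sketch below builds a small random boolean network in which every function has k = 2 inputs, follows every initial state until it repeats, and collects the distinct cycles of activation. The network size, the seed and the function names are arbitrary choices for illustration; the construction is a generic random boolean network, not a reproduction of Kauffman's own programs.

```python
import random
from itertools import product


def random_network(n, k, rng):
    """Each of n functions reads k randomly chosen elements and has a
    random boolean table with 2**k rows."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def update(state, inputs, tables):
    """Synchronous update of all functions."""
    new = []
    for ins, tab in zip(inputs, tables):
        idx = 0
        for j in ins:
            idx = (idx << 1) | state[j]  # pack input bits into a row index
        new.append(tab[idx])
    return tuple(new)


def attractor(state, inputs, tables):
    """Follow a trajectory until a state repeats; return the cycle reached."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = update(state, inputs, tables)
        t += 1
    start = seen[state]  # time at which the cycle was first entered
    return frozenset(s for s, i in seen.items() if i >= start)


def find_attractors(n=10, k=2, seed=1):
    """Enumerate all 2**n initial states and collect the distinct cycles."""
    rng = random.Random(seed)
    inputs, tables = random_network(n, k, rng)
    cycles = {attractor(s, inputs, tables) for s in product((0, 1), repeat=n)}
    return cycles, inputs, tables


if __name__ == "__main__":
    cycles, _, _ = find_attractors()
    print(len(cycles), sorted(len(c) for c in cycles))
```

Exhaustive enumeration is only feasible for small n; for larger networks the initial states would have to be sampled, and the number of cycles found would then be a lower bound.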
The suggestion we want to introduce is that, beyond the genes, the ideal system to check for the emergence of statistical properties of boolean-like networks (Martland, 1989) seems to be the cytoplasmic network of enzymes, understanding them in a simplified way as molecular automata. There are important differences between the two kinds of functional domains, genes and enzymes, but this is precisely the most interesting aspect. For instance, we have seen that the automata contain a 'hidden part' which does not show up from the point of view of the external variables, but it is there, working for the system. Besides, in many cases the function of the automata is reversible. Also, the connections between the external variables can be randomized only relatively, because the biochemical rules of constructivity have always to be respected. And in this respect the relationship of substrates to products admits little randomness, if any (though a certain dose of randomness becomes necessary for the 'invention' of new pathways); the other connections of modulation are less rigid and they are open to the wildest experimentation and testing for the emergence of collective functions; the same happens with most of the internal variables of the table... All these differences would give to the networks of automata a peculiar blend of collective properties with respect to the standard boolean functions. It then becomes possible to compare the differentiated emergent properties of the massive networks of automata with the numerous wholistic phenomena that occur in the machinery of the cell. Or putting the suggestion in a different way: perhaps, some of the global -- and not yet very well explained -- properties of the enzyme networks (such as clustering, channeling of substrates, accumulation of modulators upon crucial enzymes, metabolic 'key junctions', 'house keeping' functions, proliferation of second messengers, etc.) would have to be looked at not only as biochemical phenomena, but as materializations of emerging properties of mathematical sets of functions as well.
To be more precise, these diffuse phenomena would be counter-properties of the system. They would be equivalent to 'evolutionary tricks' adopted by the cell to solve inconvenient mathematical-biochemical situations provoked by the limitations of the network of automata. These limitations become particularly acute in cellular specialization, when networks of different cells have been modified in a complementary way and special means of communication have to be established in order to coordinate their functioning (see Tomkins, 1975; Welch, 1987b; Bray, 1990). It can be argued that Kauffman's proposal and our own suggestion are parallel. One can be applied to the dynamics of activations and inhibitions of the genes in the nucleus and the other to the enzyme networks in the cytoplasm. An interesting aspect of this double conjecture is that both worlds of activations and deactivations, both families of collective functions, have necessarily to be related. As far as the genes are concerned, we know that they are always 'watching', for their activation and inhibition, the phenomena occurring in the cytoplasm. We may argue that the cellular system has organized a 'representational' relationship between the development of the collective functions of both worlds. Statically, it is clear that there is a univocal mapping relation -- just a coding -- between enzymes and genes. Dynamically, the relationship between the activations and inhibitions of both worlds shows a number of individual and collective interactions between them. As a matter of fact, many findings in the dynamics of groups of genes have pointed to the existence of globalized systems of control which imply the simultaneous surveillance by each group of genes of a number of cytoplasmic variables (Gottesman and Neidhardt, 1983). How are we to deal theoretically with this kind of bizarre representational relationship? Historically, the mathematical interpretation of this subject was for many years the main thrust of a pioneer of theoretical biology: N. Rashevsky.
Almost without any of the solid empirical correlates that molecular biology would provide later, he started to delineate (1954) a 'relational biology' which could handle in abstract terms the network of operations of the cytoplasm in its relationship with the nuclear phenomena. Furthermore, he tried to apply similar points of view to the dynamics of nervous systems and societies. His work in relational biology has been continued especially by R. Rosen (an accurate exposition of the subject can be found in Rosen, 1985b); it has also been explored by other authors starting from different orientations (see Bagley et al., 1989, about metadynamic models; Tchuraev, 1991, about automata models of gene expression). The mathematical facets of this subject, the linkage between the collective logical (and metalogical!) operations of the two related worlds, have been worked out to a limited extent; probably, the articulation of very different theoretical tools will be necessary. In this respect, we believe -- though only as a guess -- that a network of 'molecular automata' in the cytoplasm (with all their differential properties: hidden part, reversibility, variable connectivity, energetic repercussions, etc.) has to be correlated with a 'Kauffman network' in the nucleus and that the conceptual tool to mediate between them is a kind of Rashevskyan algebra. Let us finally remark that this line of thought is not very distant from that of some contemporary elaborations of 'representation theory' in artificial intelligence (Palmer, 1978). A partial coincidence with notions of artificial intelligence is not far-fetched (Conrad, 1987). We confront a system, the cell, which by means of the manipulation of its internal information-processing worlds is endowed with an impressive property of 'adaptive self-modification' in the face of changes in its internal or external environment. And this kind of property is what constitutes the genuine fundamentals of intelligence, either natural or artificial (Maturana and Varela, 1987; Beer, 1990; Marijuán, 1991).
3.1.2 In the second group of fields -- artificial biochemistry, molecular computation and metabolic engineering -- some of the studies have a mixed theoretical and applied side (especially, in molecular computing), whereas the first ones are mainly theoretical and the latter are applied. Artificial biochemistry may also be called 'computational metabolism' and is an attempt at modelling the basic conditions of biochemical environments that can give rise to life-like behaviors. It is a new area developing inside the multidisciplinary studies known as 'artificial life' (Langton, 1988) and it is particularly related to cellular automata and other computational concepts; it also maintains procedural relationships with artificial intelligence. The second field, molecular computation, is a more time-honored pursuit that proposes the use of organic molecules as auxiliary -- or central -- tools for information processing. A review of the theoretical and experimental realizations in this area is beyond the scope of this paper (see Conrad and Liberman, 1982; Conrad, 1985; Hameroff, 1988; Conrad, 1990). Indeed these studies have raised high expectations and have shown a large breadth, which ranges from neurosciences (the computational role of microtubules and cAMP in real neurons), up to neural networks (artificial neurons endowed with 'enzymic networks'), analysis of cell-cell communication, artificial sensors, memory technologies, processing of images, etc. And with respect to the last field of the group, metabolic engineering, its development corresponds to the necessities and resources of molecular biology and DNA research (biotechnologies based on recombinant DNA). Its concern is the re-design of metabolic networks in altered microorganisms, mainly for biotechnological or industrial purposes (Stephanopoulos and Vallino, 1991). Examples of its scope are: analysis of network responses, node rigidity, redirection of metabolic flows, cellular synthesis of new products, completion of pathways, etc.
As Bailey (1991) points out, some of the experimental and mathematical tools required for rational metabolic engineering are available, but complex cellular responses to genetic perturbations can complicate predictive design; the development of sophisticated mathematical structures is the only way that the net consequences of simultaneous, coupled and often counteracting processes can be simulated in the computer and evaluated consistently and qualitatively. We have already cited in the Introduction (1.3) the proposal of D. Bray (1990) relating the physiological analysis of enzyme networks with parallel processing and neural network models. A 'computational physiology' based primarily on realistic models of the simplest cellular systems looms.

3.2 The informational view of the cell. Towards a bio-information discipline?

The disparity of fields which can be related to the informational view of the cell deserves a brief analysis. As A. Szent-Györgyi (1968) wrote in the preface of his last book: 'If you would ask a chemist to find out for you what a dynamo is, the first thing he would do is to dissolve it in hydrochloric acid. A molecular biologist would, probably, take the dynamo to pieces, describing carefully the helices of wire. Should you timidly suggest to him that what is driving the machine may be, perhaps, an invisible fluid, electricity, flowing through it, he would scold you as a 'vitalist'... The cell is a machine driven by energy. It can thus be approached by studying matter, or by studying energy. The study of matter, or structure, leads to molecular biochemistry and what D.D. Eley calls the steric factor approach which dominates present biochemistry. I will approach from the energy side... It can also be called organization... This book is devoted to the question whether or not there is a closer analogy between the dynamo and the living system. This latter, too, may be permeated by an 'invisible fluid', the particles of which, the electrons, are more mobile than molecules and carry energy, charge and information and act as the fuel of life...'

Bioenergetics has made big strides in understanding the 'metabolic dynamo' and it is curious that the image of the dynamo has persisted as adequate (see Fox, 1988: 'following the Lipmann cycle, the metabolic dynamo generates ~P current. This is brushed off by adenylic acid, which likewise functions as the wiring system, distributing the current. Creatine-P, when present, serves as P accumulator. Components of the metabolic wheel include glycolysis, the citric acid cycle and the electron transport chain'). With respect to the other aspect that Szent-Györgyi relates to energetics, the information-organizational one, the situation has been different. Here, the development of the ideas has been more convoluted and the image of the dynamo has been far less inspiring; instead, most of the models have been taken from the processes of the computer.

As a matter of fact, the task that Szent-Györgyi was proposing, understanding the information conveyed by the 'invisible fluid' that permeates the 'cellular dynamo', has appeared as a formidable one. It imposes an integration of knowledge from many separate disciplines: computation, biochemistry, mathematics, molecular biology, biophysics. Many of the processes of convergence between these disciplines are already taking place. What we have been surveying from the recent panorama, just to contrast and to complement our version of the automata hypothesis, has shown a real proliferation of theoretical attempts and multidisciplinary points of view -- and a half dozen more areas could be cited, either closer to molecular biology and molecular computing or to artificial intelligence and artificial life; e.g., computational biology, molecular dynamics, 'radical computing', nano-molecular systems, genetic algorithms, typogenetics... (crossing borders between disciplines is here not a deterrent but the norm!). In many cases the differences between the contents of the fields are minor and the subsequent repetition and overlapping may contribute to giving a rather incoherent overall image of the studies. Besides, it is possible that many of the present attempts and hypotheses are as yet unrealistic and unsophisticated, and they will be superseded quite soon; then the theoretical multiplicity would diminish gradually by itself. But there is a compelling rationale behind the present-day expanding multiplicity. Under the influence of information and computer sciences, theoretical biology has definitively left behind a long period of self-restriction to Neo-Darwinian models of population genetics. We are advancing now towards the development of many partial theories of information dynamics and organization in the cellular system (at least, covering most of its basic characteristics).
In a relatively short term, the success of some of these attempts is predictable and it will be possible, then, to open a firmer integrative process. This may have significant repercussions in unsolved questions of present-day biology: ontogenic development, morphology, origin of evolutionary novelties, biological intelligence. Achieving soon the fundamentals of a sound informational perspective of the cell (based on the computation of enzymes?, based on the emergent properties of a population of 'molecular automata' interacting with a 'representational world'?) may be a strategic factor for precipitating such success. It is always difficult to dovetail concepts from different realms. What we are contemplating now, what we are immersed in, is the struggle of the recently framed Information Science (computation) to develop a true 'bio-information' scientific discipline. As previously with the application of the chemical and physical perspectives -- 'bio-chemistry' and 'bio-physics', respectively -- it is now the turn of Information Science to provide a new perspective to illuminate the study of life.

Acknowledgments

The authors would like to acknowledge the help of R. DeJeu, Department of Mathematics, University of Chicago and a fellowship (PCM) from the Spanish Ministry of Education, Exp 89-16496639.

References

Bagley, R.J., Farmer, J.D., Kauffman, S.A., Packard, N.H., Perelson, A.S. and Stadnyk, I.M., 1989, Modeling adaptive biological systems. BioSystems 23, 113-138.
Bailey, J.E., 1991, Toward a science of metabolic engineering. Science 252, 1668-1674.
Beer, R.D., 1990, Intelligence as Adaptive Behavior: an Experiment in Computational Neuroethology. (Academic Press, San Diego, CA).
Bray, D., 1990, Intracellular signalling as a parallel distributed process. J. Theor. Biol. 143, 215-231.
Brooks, D.R. and Wiley, E.O., 1986, Evolution as Entropy: Toward a Unified Theory of Biology. (Univ. of Chicago Press, Chicago).
Conrad, M., 1972, Information processing in molecular systems. Curr. Mod. Biol. 5, 1-14.
Conrad, M., 1985, On design principles for a molecular computer. Commun. ACM 28, 5, 464-480.
Conrad, M., 1987, Rapprochement of artificial intelligence and dynamics. Euro. J. Oper. Res. 30, 280-290.
Conrad, M., 1990, Molecular computing, in: Advances in Computing, M. Yovits (ed.) (Academic Press, New York).
Conrad, M. and Liberman, E.A., 1982, Molecular computing as a link between biological and physical theory. J. Theor. Biol. 98, 239-252.
Crabtree, B. and Newsholme, E.A., 1987, A systematic approach to describing and analysing metabolic control systems. Trends Biochem. Sci. 12, 4-12.
Fox, R.F., 1988, Energy and the Evolution of Life. (W.H. Freeman and Company, New York).
Franco, R. and Canela, E.I., 1984, A program for the numerical integration of enzyme kinetic equations using small computers. Int. J. Bio-Med. Comput. 15, 419-432.
Gottesman, S. and Neidhardt, F.C., 1983, Global control systems, in: Gene Function in Prokaryotes, J. Beckwith, J. Davies and J.A. Gallant (eds.) (Cold Spring Harbor Laboratory, New York).
Hameroff, S., Rasmussen, S. and Mansson, B., 1988, Molecular automata in microtubules: basic computational logic of the living state, in: Artificial Life, C. Langton (ed.) (Addison-Wesley Publishing Co., Redwood City, CA).
Jarabak, R. and Westley, J., 1974, Enzymic memory: a consequence of conformational mobility. Biochemistry 13, 3237-3239.
Kacser, H., 1983, The control of enzyme systems in vivo: elasticity analysis of the steady state. Biochem. Soc. Trans. 11, 35-40.
Kauffman, S.A., 1974, The large scale structure and dynamics of gene control circuits: an ensemble approach. J. Theor. Biol. 44, 167-189.
Kauffman, S.A., 1985, Self-organization, selective adaptation and its limits, in: Evolution at a Crossroads, Depew, Weber (eds.) (MIT Press, Cambridge, MA).
Kauffman, S.A., 1987, Developmental logic and its evolution. BioEssays 6, 2, 82-87.
Kauffman, S.A., 1991, Antichaos and adaptation. Sci. Am., Aug. 78-84.
Langton, C.G., 1988, Artificial Life. SFI Studies in the Sciences of Complexity, C. Langton (ed.) (Addison-Wesley Publishing Co., Redwood City, CA).
Marijuán, P.C., 1991, Enzymes and theoretical biology: sketch of an informational perspective of the cell. BioSystems 25, 259-273.
Martland, D., 1989, Dynamic behaviour of boolean networks, in: Neural Computing Architectures, Igor Aleksander (ed.) (MIT Press, Cambridge, MA).
Maturana, H. and Varela, F., 1987, The Tree of Knowledge (New Science Library, Shambhala, Boston, MA).
Okamoto, M., Katsurayama, A., Tsukiji, M. and Hayashi, K., 1980, Dynamic behavior of enzymatic system realizing two factor model. J. Theor. Biol. 83, 1-16.
Okamoto, M. and Hayashi, K., 1983, Dynamic behavior of cyclic enzyme system. J. Theor. Biol. 104, 591-598.
Okamoto, M., Sakai, T. and Hayashi, K., 1989, Biochemical switching device: how to turn (off) the switch. BioSystems 22, 155-162.
Okamoto, M. and Hayashi, K., 1990, Network study of integrated biochemical switching system I: connection of basic elements. BioSystems 24, 39-52.
Padh, H., 1990, Cellular function of ascorbic acid. Biochem. Cell Biol. 68, 1166-1173.
Palmer, S., 1978, Fundamental aspects of cognitive representation, in: Cognition and Categorization, Rosch, Lloyd (eds.) (Lawrence Erlbaum, New York).
Rashevsky, N., 1954, Topology and life: in search of general mathematical principles in biology and sociology. Bull. Math. Biophys. 16, 317-348.
Rosen, R., 1967, Two-factors models, neural nets and biochemical automata. J. Theor. Biol. 15, 282-297.
Rosen, R., 1979, Some comments on activation and inhibition. Bull. Math. Biol. 41, 427-445.
Rosen, R., 1985a, Information and cause, in: Information Processing in Biological Systems, B. Kursunoglu, S.L. Mintz, A. Perlmutter (eds.) (Plenum Press, New York).
Rosen, R., 1985b, Organisms as causal systems which are not mechanisms, in: Theoretical Biology and Complexity: Three Essays on the Natural Philosophy of Complex Systems, R. Rosen (ed.) (Academic Press, Orlando).
Sampson, J.R., 1976, Adaptive Information Processing (Springer, New York).
Sols, A., 1981, Multimodulation of enzyme activity. Curr. Topics Cell. Regul. 19, 77-101 (Academic Press, New York).
Srere, P.A., 1988, Complexes of sequential metabolic enzymes. Annu. Rev. Biochem. 56, 39-124.
Stephanopoulos, G. and Vallino, J.J., 1991, Network rigidity and metabolic engineering in metabolite overproduction. Science 252, 1675-1681.
Stock, J., 1987, Mechanisms of receptor function and the molecular biology of information processing in bacteria. BioEssays 6, 5, 199-203.
Sugita, M., 1961, Functional analysis of chemical systems in vivo using a logical circuit equivalent. J. Theor. Biol. 1, 415-430.
Sugita, M., 1963, Functional analysis of chemical systems in vivo using a logical circuit equivalent. II. The idea of a molecular automation. J. Theor. Biol. 4, 179-192.
Szent-Györgyi, A., 1968, Bioelectronics (Academic Press, New York).
Tomkins, G.M., 1975, The metabolic code. Biological symbolism and the origin of intercellular communication is discussed. Science 189, 760-763.
Tchuraev, R.N., 1991, A new method for the analysis of the dynamics of the molecular genetic control systems. I. Description of the method of generalized threshold models. J. Theor. Biol. 151, 71-87.
Volini, M. and Westley, J., 1966, The mechanism of the rhodanese-catalyzed thiosulfate-lipoate reaction. J. Biol. Chem. 241, 5168-5176.
Welch, G.R. and Clegg, J.S., 1987a, Organization of Cell Metabolism. G.R. Welch and J.S. Clegg (eds.) (Plenum Press, New York).
Welch, G.R., 1987b, The living cell as an ecosystem: hierarchical analogy and symmetry. TREE 2, 10, 305-309.
Westley, J., 1969, Enzymic Catalysis (Harper & Row, New York).
