Behavioural Processes 114 (2015) 63–71


Autoshaped choice in artificial neural networks: Implications for behavioral economics and neuroeconomics

José E. Burgos ∗ , Óscar García-Leal

CEIC, University of Guadalajara, Mexico

Article history: Available online 4 February 2015

Keywords: Autoshaped choice; Pavlovian contingencies; Behavioral economics; Neuroeconomics; Neural networks

Abstract

An existing neural network model of conditioning was used to simulate autoshaped choice. In this phenomenon, pigeons first receive an autoshaping procedure with two keylight stimuli X and Y separately paired with food in a forward-delay manner, intermittently for X and continuously for Y. Then pigeons receive unreinforced choice test trials with X and Y concurrently present. Most pigeons choose Y. This preference for a more valuable response alternative is a form of economic behavior that makes the phenomenon relevant to behavioral economics. The phenomenon also suggests a role for Pavlovian contingencies in economic behavior. The model used, in contrast to others, predicts autoshaping and automaintenance, so it is uniquely positioned to predict autoshaped choice. The model also contemplates neural substrates of economic behavior in neuroeconomics, such as dopaminergic and hippocampal systems. A feedforward neural network architecture was designed to simulate a neuroanatomical differentiation between two environment-behavior relations X-R1 and Y-R2, where R1 and R2 denote two different emitted responses (not unconditionally elicited by the reward). Networks with this architecture received a training protocol that simulated an autoshaped-choice procedure. Most networks simulated the phenomenon. Implications for behavioral economics and neuroeconomics, limitations, and the issue of model appraisal are discussed.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

This paper describes the first use of an existing neural network model of conditioning to simulate a phenomenon that is relevant to behavioral economics, the study of the environmental factors that affect economic behavior, defined as the allocation of actions across alternatives according to their values. The phenomenon can be called “autoshaped choice” and was first reported by Picker and Poling (1982). In their basic procedure, they used an autoshaping arrangement for baseline training, where pigeons received randomly intermixed trials of two keylight colors X and Y (e.g., red and yellow), separately paired with food (grain access) in a forward-delay manner, independently of responding. Fifty percent of X trials and 100% of Y trials were paired with food. Then, pigeons received choice tests where X and Y were presented concurrently and unreinforced. Pigeons tended to choose Y more frequently. Most features of this phenomenon fit well with behavior-analytic approaches to behavioral economics (e.g., Green and

∗ Corresponding author at: 180 Fco. de Quevedo, Col. Arcos de Vallarta, Guadalajara, Jal. 44130, Mexico. Tel.: +52 33 37771150x33316. E-mail address: [email protected] (J.E. Burgos). http://dx.doi.org/10.1016/j.beproc.2015.01.010 0376-6357/© 2015 Elsevier B.V. All rights reserved.

Rachlin, 1975; Hursh, 1980, 1984; Kagel et al., 1975, 1981; Kagel and Winkler, 1972; Lea, 1978). To begin with, the phenomenon involves key pecking, a prototypical emitted, skeletal response widely used in these approaches. This response is emitted in that it is not unconditionally elicited by the reward, at least as demonstrably as typical unconditional responses (e.g., nictitating-membrane responses unconditionally elicited by air puffs or electric shocks to a rabbit’s eye). Such approaches view economic behavior as emitted in this sense, rather than unconditionally elicited. Also, autoshaped choice involves the use of food as a reward. In behavior-analytic approaches to economic behavior, food is analogous to a good or commodity that satisfies some need when consumed. This good occurs more or less frequently, depending on certain conditions. This feature is also defining of autoshaped choice: Food occurs more frequently for one response alternative than another. And choice has been central to behavioral economics as well. In fact, economic behavior is conceived of as a kind of choice behavior. This feature has encouraged the use of the sort of mathematical models that are used in behavioral studies of choice (e.g., hyperbolic discounting). The use of a neural network model here echoes this use of mathematical models, although, as will become apparent later on, the present model differs from others. In any case, those models describe a phenomenon that has been widely studied in


behavioral economics: Preference for a response alternative that is more valuable in virtue of its correlation with a more frequent occurrence of a good. This preference is also observed in autoshaped choice and, as will be shown, is also predicted by the model used here. Autoshaped choice can thus reasonably be viewed as a form of economic behavior, or at least a simple one. Still, autoshaped choice has a feature that fits less well with behavior-analytic approaches to behavioral economics: The use of Pavlovian contingencies, where the reward is paired with discrete trials of exteroceptive stimuli independently of responding. Behavior-analytic approaches to economic behavior, in contrast, have emphasized operant contingencies, where reinforcement is explicitly scheduled to partly depend on responding and not signaled by discrete trials. Moreover, in such approaches, the acquisition and maintenance of economic behavior are supposed to require operant contingencies. However, the discovery of autoshaping (Brown and Jenkins, 1968) and automaintenance (Schwartz and Williams, 1972; Williams and Williams, 1969) showed the possibility that emitted responding can be acquired and maintained through Pavlovian contingencies alone, without operant contingencies. Despite this, such a possibility has not yet been considered for economic behavior. The present study is the first one to do so. McSweeney and Bierley (1984) discussed autoshaping in relation to consumer behavior. However, they did not elaborate exactly how autoshaping contributed to economic behavior. Nor did they discuss autoshaped choice. Still, autoshaped choice provides the strongest link thus far between autoshaping and economic behavior. This link implies that the acquisition and maintenance of a simple form of economic behavior do not require operant contingencies. Of course, this does not mean that operant contingencies do not influence economic behavior. They certainly do.
However, autoshaped choice implies that they are not necessary for the acquisition and maintenance of a simple form of economic behavior. There have been discussions of the influence of Pavlovian contingencies on economic behavior (e.g., Camerer, 2011; Daw and O’Doherty, 2014; Daw and Tobler, 2014; Seymour and Dolan, 2008). However, they still assume that operant contingencies are necessary for the acquisition and maintenance of economic behavior. They have not considered the possibility that economic behavior can be acquired through Pavlovian contingencies alone, without operant contingencies. Autoshaped choice shows precisely this possibility. The relevance to neuroeconomics is that the present model takes into account brain structures that have been propounded as neural substrates of economic behavior, especially dopaminergic and hippocampal systems (e.g., Camerer et al., 2005; Daw and O’Doherty, 2014; Daw and Tobler, 2014; Dayan and Seymour, 2009; Salamone et al., 2009; Seymour and Dolan, 2008). The model hypothesizes that such systems also underlie autoshaped choice. More specific implications for neuroeconomics will become apparent later. Suffice it to say for now that the model is the first to theoretically link autoshaping, behavioral economics, and neuroeconomics. The next section gives a general description of the model. Section 3 describes a simulation of autoshaped choice with the model. The paper ends with concluding remarks about some implications for behavioral economics and neuroeconomics, and a philosophical (methodological) issue about explanation in neural-network modeling.

2. The model: general description

The model has a computational part and a network part. The computational part consists of the activation and learning rules. The network part is a classification of the types of units that can

constitute a neural network and guidelines on how to connect them, according to basic general principles of gross vertebrate neuroanatomy. This section focuses on describing the network part in terms of the network architecture that was used for the simulation. The computational part is described in the appendix. The model was proposed as a unified connectionist approach to Pavlovian and operant conditioning (Donahoe et al., 1993). It is a relatively high-level, neural-systems model with a strong emphasis on basic general principles of gross vertebrate neuroanatomy. It is behaviorally guided by a unified reinforcement principle (Donahoe et al., 1982). Central to this principle is the notion of a stimulus discrepancy, the absence immediately followed by the presence of a Pavlovian conditioned or unconditioned stimulus. The principle thus hypothesizes that other stimuli that are temporally contiguous with such discrepancy tend to acquire control over responses that are temporally contiguous with the discrepancy. The model is also guided by evidence that hippocampal and dopaminergic systems play a role in operant and Pavlovian conditioning. In particular, such systems are hypothesized to contribute to the discrepancies that promote conditioning, operant as well as Pavlovian (see Appendix, Eq. (2)). As a connectionist approach, the model is also guided by the notion of parallel distributed processing. Applied to conditioning, this notion assumes that neural substrates of conditioning are distributed throughout circuits of interconnected neurons that function more or less homogeneously and simultaneously. Such circuits are theorized as artificial neural networks that consist of abstract neural units connected by abstract synapses. Units are activated at a moment in time according to an activation rule that returns a number between 0.0 and 1.0 (see Appendix, Eq. (1)). 
The strength of a connection is numerically represented as a weight (also a number between 0.0 and 1.0) that changes according to a learning rule (see Appendix, Eq. (2)). Pavlovian conditioning phenomena that have been simulated with the model include acquisition, extinction, and faster reacquisition or “savings” (Donahoe et al., 1993); ISI functions (Burgos, 1997); latent inhibition (Burgos, 2003); C/T ratio effects (Burgos, 2005); context-shift effects (Burgos and Murillo-Rodríguez, 2007); simultaneous conditioning (Burgos et al., 2008); second-order conditioning and resistance to extinction (Sánchez et al., 2010); blocking and overshadowing (Burns et al., 2011). Simulations of operant conditioning have been less extensive. So far, the model has simulated acquisition, extinction, faster reacquisition, generalization, and discrimination (Donahoe et al., 1993); timing in a fixed-interval schedule (Burgos and Donahoe, 2000; Donahoe and Burgos, 1999); and reinforcement revaluation (Donahoe and Burgos, 2000). It remains to be seen whether the model can also simulate other operant conditioning phenomena (e.g., shaping, behavioral contrast, negative reinforcement, positive punishment, fixed-ratio performance, matching law, matching to sample, etc.). Still, the model can also simulate autoshaping and automaintenance, positive as well as negative (Burgos, 2007). These phenomena have been central to discussions on the role of Pavlovian contingencies in the acquisition and maintenance of emitted responding (see Schwartz and Gamzu, 1977). The model thus allows for theorizing about this role (Burgos, 2010). No other model of conditioning can do this, whether Pavlovian (e.g., Gibbon and Balsam, 1981; Klopf, 1988; Rescorla and Wagner, 1972; Schmajuk et al., 1996; Stout and Miller, 2007; Sutton and Barto, 1981) or operant (e.g., Dragoi and Staddon, 1999; Killeen, 1994, 2011; Machado, 1997; Staddon and Zhang, 1991). The former do not contemplate emitted responding. 
The latter do but assume that its acquisition and maintenance require operant contingencies. The present model is the only one that can simulate acquisition and maintenance of emitted responding through Pavlovian contingencies.
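The appendix’s Eqs. (1) and (2) are not reproduced in this section, but the general scheme just described — activations bounded in [0.0, 1.0] and connection weights updated by a discrepancy-modulated Hebbian rule — can be sketched in Python. This is an illustrative sketch under assumed parameter values (theta, sigma, lr), not the model’s exact equations:

```python
import math

def activation(net_input, theta=0.3, sigma=0.05):
    """Logistic activation bounded in (0.0, 1.0). The offset theta and
    slope sigma are illustrative values, chosen here so that an argument
    of 0.0 yields a near-zero ("spontaneous") activation, as in the text;
    the model's actual Eq. (1) also has a stochastic reactivation threshold."""
    return 1.0 / (1.0 + math.exp(-(net_input - theta) / sigma))

def weight_update(w, pre, post, discrepancy, lr=0.5):
    """Hebbian weight change gated by a diffuse discrepancy signal
    (hippocampal- or dopaminergic-like), kept within [0.0, 1.0].
    Illustrative only; see Appendix, Eq. (2), for the model's rule."""
    w += lr * discrepancy * pre * post * (1.0 - w)
    return min(max(w, 0.0), 1.0)
```

Note that with a zero discrepancy the weight does not change, which is the sense in which conditioning in the model is driven by discrepancies rather than by pairing alone.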


Fig. 1. Network architecture used in the simulation. X, Y: two discrete exteroceptive cues (e.g., a red light and a yellow light). Sr: a biologically significant reward (e.g., food). Dashed arrows from X, Y, and Sr: sensory transduction processes (not simulated). S′: input layer, intended to simulate primary-sensory neuronal groups. Filled square labeled S′1: input unit whose activation is intended to simulate primary-sensory effects of X. Empty square labeled S′2: input unit whose activation is intended to simulate primary-sensory effects of Y. Hexagon labeled S*: input unit whose activation is intended to simulate primary-sensory effects of Sr (see Appendix, Eq. (1)). Thick line from S* to D: fixed, maximally strong connection. S″: polysensory or sensory-association layer. Circles: neurocomputational units whose activations are computed according to the activation rule (see Appendix, Eq. (1)). Thin lines with button endings: variable, initially weak connections whose weights change according to the learning rule (see Appendix, Eq. (2)). S″1, S″2: units intended to simulate polysensory neuronal groups. H1 and H2: units intended to simulate some hippocampal area (e.g., CA1), sources of dH,t. Shaded rectangles: diffuse discrepancy signals that modulate changes in connection weights (dH,t and dD,t; see Appendix, Eq. (2)). M″: motor-association layer. M″1, M″2: secondary-motor units. D: dopaminergic-like unit (source of dD,t; see Appendix, Eq. (2)). Curved arrow: amplification of dH,t by dD,t. M′: primary-motor (output) layer. Filled circle labeled M′1: output unit whose activation is intended to simulate a primary-motor precursor of a response R1. Empty circle labeled M′2: output unit whose activation is intended to simulate a primary-motor precursor of a response R2. R1 and R2 are simulated as emitted in that they are not activated by S*. Dashed arrows from outputs: primary-motor-to-effector transduction (not modeled).
(For interpretation of the references to color in the text legend, the reader is referred to the web version of this article.)

Neural networks in this model are designed according to basic general principles of gross vertebrate neuroanatomy. It is at this network level that the model makes the distinctions found in discussions of the operant-Pavlovian distinction. Such distinctions are between two types of stimuli (conditioned or discriminative vs. unconditioned or primary reinforcers), reinforcement procedures (response-independent vs. response-dependent), and responses (emitted vs. elicited). The model makes these distinctions in terms of formal interpretations of their neural substrates. For example, Fig. 1 shows the architecture of the networks that were used for the simulation (see Section 3). This architecture is just one among indefinitely many architectures that are possible in this model. All networks and parts of networks are intended only as minimal theoretical structures that are capable of simulating the phenomenon of interest. The network has a feedforward structure where units are organized into one input layer (S′, for “primary-sensory”), two hidden layers (S″, for “sensory-association,” and M″, for “motor-association”), and one output layer (M′, for “primary-motor”). S′ has three units (the filled square labeled S′1, the empty square labeled S′2, and the hexagon labeled S*). Each hidden layer has two units (S″1 and S″2, and M″1 and M″2, respectively). M′ also has two units (M′1 and M′2). These are the architecture’s output units. Activations propagate in one direction, from input to hidden to output. The S″ layer also includes two H (for “hippocampal”) units (H1, H2), and the M″ layer includes a D (for “dopaminergic”) unit. The reason to have two H units and one D unit was to ensure continuity with previous simulation research, which showed that this feature allowed for better simulations of certain phenomena (see Sánchez et al., 2010). The network’s connectivity is as follows. S′1 and S′2 are separately connected to S″1 and S″2, respectively. S″1 is connected to M″1 and H1, whereas S″2 is connected to M″2 and H2. M″1 is connected to M′1 and D, whereas M″2 is connected to M′2 and D. This basic S′–S″–M″–M′ connectivity is found in all vertebrate brains. All these connections are variable and initially weak, and their weights change according to the learning rule (see Appendix, Eq. (2)).

Changes in the weights of these connections are modulated by diffuse discrepancy signals (shaded rectangles arising from the H and D units). The signal that arises from H1 and H2 (dH,t) modulates changes in the weights of the S′–S″ and S″–H connections. The signal that arises from D (dD,t) modulates changes in the weights of the S″–M″, M″–D, and M″–M′ connections and amplifies dH,t (curved arrow). The connection from S* to D (thick line) is fixed and maximally strong. Other network architectures include output units (R*) that also receive this kind of connection from S*, to simulate unconditioned responding (see the first condition of Eq. (1) in the Appendix). But as a simplification, the present architecture did not have this type of output unit. Both outputs in this architecture (M′1 and M′2) could be activated only by S′1 and S′2, via the S″ and M″ units. The S′ (input) units are activated according to a training protocol that simulates a conditioning procedure of interest. S′1 activations are intended to simulate primary-sensory effects of a discrete exteroceptive cue X (e.g., a red keylight). S′2 activations are intended to simulate primary-sensory effects of another discrete exteroceptive cue Y (e.g., a yellow keylight). A cue’s duration is simulated by the number of timesteps at which a certain input unit is activated. The activation of S* is intended to simulate primary-sensory effects of a biologically significant reward Sr (e.g., food). The dashed arrows from X, Y, and Sr depict sensory transduction processes that are not simulated by the model. M′1 and M′2 (output) activations are intended to simulate primary-motor precursors of two responses R1 and R2 (e.g., pecking a red light vs. pecking a yellow light). Both responses are assumed to be emitted in that they are not unconditionally elicited by the reward (i.e., S* cannot activate M′1 or M′2). The model does not simulate responses per se, in two senses. First, responses require effectors, which the model does not simulate; the model only simulates primary-motor precursors of responses. Second, responses are discrete (and usually binary), whereas activations in this model are continuous. To simulate the discrete character of responses, a response rule is required that transforms output activations into binary events. In previous simulation research, a rule has been used that defines a response as an output activation of 0.5 or more.
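As a minimal sketch of such a response rule (the function name is ours, the 0.5 criterion is from the text):

```python
def emitted_response(output_activation, criterion=0.5):
    """Response rule used in previous simulation research with this
    model: a discrete (binary) response is counted whenever the
    continuous output activation reaches the 0.5 criterion."""
    return output_activation >= criterion
```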


For simplicity, this network does not simulate another feature of emitted responding, namely, directedness (cf. Burgos, 2007). Another feature of this network is that its output units can be highly activated simultaneously. The implication is that R1 and R2 are not mutually exclusive and hence can occur simultaneously. That is to say, R1 and R2 are assumed not to compete with one another (i.e., one does not prevent the other). Nor did X and Y compete for control of R1 or R2, as S″1 and S″2 each had its own pathway to its respective output unit (M′1 and M′2, respectively), and both pathways could be activated simultaneously (this exploits the parallel character of neural network processing). This feature differs from the procedures typically used in behavioral research on choice, autoshaped choice included, where animals can physically respond to only one option at a time. The network’s connectivity was designed so that M′1 could be activated only by S′1 via S″1 and M″1, and M′2 could be activated only by S′2 via S″2 and M″2. Behaviorally, the working hypothesis is that, after a training protocol that simulates Picker and Poling’s (1982) basic procedure, X will control R1 and Y will control R2. If half of X trials and all Y trials are reinforced, then it is expected that X will control R1 more weakly than Y controls R2. Thus, in choice trials where X and Y are concurrently present, M′1 would be expected to be activated more weakly than M′2, which would simulate a preference for Y. Such connectivity is not arbitrary. Not only is it consistent with the highly distributed and parallel character of processing in connectionist systems; it is also grounded in neuroanatomical knowledge (for a review, see Mesulam, 1998). It is known that different exteroceptive stimuli affect different parts of primary-sensory cortex. Different wavelengths (e.g., red and yellow), for example, affect different parts of primary-visual cortex (Crick and Koch, 1995). This difference is simulated in the network by the differentiation between S′1 activations and S′2 activations, which, again, are intended to simulate primary-sensory effects of different exteroceptive cues. It is also known that different parts of primary-visual cortex, in turn, project to different parts of unimodal visual-association cortex (simulated by S″1 and S″2). These, in turn, project to different parts of motor-association cortex (simulated by M″1 and M″2). And these project to different parts of primary-motor cortex (simulated by M′1 and M′2). Also, different parts of primary-motor cortex are precursors of different responses. Of course, natural brains do not show such a neat anatomical differentiation, so the architecture is not propounded here to emulate any particular natural brain. Rather, it is used only as a theoretical simplification.
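The connectivity of the Fig. 1 architecture can be sketched as a simple adjacency map. The unit names and the dictionary representation are our own; the initial weights follow Section 3 (0.2 for the S′–S″ and S″–H connections, 0.01 for the rest), and the S*–D connection is fixed and maximally strong:

```python
# Connectivity of the Fig. 1 architecture as (presynaptic, postsynaptic)
# pairs mapped to initial connection weights. Labels are ASCII stand-ins
# for the primed unit names (S'1 for the first input unit, etc.).
INITIAL_WEIGHTS = {
    ("S'1", "S''1"): 0.2,   ("S'2", "S''2"): 0.2,    # input -> sensory-association
    ("S''1", "H1"): 0.2,    ("S''2", "H2"): 0.2,     # sensory-association -> hippocampal
    ("S''1", "M''1"): 0.01, ("S''2", "M''2"): 0.01,  # sensory- -> motor-association
    ("M''1", "M'1"): 0.01,  ("M''2", "M'2"): 0.01,   # motor-association -> output
    ("M''1", "D"): 0.01,    ("M''2", "D"): 0.01,     # motor-association -> dopaminergic
    ("S*", "D"): 1.0,                                # fixed, maximally strong
}

def pathway(cue):
    """Feedforward pathway for cue 'X' (index 1) or 'Y' (index 2)."""
    i = "1" if cue == "X" else "2"
    return [f"S'{i}", f"S''{i}", f"M''{i}", f"M'{i}"]
```

The two `pathway` lists make explicit the architectural differentiation that the simulation exploits: the X and Y pathways share no connections, so they can gain weight independently.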

3. A simulation

A simulator designed and coded by the first author was used for the simulation. Nine naive networks with the architecture shown in Fig. 1 received a training protocol that simulated a simplified version of the procedure that Picker and Poling (1982) used in their Experiment 1. All networks were naive in that their initial connection weights were low (0.2 for the S′–S″ and S″–H connections, 0.01 for the rest; see Fig. 1). All networks first received 200 X trials randomly interspersed with 200 Y trials. The primary-sensory effect of an X trial was defined as the maximal activation (1.0) of S′1, for five timesteps. The primary-sensory effect of a Y trial was defined as the maximal activation of S′2, also for five timesteps. The hypothesis here is that different exteroceptive cues activate different neurons in primary-sensory cortex (e.g., V1, in the case of visual cues). The primary-sensory effect of Sr (the reward) was defined as the maximal activation of S* at the last timestep of either an X or a Y trial, to simulate a forward-delay Pavlovian procedure. A random 50% of X trials were paired with Sr and the other 50% occurred without the reward, whereas 100% of Y trials were paired with Sr. Then, all networks received 20 choice trials where S′1 and S′2 were activated simultaneously to simulate the concurrent occurrence of X and Y (i.e., a compound cue). For simplicity, the learning rule was disabled during this choice test phase, to evaluate the effects of the training without weight change. The intertrial interval was assumed to be long enough for all activations to decrease to near-zero values (the logistic function with an argument of 0.0, which simulates a spontaneous activation; see Appendix, Eq. (1)).

Fig. 2. Mean M′1 and M′2 (output) activations across 20 choice test trials for all networks (a panel per network), after a training that consisted of 200 X trials randomly interspersed with 200 Y trials, where a random 50% of X trials and 100% of Y trials were paired with Sr (a simulated biologically significant reward). Sr was response-independent in that it did not depend on the output activations. Output activations were intended to simulate primary-motor precursors of responding that is emitted in that it is not unconditionally elicited by the reward, which was simulated by output units (M′1 and M′2) that could be activated only by M″1 and M″2, not S*.

The results are shown in Figs. 2 and 3. Fig. 2 shows the mean M′1 (filled bars) and M′2 (empty bars) activations across the 20 choice test trials for each individual network (a panel per network). As this figure shows, M′1 activations were near zero and M′2 activations were well above zero (and above the 0.5 output-activation response criterion) in all networks, except for N61, which simulated indifference between X and Y. These results simulate a choice of, or preference for, Y, which had been paired with the reward on 100% of its trials. The choice resulted from a form of autoshaping and (positive) automaintenance separately for X and Y, where the reward was simulated as response-independent and responding was simulated as emitted in that it was not unconditionally elicited by the reward. The model can thus simulate autoshaped choice as defined in Section 1. The model does this by exploiting the distributed and parallel features of the neural network architecture shown in Fig. 1. This architecture was designed to simulate a neuroanatomical differentiation between two emitted responses controlled by two discrete cues that were separately, independently, and differentially reinforced. This architecture and training afforded differential weight changes in different pathways.
Because Y was reinforced more frequently than X, connections in the pathway that was activated only by Y (S′2–S″2–M″2–M′2) gained more weight than those in the X pathway (S′1–S″1–M″1–M′1). Parallel processing was exploited in the choice tests, where both pathways were activated concurrently by presenting X and Y as a compound.
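The baseline and test protocol just described can be sketched as a trial generator. The function names and representation are ours, but the parameters (200 + 200 randomly interspersed trials, five timesteps per cue, a random 50% of X trials and 100% of Y trials ending with Sr at the last timestep) follow the text:

```python
import random

def build_protocol(n_x=200, n_y=200, seed=0):
    """Trial list for the simulated autoshaping baseline: n_x X trials
    randomly interspersed with n_y Y trials; a random 50% of X trials
    and 100% of Y trials end with Sr (response-independent reward)."""
    rng = random.Random(seed)
    x_trials = [("X", rewarded) for rewarded in
                rng.sample([True] * (n_x // 2) + [False] * (n_x // 2), n_x)]
    y_trials = [("Y", True)] * n_y
    trials = x_trials + y_trials
    rng.shuffle(trials)
    return trials

def trial_inputs(cue, rewarded, duration=5):
    """Per-timestep input activations (S'1, S'2, S*) for one trial: the
    cue's input unit is maximally active for `duration` timesteps, and
    S* is activated at the last timestep if the trial is rewarded, to
    simulate a forward-delay Pavlovian procedure."""
    x_on, y_on = float(cue == "X"), float(cue == "Y")
    return [(x_on, y_on, 1.0 if rewarded and t == duration - 1 else 0.0)
            for t in range(duration)]
```

A choice test trial would simply activate both cue inputs at once, `(1.0, 1.0, 0.0)`, with learning disabled, as in the 20 test trials of the simulation.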


Fig. 3. Mean S″1–M″1 (filled bars) and S″2–M″2 (empty bars) connection weights (see Fig. 1) across the 20 choice test trials, after a training that consisted of 200 X trials randomly interspersed with 200 Y trials, where a random 50% of X trials and 100% of Y trials were paired with Sr (a simulated biologically significant reward), independently of output activations, to simulate response-independent reinforcement.

Fig. 3 shows an example of how differential weight changes occurred for two connections, S″1–M″1 (sensory-motor interface of the X pathway) and S″2–M″2 (sensory-motor interface of the Y pathway). As the figure shows, S″2–M″2, in the pathway that mediated the Y-R2 relation, had substantially more weight than S″1–M″1, in the pathway that mediated the X-R1 relation. As a result, Y controlled R2 more strongly than X controlled R1 (i.e., S′2 activated M′2 via S″2 and M″2 more strongly than S′1 activated M′1 via S″1 and M″1) in most of the networks. The exception was N61, the network that simulated indifference between X and Y. This indifference was simulated as high activations of both output units. This result was due to the Gaussian reactivation threshold in the activation rule, a stochastic factor of the model (see Eq. (1) in the Appendix). The parameters of this threshold allowed for a small chance (1/9) that a network showed indifference. This result reproduces Picker and Poling’s (1982, Experiment 1) finding that one out of nine pigeons (P23) also showed indifference.

4. General discussion

The results show that the model can simulate autoshaped choice, a simple form of economic behavior, by exploiting the distributed and parallel character of connectionist systems. The distributed character was exploited with an architectural differentiation between two network pathways (S′1–S″1–M″1–M′1 vs. S′2–S″2–M″2–M′2 in Fig. 1) that are separately affected by two environment-behavior relations (X-R1 vs. Y-R2, respectively). Such differentiation allowed the baseline training to promote differential weight changes in different pathways, precisely, faster weight gain in the pathway that mediated the Y-R2 relation, due to the more frequent reinforcement of Y. The parallel character of connectionist processing was exploited in the choice tests, with the concurrent activations of the network pathways that mediated the X-R1 and Y-R2 relations. As a result, such activations showed a stronger control of R2 by Y. No other model of conditioning can account for this phenomenon. Previous models of Pavlovian conditioning (e.g., Gibbon and Balsam, 1981; Klopf, 1988; Rescorla and Wagner, 1972; Schmajuk et al., 1996; Stout and Miller, 2007; Sutton and Barto, 1981) do not contemplate emitted responding. And previous models of operant conditioning (e.g., Dragoi and Staddon, 1999; Killeen, 1994, 2011; Machado, 1997; Staddon and Zhang, 1991) assume that operant contingencies are necessary for the acquisition and maintenance of emitted responding. Consequently, there are no alternative hypotheses that are ruled out by the results. Only the present model can explain the phenomenon.

4.1. Implications for behavioral economics

The implications of the results for behavioral economics, beyond the phenomenon itself, arise from two novel predictions. One is that both output units could be highly activated simultaneously, as shown by N61, the network that simulated indifference. This simplification implies that R1 and R2 are not mutually exclusive (i.e., they do not compete with one another) and hence could occur concurrently. Despite this, all the other networks simulated a preference for Y over X. In contrast, the alternative responses in Picker and Poling’s (1982) study were mutually exclusive. The implication of the present results is that competitive or mutually exclusive responses are not necessary for autoshaped choice. It remains to be seen whether this prediction is experimentally confirmed, for autoshaped choice and for more complex forms of economic behavior. Strictly, however, the model only predicts that activation competition between primary-motor precursors of responses is not necessary for autoshaped choice. This prediction does not preclude the possibility that response competition depends on other motor areas that the model does not simulate (e.g., subcortical motor nuclei such as the basal ganglia and red nucleus, or perhaps even the spinal cord). More complex forms of economic behavior are explicitly trained with concurrent operant contingencies. It remains to be seen whether autoshaped choice significantly contributes to such training. If it does, a more general implication of the present results is that response competition is not necessary for concurrent operant performance (cf. Catania, 1969; Dragoi and Staddon, 1999; Navakatikyan, 2007). The model, however, can also simulate primary-motor activation competition with lateral inhibitory units in the output layer, but this is a complication that is better left for future research. It is unclear at this point whether and how the model can simulate high concurrent output activations and mutually exclusive choices.
It is thus unclear whether and how the model can make contact with detection-theory models of choice (e.g., Davison and Jenkins, 1985). Another prediction that is relevant to behavioral economics has to do with the use of a network architecture without output units to simulate responding that is unconditionally elicited by the reward. Although this was intended as a strategic simplification, it allowed for a novel prediction: Pavlovian conditioning is not necessary for autoshaped choice and economic behavior, insofar as the former is relevant to the latter. In this prediction, Pavlovian conditioning is defined traditionally, in terms of an unconditioned reflex (Gormezano and Kehoe, 1975). The conceptual clarification here is that Pavlovian conditioning is not the same as Pavlovian contingencies. The two are closely related but need not go hand in hand, at least theoretically. Pavlovian conditioning, traditionally defined, requires an unconditioned reflex. Pavlovian contingencies are a way to promote Pavlovian conditioning whereas Pavlovian conditioning is an effect or result
of this procedure, but Pavlovian contingencies do not logically require an unconditioned reflex. On this rationale, the absence of an unconditioned response implies the absence of an unconditioned reflex, as in the networks used for the simulation, and this in turn implies no Pavlovian conditioning. Thus, on this model, there can be Pavlovian contingencies without Pavlovian conditioning (although not vice versa). Of course, this possibility has not yet been observed in animals. Thus far, all rewards are supposed to unconditionally elicit some response or other, whether observed or not. Still, laboratory manipulations are possible to test the prediction empirically. The specific prediction is that primary-motor precursors of an unconditioned response are not necessary for autoshaped choice. Such manipulations would involve interfering with these precursors (e.g., surgically removing them or chemically blocking them specifically). The prediction is that this interference will not significantly prevent autoshaped choice.

4.2. Implications for neuroeconomics

Both predictions are also relevant to neuroeconomics in that they refer to neural substrates of economic behavior. Again, some of the model's neuroanatomical categories refer to dopaminergic and hippocampal systems, which have been proposed as neural substrates of economic behavior. In agreement with evidence from neuroeconomics (see Camerer et al., 2005; Daw and O'Doherty, 2014; Daw and Tobler, 2014; Dayan and Seymour, 2009; Seymour and Dolan, 2008), the model hypothesizes that dopaminergic and hippocampal systems play a role in a simple form of economic behavior. This role, or at least part of it, is hypothesized in the model to consist of diffuse discrepancy signals that arise from certain hippocampal areas (e.g., CA1) and dopaminergic nuclei (e.g., VTA) and modulate changes in the efficacies of sensory- and motor-association synapses, respectively.
In the model, these signals are simulated as d_{H,t} and d_{D,t}, respectively, which modulate weight changes in the learning rule (see Fig. 1 and Appendix, Eq. (2)). According to the model, however, this role is not restricted to autoshaped choice, but is common to all conditioning, Pavlovian as well as operant, at least in vertebrates. Also related to neuroeconomics, the model hypothesizes that a significant neuroanatomical differentiation between the sensory and motor pathways that are affected by X and Y might be necessary for autoshaped choice and perhaps more complex forms of economic behavior. This implication obtains even if there is competition between primary-motor precursors of responses. Perhaps such competition intensifies the effects of that differentiation.

4.3. Other simplifications

Three other simplifications were the neat neuroanatomical differentiation between the X and Y pathways, the absence of a simulated context, and the disabling of the learning rule during the choice tests. Regarding the first, obviously no natural brain shows such a neat architectural differentiation between different stimulus pathways. The architecture is thus not propounded as corresponding directly to any natural brain, but only as a counterfactual limit condition for the purpose of theoretical abstraction. The model allows for somewhat more realistic network architectures that are not as neatly differentiated as the one used here, but this is yet another complication that will have to be left for future investigation. A conceptual aspect of this simplification relates to the emitted-elicited distinction. This distinction was made here too sharply and simply. It focused on just one feature of emitted responding, namely, being evoked by stimuli other than, without being unconditionally elicited by, the reward. However, this was

not intended as the only defining feature of emitted responding. Emitted responding has other features that were not simulated here, again for the sake of simplification, such as directedness (i.e., making physical contact with some part of the environment, such as an operandum), involving skeletal (striated) muscles, and so on. The model, again, can simulate directedness (see Burgos, 2007), but it remains to be seen what the model predicts about its role in autoshaped choice. As a second simplification, there was no attempt to simulate the context, despite abundant evidence that shows conditioning, operant and Pavlovian, to be strongly context-dependent. There is no reason to believe that autoshaped choice is an exception. The model allows for the simulation of a context (e.g., Burgos and Murillo-Rodríguez, 2007) and an explicit intertrial interval. However, future simulation research will determine what the model predicts about the effects of both on autoshaped choice. The third simplification, disabling the learning rule during the test, is common in neural network modeling. Still, it is worth saying what happens if the learning rule is enabled during the test: extinction occurs, lowering the output activations. The net effect is a reduction in the mean M′′2 activation. This mean, however, is still higher than the mean M′′1 activation. The basic result is thus preserved.

4.4. Philosophical issue: model appraisal

The present study, then, involved a great deal of simplification. Simplification, of course, is a hallmark of modeling, and neural network modeling is no exception. There is no objective criterion for deciding how much simplification is too much. However, there may well be such a thing as too much realism, perhaps even an objective criterion for it. A major issue in connectionist modeling revolves around the so-called "Bonini's paradox" (e.g., Dawson, 2004, pp.
17–18, 20, 121), after Stanford business professor Charles Bonini (1963), who described the difficulty of simulating complex systems such as firms. Brains are far more complex than firms. The paradox thus arises even more intensely in neural network simulations of brain functioning in conditioning. In essence, the paradox is that model realism and usability for explanation and prediction are inversely related: the more realistic a model is, the less usable it is for explanation and prediction. Hence, models should not be too realistic; otherwise, they become unusable for these purposes. Despite the extreme simplicity of artificial neural networks, it still is very difficult to understand how they simulate what they simulate. In the present case, a complete, detailed explanation of the results would have to include all the unit activations and connection weights across all trials and timesteps of the training phase. Clearly, such an explanation would be too complicated to be useful. Hence, it is not sensible to give such complete, detailed explanations of neural network simulations. Instead, incomplete, informal explanations in terms of a few key aspects of a network's structure and functioning, like the one given here, are more viable and useful. To be sure, the individual units and connections are part of such structure and functioning. However, neural-network explanations are not (and should not be) restricted to the functioning of a single unit or connection, but include the whole network. A complete description of how a single unit functions, as given by the activation rule (see Appendix, Eq. (1)), or how a single connection works, as given by the learning rule (see Appendix, Eq. (2)), is necessary but not sufficient to understand how a network functions. Neural network explanations also include how units are connected to and affect one another.
All of this raises the philosophical issue of model appraisal, the criteria for evaluating and selecting models. One criterion is empirical accuracy, which is often identified with a model’s truth
and realism. Models are supposed to be true in the sense of being realistic. But then again, there is Bonini's paradox, which forces us to simplify and, to this extent, make our models false. And empirical accuracy is not the only criterion for model evaluation. Other criteria are conceptual adequacy, scope (generality, unifying power), parsimony (simplicity, elegance), explanatory power, and heuristic value (predictive power, fruitfulness, fertility). Not only are all these criteria defined differently; they also are weighed differently and interact in complex, often competing ways. Also, most apply only to future applications and, hence, prevent a fair appraisal at present. Model appraisal, then, is a very complex task. How is it to be achieved? One answer can be found in the distinction that Laudan (1977) made in his methodology of research traditions between two modalities of model (or theory) appraisal, namely, acceptance and pursuit. Acceptance is commitment to a model as if it were true. Laudan (1977) begins by dispensing with truth as an unresolvable issue and replacing it with problem-solving as the actual aim of science (not practical, engineering problems, but scientific problems, which he divides into empirical and conceptual). Acceptance, then, is determined by a model's overall (historical) problem-solving effectiveness. Previous methodologies have focused exclusively on acceptance as the only modality of model appraisal. In such methodologies, truth is the primary if not the only criterion for model appraisal. The same obtains for information-theoretic approaches to model appraisal and selection (e.g., Burnham and Anderson, 2002). The methodology of research traditions, in contrast, proposes another modality, in addition to acceptance, and also in terms of problem-solving, not truth. This other modality is pursuit, which is to work with a model tentatively, treating it as promising, per its problem-solving rate, without accepting it.
Pursuit opposes the traditional view that models are appraised only to be accepted as "true" or rejected as "false." On this methodology, then, it is entirely rational to simultaneously pursue some models and accept others. And models can be pursued for as long as necessary. There is no objective criterion for deciding how much time pursuing a model is too much. The presence of multiple models in a scientific tradition, then, need not cause any discomfort. Different models can peacefully coexist, and acceptance of some need not preclude pursuit of others. Conditioning and its neural substrates are very complex, so there still is much to be discovered about them. Hence, adopting a strict acceptance modality to appraise models of conditioning seems premature, especially if neural substrates are taken into account, as they are in neural network modeling. So far, all models, the present one included, have been falsified in one way or another. And chances are that new models will be falsified by new evidence. Perhaps, then, a more viable strategy is to emphasize pursuit over acceptance.

Acknowledgments

We thank Lewis Bizo for his invitation to give the talk that resulted in this paper at the 37th Annual Meeting of the Society for Quantitative Analyses of Behavior, and to submit the paper to this issue. We also thank two anonymous reviewers for useful comments on a previous draft. The executable file of the simulator, the files necessary to run the simulations, the source code, and the simulation data are available upon request to the first author.

Appendix.

Mathematically, the model consists of an activation rule (or function) and a learning rule. The activation rule is used to compute the level of activation of every neurocomputational unit (circles in Fig. 1) at every moment of every trial. The learning rule is used
to change the weights of variable connections (thin lines with button endings in Fig. 1). All activations and weights are numbers between 0.0 and 1.0. For both rules, time is conceived as consisting of discrete moments, occasions, epochs, or timesteps of indefinite but relatively short duration (say, 500–1000 ms, give or take, if a value is needed, but no particular duration is used for computational purposes). Each rule is described in turn. Their neuroscientific rationale has been described in detail in earlier papers, so it is only briefly described here.

Activation rule

This rule has two modes: unconditional (automatic, "innate") and conditional ("acquired"). Unconditional activation does not require any learning (weight change) and obtains when the activation of a certain input unit (S* in Fig. 1) at t is larger than zero, and the unit whose activation is being computed (j) is a D (dopaminergic) unit or an R* (Pavlovian output) unit; the architecture in Fig. 1 did not have this type of output unit. Otherwise, conditional activation obtains. Conditional activation requires learning (weight change, usually gain) and has two modes: reactivation and decay. All these modes of activation relate mathematically according to Eq. (1) below, a simplified version adapted for the present study, without inhibition or output units that simulate responding unconditionally elicited by the reward.

a_{j,t} =
    a_{S*,t},                                               if a_{S*,t} > 0 and j is D or R* (unconditional activation); otherwise:
    L(exc_{j,t}) + τ_j L(exc_{j,t−1})[1 − L(exc_{j,t})],    if L(exc_{j,t}) ≥ θ_{j,t}  (reactivation)
    a_{j,t−1} − δ_j a_{j,t−1}(1 − a_{j,t−1}),               if L(exc_{j,t}) < θ_{j,t}  (decay)          (1)

where t is a moment in time, and

L(x) = 1 / (1 + e^{−(x − μ)/σ})

is the logistic function with constant mean μ = 0.5 and variable standard deviation σ (a free parameter; typically σ = 0.1, which allows for a spontaneous activation level of approximately 0.006 with initial connection weights of 0.01), and argument

x = exc_{j,t} = Σ_{i=1}^{n} a_{i,t} w_{i,j,t},

where n denotes the total number of units connected to j. Fig. 4 shows a generic neurocomputational unit (an amplification of any of the circles of the network in Fig. 1) for conditional activation, intended to simulate a relatively small neuronal group (without inhibition). A unit receives a finite number of afferent activations (a_{1,t}, ..., a_{i,t}, ..., a_{n,t}) from one or more (up to n) excitatory pre-connection units. Each pre-connection unit is connected to a post-connection process (j) through a connection with a variable weight (w_{1,j,t}, ..., w_{i,j,t}, ..., w_{n,j,t}), where j computes the product of the activation and connection vectors a_{j,t} and w_{j,t}, respectively. This product is the net amount of excitation on j at t (exc_{j,t}) and is passed as an argument to the logistic function (L). Whether the rule is in reactivation or decay mode at t depends on a Gaussian threshold (θ_{j,t}), a random number generated according to a Gaussian distribution with a mean of 0.2 and a standard deviation of 0.15. θ_{j,t} is dynamic, in that it is generated anew at every moment for every computational unit. Two other free parameters are temporal summation (τ_j) and decay (δ_j), where τ_j = 0.1 and δ_j = 0.1.

Fig. 4. Generic neurocomputational unit (a circle in Fig. 1) for learned activation (without inhibition). i: afferent unit. j: generic target unit (representing only an S, M, H, D, or output unit). a_{i,t}: excitatory afferent activation (from either input or excitatory computational units). w_{i,j,t}: weight of a connection with an excitatory afferent unit. a_{j,t} × w_{j,t}: product between a vector of excitatory afferent activations and a vector of corresponding weights. exc_{j,t}: net amount of excitation on j at t. L: logistic function with exc_{j,t} as an argument. θ_{j,t}: Gaussian threshold. a_{j,t}: efferent conditional activation of j at t (sent to another unit if j represents an S or M unit).

Learning rule

This rule is used to change the weight of every connection from unit i (the afferent, source, or "presynaptic" unit) to unit j (the target or "postsynaptic" unit) at every moment t (w_{i,j,t}). A connection is intended to simulate a relatively small synaptic group, and a weight the efficacy of a synaptic group in allowing the activation of a postsynaptic by a presynaptic neuronal group. A weight can be interpreted as the proportion of transmitter receptors on j that are controlled by i. The rule to change weights is as follows:



Δw_{i,j,t} =
    α_j a_{j,t} d_t p_{i,t} r_{j,t},    if d_t ≥ 0.05
    −β_j w_{i,j,t−1} a_{i,t} a_{j,t},   otherwise          (2)

where α (the rate of weight increment) and β (the rate of weight decrement) denote the two free parameters of the rule (α = 0.5 and β = 0.1). The other terms of the rule are:

a_{i,t}: activation of the source unit (i);
a_{j,t}: activation of the target unit (j);
d_t = d_{H,t} = |a_{H,t} − a_{H,t−1}| + d_{D,t}(1 − d_{H,t−1}), if j is an S′′ or H unit;
d_t = d_{D,t} = a_{D,t} − a_{D,t−1}, if j is an M′′ or D unit (see Fig. 1);
p_{i,t} = a_{i,t} w_{i,j,t−1} / N, where N = exc_{j,t};
r_{j,t} = 1 − Σ_{i=1}^{n} w_{i,j,t}.

The key factor is d_t, a signal that modulates changes of all weights, inspired by the roles of hippocampal (e.g., CA1) and dopaminergic (e.g., VTA) areas in conditioning. In this sense, d_t is a diffuse signal. It also is a discrepancy signal, in that it is defined as a temporal difference between the activations of certain units (H, D; see Fig. 1) in successive pairs of moments. In early simulations, the d_t threshold was 0, but it was increased to 0.001 to simulate latent inhibition (see Burgos, 2003). After this, it was further increased to 0.05 to simulate other phenomena. The p_{i,t} and r_{j,t} factors introduce a "rich get richer, poor get poorer" sort of competition among connections for a limited amount of weight on a common target unit. In the network that was used for the simulation (see Fig. 1), this competition took place only between M′′1 and M′′2 for the amount of weight available on D. Through the p_{i,t} factor, the rule, like some other models, includes a Hebbian component in which connection weights partly depend on the activations of the connected units. In general, connections tend to gain weight (to a greater or lesser degree, depending on how much weight they have gained) whenever S* (see Fig. 1) is activated, and to lose weight whenever S* is not activated. Successive timesteps with a zero S* activation thus promote weight loss. The same learning rule is used to modify connection weights across all times, connections, networks, units (whether they simulate emitted or elicited responding), and training protocols (whether they simulate operant or Pavlovian contingencies). It is in this sense that the model makes no learning (weight-change mechanism) distinction between operant and Pavlovian conditioning.
This hypothesis does not imply that the two types of conditioning do not differ anatomically or behaviorally (the model makes anatomical and behavioral distinctions) or molecularly (e.g., Brembs, 2011; the model says nothing about the molecular level), or that they are not similar in other respects.
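To make the two rules concrete, the following Python sketch illustrates Eqs. (1) and (2). It is our own minimal illustration, not the original simulator: the function names and scalar signatures are assumptions, and the Gaussian threshold is drawn anew on each call, as in the model.

```python
import math
import random

def logistic(x, mu=0.5, sigma=0.1):
    # L(x): logistic function with constant mean mu = 0.5 and
    # standard deviation sigma (a free parameter; typically 0.1).
    return 1.0 / (1.0 + math.exp(-(x - mu) / sigma))

def activation(a_prev, exc_now, exc_prev, tau=0.1, delta=0.1):
    # Conditional activation, Eq. (1): reactivation vs. decay,
    # selected by a dynamic Gaussian threshold (mean 0.2, SD 0.15).
    theta = random.gauss(0.2, 0.15)
    L_now = logistic(exc_now)
    if L_now >= theta:  # reactivation, with temporal summation tau
        return L_now + tau * logistic(exc_prev) * (1.0 - L_now)
    return a_prev - delta * a_prev * (1.0 - a_prev)  # decay

def weight_change(w_prev, a_i, a_j, d, p, r, alpha=0.5, beta=0.1):
    # Learning rule, Eq. (2): increment when the diffuse discrepancy
    # signal d reaches the 0.05 threshold; decrement otherwise.
    if d >= 0.05:
        return alpha * a_j * d * p * r
    return -beta * w_prev * a_i * a_j
```

Here p and r stand for the competition factors of Eq. (2), computed elsewhere as p_{i,t} = a_{i,t} w_{i,j,t−1} / exc_{j,t} and r_{j,t} = 1 − Σ_i w_{i,j,t}, so that connections compete for a limited amount of weight on a common target unit.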

All activations and weights are updated at each timestep according to an asynchronous random procedure. In this procedure, a randomly ordered list of all units (or connections) is generated at t, and new activations (or weights) are computed in that order (according to Eq. (1) for activations, or Eq. (2) for weights). The activations (or weights) from t − 1 are immediately replaced by the new values at t. Hence, by chance, the activation of a unit at t can depend on activations of other units that have already been updated at t, rather than only on activations from t − 1.
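This updating scheme can be sketched as follows (again our own illustration with assumed names; `compute_new` stands for whichever rule, Eq. (1) or Eq. (2), applies to the item being updated):

```python
import random

def async_update(values, compute_new):
    # Asynchronous random updating: visit all units (or connections) in a
    # fresh random order at each timestep, replacing each value immediately,
    # so later updates at t may already see values computed at t.
    order = list(values.keys())
    random.shuffle(order)  # a new random order is generated at every t
    for name in order:
        values[name] = compute_new(name, values)  # immediate replacement
    return values
```

For example, `async_update({"M1": 0.2, "M2": 0.4}, lambda n, v: min(1.0, v[n] + 0.1))` increments each stored activation by 0.1 in a random order, clipping at 1.0.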

References

Bonini, C.P., 1963. Simulation of Information and Decision Systems in the Firm. Prentice-Hall, Englewood Cliffs, NJ.
Brembs, B., 2011. Spontaneous decisions and operant conditioning in fruit flies. Behav. Process. 87, 157–164. http://dx.doi.org/10.1016/j.beproc.2011.02.005.
Brown, P.L., Jenkins, H.M., 1968. Auto-shaping of the pigeon's key-peck. J. Exp. Anal. Behav. 11, 1–8. http://dx.doi.org/10.1901/jeab.1968.11-1.
Burgos, J.E., 1997. Evolving artificial neural networks in Pavlovian environments. In: Donahoe, J.W., Packard Dorsel, V. (Eds.), Neural Network Models of Cognition: Biobehavioral Foundations. North-Holland, Amsterdam, pp. 58–79.
Burgos, J.E., 2003. Theoretical note: simulating latent inhibition with selection neural networks. Behav. Process. 62, 183–192. http://dx.doi.org/10.1016/S0376-6357(03)25-1.
Burgos, J.E., 2005. Theoretical note: the C/T ratio in artificial neural networks. Behav. Process. 69, 249–256. http://dx.doi.org/10.1016/j.beproc.2005.02.008.
Burgos, J.E., 2007. Autoshaping and automaintenance: a neural-network approach. J. Exp. Anal. Behav. 88, 115–130. http://dx.doi.org/10.1901/jeab.2007.75-04.
Burgos, J.E., 2010. The operant/respondent distinction: a computational neural-network analysis. In: Schmajuk, N. (Ed.), Computational Models of Conditioning. Cambridge University Press, Cambridge (UK), pp. 244–271.
Burgos, J.E., Donahoe, J.W., 2000. Structure and function in selectionism: implications for complex behavior. In: Leslie, J., Blackman, D. (Eds.), Issues in Experimental and Applied Analyses of Human Behavior. Context Press, Reno, pp. 39–57.
Burgos, J.E., Flores, C., García, O., Díaz, C., Cruz, Y., 2008. A simultaneous procedure facilitates acquisition under an optimal interstimulus interval in artificial neural networks and rats. Behav. Process. 78, 302–309. http://dx.doi.org/10.1016/j.beproc.2008.02.018.
Burgos, J.E., Murillo-Rodríguez, E., 2007. Neural-network simulations of two context-dependence phenomena. Behav. Process. 75, 242–249. http://dx.doi.org/10.1016/j.beproc.2007.02.003.
Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, second ed. Springer-Verlag, New York.
Burns, R., Burgos, J.E., Donahoe, J.W., 2011. Pavlovian conditioning: pigeon nictitating membrane. Behav. Process. 86, 102–108. http://dx.doi.org/10.1016/j.beproc.2010.10.004.
Camerer, C.F., 2011. Psychological influences on economic choice: Pavlovian cuing and emotional regulation. In: Delgado, M.R., Phelps, E.A., Robbins, T.W. (Eds.), Decision Making, Affect, and Learning: Attention and Performance XXIII. Oxford University Press, Oxford, pp. 39–61.
Camerer, C., Loewenstein, G., Prelec, D., 2005. Neuroeconomics: how neuroscience can inform economics. J. Econ. Lit. XLIII, 9–64.
Catania, A.C., 1969. Concurrent performances: inhibition of one response by reinforcement of another. J. Exp. Anal. Behav. 12, 731–744. http://dx.doi.org/10.1901/jeab.1969.12-731.
Crick, F., Koch, C., 1995. Are we aware of neural activity in primary visual cortex? Nature 375, 121–123. http://dx.doi.org/10.1038/375121a0.
Davison, M., Jenkins, P.E., 1985. Stimulus discriminability, contingency discriminability, and schedule performance. Anim. Learn. Behav. 13, 77–84. http://dx.doi.org/10.3758/BF03213368.
Daw, N.D., O'Doherty, J.P., 2014. Multiple systems for value learning. In: Glimcher, P.W., Fehr, E. (Eds.), Neuroeconomics: Decision Making and the Brain, second ed. Oxford University Press, Oxford, pp. 393–409.
Daw, N.D., Tobler, P.N., 2014. Value learning through reinforcement: the basics of dopamine and reinforcement learning. In: Glimcher, P.W., Fehr, E. (Eds.), Neuroeconomics: Decision Making and the Brain, second ed. Oxford University Press, Oxford, pp. 283–298.
Dawson, M.R.W., 2004. Minds and Machines: Connectionism and Psychological Modeling. Blackwell, Malden.
Dayan, P., Seymour, B., 2009. Values and actions in aversion. In: Glimcher, P.W., Camerer, C.F., Fehr, E., Poldrack, R.A. (Eds.), Neuroeconomics: Decision Making and the Brain. Academic Press, London, pp. 175–191.
Donahoe, J.W., Burgos, J.E., Palmer, D.C., 1993. A selectionist approach to reinforcement. J. Exp. Anal. Behav. 60, 17–40. http://dx.doi.org/10.1901/jeab.1993.60-17.
Donahoe, J.W., Burgos, J.E., 1999. Timing without a timer. J. Exp. Anal. Behav. 71, 257–263. http://dx.doi.org/10.1901/jeab.1999.71-257.
Donahoe, J.W., Burgos, J.E., 2000. Theoretical article: behavior analysis and devaluation. J. Exp. Anal. Behav. 74, 331–346. http://dx.doi.org/10.1901/jeab.2000.74-331.
Donahoe, J.W., Crowley, M.A., Millard, W.J., Stickney, K.A., 1982. A unified principle of reinforcement. In: Commons, M.L., Herrnstein, R.J., Rachlin, H. (Eds.), Quantitative Analyses of Behavior: Matching and Maximizing Accounts, vol. 2. Ballinger, Cambridge, MA, pp. 493–521.
Dragoi, V., Staddon, J.E.R., 1999. The dynamics of operant conditioning. Psychol. Rev. 106, 20–61. http://dx.doi.org/10.1037/0033-295X.106.1.20.
Gibbon, J., Balsam, P.D., 1981. Spreading association in time. In: Locurto, C., Terrace, H.S., Gibbon, J. (Eds.), Autoshaping and Conditioning Theory. Academic Press, New York, pp. 219–253.
Gormezano, I., Kehoe, E.J., 1975. Classical conditioning: some methodological-conceptual issues. In: Estes, W.K. (Ed.), Handbook of Learning and Cognitive Processes: Conditioning and Behavior Theory, vol. 2. Erlbaum Associates, Hillsdale, NJ, pp. 142–179.
Green, L., Rachlin, H., 1975. Economic and biological influences on a pigeon's key peck. J. Exp. Anal. Behav. 23, 55–62. http://dx.doi.org/10.1901/jeab.1975.23-55.
Hursh, S.R., 1980. Economic concepts for the analysis of behavior. J. Exp. Anal. Behav. 34, 219–238. http://dx.doi.org/10.1901/jeab.1980.34-219.
Hursh, S.R., 1984. Behavioral economics. J. Exp. Anal. Behav. 42, 435–452. http://dx.doi.org/10.1901/jeab.1984.42-435.
Kagel, J.H., Battalio, R.C., Rachlin, H., Green, L., 1981. Demand curves for animal consumers. Q. J. Econ. 96, 1–15. http://dx.doi.org/10.2307/2936137.
Kagel, J.H., Battalio, R.C., Rachlin, H., Green, L., Basmann, R.L., Klemm, W.R., 1975. Experimental studies of consumer demand behavior using laboratory animals. Econ. Inq. 13, 22–38. http://dx.doi.org/10.1111/j.1465-7295.1975.tb01101.x.
Kagel, J.H., Winkler, R.C., 1972. Behavioral economics: areas of cooperative research between economics and applied behavioral analysis. J. Appl. Behav. Anal. 5, 335–342. http://dx.doi.org/10.1901/jaba.1972.5-335.
Killeen, P., 1994. Mathematical principles of reinforcement. Behav. Brain Sci. 17, 105–172. http://dx.doi.org/10.1017/S0140525X00033628.
Killeen, P.R., 2011. Models of trace decay, eligibility for reinforcement, and delay of reinforcement gradients, from exponential to hyperboloid. Behav. Process. 87, 57–63. http://dx.doi.org/10.1016/j.beproc.2010.12.016.
Klopf, A.H., 1988. A neuronal model of classical conditioning. Psychobiology 16, 85–125. http://dx.doi.org/10.3758/BF03333113.
Laudan, L., 1977. Progress and its Problems: Towards a Theory of Scientific Growth. University of California Press, Berkeley.
Lea, S.E.G., 1978. The psychology and economics of demand. Psychol. Bull. 85, 441–466. http://dx.doi.org/10.1037/0033-2909.85.3.441.
Machado, A., 1997. Learning the temporal dynamics of behavior. Psychol. Rev. 104, 241–265. http://dx.doi.org/10.1037/0033-295X.104.2.241.
Mesulam, M.-M., 1998. From sensation to cognition. Brain 121, 1013–1052. http://dx.doi.org/10.1093/brain/121.6.1013.
McSweeney, F.K., Bierley, C., 1984. Recent developments in classical conditioning. J. Consum. Res. 11, 619–631. http://dx.doi.org/10.1086/208999.
Navakatikyan, M.A., 2007. A model for residence time in concurrent variable interval performance. J. Exp. Anal. Behav. 87, 121–141. http://dx.doi.org/10.1901/jeab.2007.01-06.
Picker, M., Poling, A., 1982. Choice as a dependent measure in autoshaping: sensitivity to frequency and duration of food presentation. J. Exp. Anal. Behav. 37, 393–406.
Rescorla, R.A., Wagner, A.R., 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black, A.H., Prokasy, W.F. (Eds.), Classical Conditioning II. Appleton-Century-Crofts, New York, pp. 64–99.
Salamone, J.D., Correa, M., Farrar, A.M., Nunes, E.J., Pardo, M., 2009. Dopamine, behavioral economics and effort. Front. Behav. Neurosci. 3, 1–12. http://dx.doi.org/10.3389/neuro.08.013.2009.
Sánchez, J.M., Galeazzi, J.M., Burgos, J.E., 2010. Some structural determinants of Pavlovian conditioning in artificial neural networks. Behav. Process. 84, 526–535. http://dx.doi.org/10.1016/j.beproc.2010.01.018.
Schmajuk, N.A., Lam, Y.-W., Gray, J.A., 1996. Latent inhibition: a neural network approach. J. Exp. Psychol.: Anim. Behav. Process. 22, 321–349. http://dx.doi.org/10.1037/0097-7403.22.3.321.
Schwartz, B., Gamzu, E., 1977. Pavlovian control of operant behavior: an analysis of autoshaping and its implications for operant conditioning. In: Honig, W.K., Staddon, J.E.R. (Eds.), Handbook of Operant Behavior. Prentice-Hall, Englewood Cliffs, NJ, pp. 53–97.
Schwartz, B., Williams, D.R., 1972. Two different kinds of key peck in the pigeon: some properties of responses maintained by negative and positive response-reinforcer contingencies. J. Exp. Anal. Behav. 18, 201–216. http://dx.doi.org/10.1901/jeab.1972.18-201.
Seymour, B., Dolan, R., 2008. Emotion, decision making, and the amygdala. Neuron 58, 662–671. http://dx.doi.org/10.1016/j.neuron.2008.05.020.
Staddon, J.E.R., Zhang, Y., 1991. On the assignment-of-credit problem in operant learning. In: Commons, M.L., Grossberg, S., Staddon, J.E.R. (Eds.), Neural Network Models of Conditioning and Action. Erlbaum, Hillsdale, NJ, pp. 279–293.
Stout, S.C., Miller, R.R., 2007. Sometimes competing retrieval (SOCR): a formalization of the comparator hypothesis. Psychol. Rev. 114, 759–783. http://dx.doi.org/10.1037/0033-295X.114.3.759.
Sutton, R.S., Barto, A.G., 1981. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 88, 135–170. http://dx.doi.org/10.1037/0033-295X.88.2.135.
Williams, D.R., Williams, H., 1969. Auto-maintenance in the pigeon: sustained pecking despite contingent non-reinforcement. J. Exp. Anal. Behav. 12, 511–520. http://dx.doi.org/10.1901/jeab.1969.12-511.
