Anim Cogn DOI 10.1007/s10071-014-0757-9

COMMENTARY

Risk should be objectively defined: comment on Pelé and Sueur

Thomas R. Zentall · Aaron P. Smith

Received: 3 February 2014 / Revised: 5 May 2014 / Accepted: 12 May 2014
© Springer-Verlag Berlin Heidelberg 2014

T. R. Zentall (corresponding author) · A. P. Smith
Department of Psychology, University of Kentucky, Lexington, KY 40506-0044, USA
e-mail: [email protected]

Abstract Pelé and Sueur (2013) propose that optimal decisions depend on delay of reinforcement, accuracy (probability or magnitude of reinforcement), and risk. The problem with this model is that delay and accuracy are easy to define, but, according to Pelé and Sueur, the third factor, risk, depends on the animal's perceived or "interpreted" risk rather than actual (experienced) risk. Thus, choice of the smaller, more immediate reward over the larger, delayed reward (the delay discounting function) is viewed by the authors as optimal because delay is associated with increased risk (due to potential competition or predation). But perceived risk is assessed by the decision made (e.g., by the slope of the discounting function), and because there is virtually no actual risk involved and no independent means of measuring perceived risk, by default, according to Pelé and Sueur, all choices can be viewed as optimal. Thus, optimality becomes an untestable concept. We suggest that risk be defined by the actual risk (given sufficient experience to judge it) and that, under conditions in which there is no actual risk (or risk is controlled), when animals choose an alternative that provides a lower rate of access to food, such choice be considered suboptimal.

Pelé and Sueur (2013) present a model of optimal choice that emphasizes the trade-off between speed and accuracy. Importantly, they suggest that the diffusion model (Wald and Wolfowitz 1948; Kulldorff et al. 2011) provides a better description of this trade-off than the simpler race model (Ratcliff and Rouder 1998) because it takes into account

not only the alternative for which the evidence reaches threshold first but also the difference in evidence that accumulates between the two alternatives. Thus, the difference in evidence between the two alternatives must reach a threshold before a decision is made, and if the discrimination between the two alternatives is difficult, the time to make a decision will be greater. The diffusion model can be tested by asking whether the speed–accuracy trade-off results in an optimal rate of reward. That is, taking more time to make a decision generally results in marginal increases in accuracy that may be offset by a lower rate of reward.

But Pelé and Sueur (2013) note that if speed and accuracy were the only important parameters, it might be optimal for the animal to wait until there was sufficient evidence for it to make an accurate decision. They suggest, however, that animals must also take into account the risk associated with a delayed response. Risk resulting from delaying the response involves not only the possibility that the resource may be depleted by competitors but also the possibility that the animal might be detected by a predator. Thus, the authors identify time, accuracy, and risk as the three variables that must be considered when evaluating the optimality of the decision process. Creatively, they also propose that these variables might be applied to the behavior of groups in deciding when to move to a new patch for foraging (primates) or to a new nest for safety (ants).
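To make the speed–accuracy trade-off just described concrete, the following minimal simulation is ours and purely illustrative: the function names (diffusion_trial, reward_rate), drift rates, thresholds, and intertrial interval are arbitrary choices, not parameters from any of the studies discussed. It accumulates the difference in evidence between two alternatives until a threshold is reached and then asks which threshold yields the best overall rate of reward:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_trial(drift, threshold, dt=0.005, noise=1.0):
    """Accumulate the difference in evidence between two alternatives
    until it reaches +threshold (correct choice) or -threshold (error)."""
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return x > 0, t  # (was the choice correct?, decision time in seconds)

def reward_rate(drift, threshold, n_trials=500, intertrial=5.0):
    """Rewards per second: accuracy divided by (decision time + intertrial interval)."""
    outcomes = [diffusion_trial(drift, threshold) for _ in range(n_trials)]
    accuracy = np.mean([correct for correct, _ in outcomes])
    mean_rt = np.mean([t for _, t in outcomes])
    return accuracy / (mean_rt + intertrial), accuracy, mean_rt

# A higher threshold yields slower but more accurate decisions; the overall
# rate of reward peaks at an intermediate threshold.
for threshold in (0.5, 1.0, 2.0, 4.0):
    rate, accuracy, mean_rt = reward_rate(drift=1.0, threshold=threshold)
    print(f"threshold={threshold:.1f}  accuracy={accuracy:.2f}  "
          f"mean RT={mean_rt:.2f} s  reward rate={rate:.3f}/s")
```

Lowering the drift rate (a harder discrimination) slows decisions and reduces accuracy, which is precisely the trade-off the diffusion model is meant to capture.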



If one considers the time to make a decision and the accuracy of that decision, the interaction is quite clear, and it is relatively easy to calculate the speed that will provide an accuracy resulting in maximum efficiency or rate of reinforcement. But if one includes risk, the resulting calculation becomes more complex. Actual risk likely varies as a function of context and is quite probabilistic. Also, the consequences of the risk must be taken into account. The consequence of competition may not be great (the immediate loss of food), but the consequence of predation would be much greater. To deal with the complexity of risk, one could create a context in which the risk was minimal (e.g., no competition and no predators), and one could give the animals sufficient experience with that context to learn that the risk was minimal. But according to Pelé and Sueur (2013), that would not be sufficient because "interpreted" risk would remain (p. 549), and there is no direct, independent way to assess interpreted or perceived risk.

For much of their article, to account for choices by individual animals, the authors use a temporal discounting task as a means of describing how the three variables interact. With this task, animals are typically given a choice between a small reward that is provided immediately (or after a relatively short delay) and a larger reward that is provided some time later. In general, choice of the larger later reward provides the greatest amount of food per unit time and is judged to be optimal. In humans, choice of the larger later reward is said to be a measure of self-control, whereas choice of the smaller sooner reward is said to be a measure of impulsivity (see Odum 2011, for a review). In this context, choosing the larger later reward is often described as the optimal choice of the two alternatives (e.g., Odum 2011; Tobin and Logue 1994). However, Pelé and Sueur (2013) would describe that choice as being rational, rather than optimal, because it does not take into account the possible risk incurred by opting for the larger later reward (p. 552). That is, as the delay of reward increases, the animal may have a sense of risk that the larger reward may be depleted, or the delay may involve an increased risk of predation (e.g., Myerson et al. 2003). Thus, Pelé and Sueur further conclude that, in this context, choice of the smaller sooner reward may actually be considered optimal.

The problem is that, in these delay discounting experiments, there is typically no risk of subjects losing their reward (or of falling prey to a predator). More importantly, sufficient experience is typically provided for them to learn that there is no risk involved. The authors acknowledge that such tasks do not involve actual risk, yet animals often still discount the delayed alternative sufficiently to result in choice of the smaller sooner reward. To account for what would normally be considered suboptimal choice, the authors propose that, in spite of the absence of risk, the animals "interpret" the larger later reward as one that involves risk. This choice, they argue, although no longer rational, can still be considered optimal.

The problem with this account is that it does not allow for a test of the optimality of the animal's decision. Whenever an animal chooses a lower rate of reward over a higher rate, one can assume that the perceived risk involved is sufficiently great to overcome the added value (in food per unit time) of the larger later food.
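For reference, the discounting function mentioned above is commonly modeled in this literature with a hyperbolic form, V = A/(1 + kD), where a larger k means steeper discounting (e.g., Odum 2011). The sketch below is ours; the reward amounts, delays, and values of k are arbitrary illustrations chosen only to show how a steeper discounting function shifts preference from the larger later to the smaller sooner reward:

```python
def hyperbolic_value(amount, delay, k):
    """Subjective value of a delayed reward under hyperbolic discounting:
    V = A / (1 + k * D), where k indexes the steepness of discounting."""
    return amount / (1.0 + k * delay)

# Illustrative choice between a smaller sooner and a larger later reward.
smaller_sooner = (2.0, 1.0)    # (amount, delay in s) -- arbitrary values
larger_later = (4.0, 10.0)

for k in (0.1, 1.0):           # shallow vs. steep discounting
    v_ss = hyperbolic_value(*smaller_sooner, k)
    v_ll = hyperbolic_value(*larger_later, k)
    preferred = "smaller sooner" if v_ss > v_ll else "larger later"
    print(f"k={k}: V(SS)={v_ss:.2f}, V(LL)={v_ll:.2f} -> prefers {preferred}")
```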


But if perceived risk accounts for their choice, and thus all choice is optimal, it is not clear that the term optimal is still useful.

Pelé and Sueur (2013) do recognize that animals may behave suboptimally under certain conditions. For example, given a bad experience with a predator, they may overestimate the danger of an encounter with a predator in the future (p. 552). Nesse (2001) refers to this as the smoke detector principle. But given the notion of perceived risk and the potential consequences of a false negative, from Pelé and Sueur's perspective it is not clear that even this is suboptimal behavior, because one must weigh the dire consequences of a missed signal against the relatively minor cost of a false alarm. Although one might consider the false alarm overly cautious, in the absence of sufficient data about the real risk one might consider it prudent behavior.

We propose that the term optimal be used only when risk can be defined in terms of the procedure or can be controlled by its objective and fully experienced elimination. We have examined a procedure in which we describe the animals' choice as suboptimal because, on average, their choice provides them with less food than could otherwise be earned, and the actual risk is carefully controlled and adequately experienced (Stagner and Zentall 2010). In our procedure (see Fig. 1, top), pigeons are given extensive training involving forced trials and choice trials. On forced trials, they experience the consequences of one alternative or the other. On choice trials, they are given a choice between two alternatives. Choice of one alternative (left) presents a green stimulus 20 % of the time that is always followed by food or a red stimulus 80 % of the time that is never followed by food (thus, when they choose this alternative, they receive food 20 % of the time). If they choose the other alternative, they are presented with one of two different colors (yellow or blue), each of which is followed by food 50 % of the time. Thus, choice of the second alternative provides 2.5 times as much food as the first (50 vs. 20 %). Each color is presented for 10 s, so delay to reinforcement is held constant. Under these conditions, and after considerable experience with the procedure so that they can learn the contingencies of reinforcement, pigeons show a strong preference for 20 % reinforcement over 50 % reinforcement, in spite of the fact that there should be no differential perceived risk due to delayed reinforcement, because delay of reinforcement is carefully controlled. Furthermore, because this is a relatively easy spatial discrimination acquired with a large amount of experience, the latency to choose is generally very short and relatively constant.
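The overall reinforcement probabilities for the two alternatives follow directly from the contingencies just described; the short calculation below is ours, purely to make the arithmetic behind the 2.5:1 advantage explicit:

```python
# Probability of food for each alternative in the Stagner and Zentall (2010) procedure.
# Suboptimal alternative: green (always food) on 20 % of trials, red (never food) on 80 %.
p_food_suboptimal = 0.20 * 1.0 + 0.80 * 0.0
# Optimal alternative: yellow or blue, each followed by food 50 % of the time.
p_food_optimal = 0.20 * 0.5 + 0.80 * 0.5

print(f"suboptimal alternative: p(food) = {p_food_suboptimal:.2f}")  # 0.20
print(f"optimal alternative:    p(food) = {p_food_optimal:.2f}")     # 0.50
print(f"advantage: {p_food_optimal / p_food_suboptimal:.1f} x")      # 2.5 x
```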


Fig. 1 Top: choice trials in which one alternative, associated with 20 % reinforcement, led to a signal for reinforcement on 20 % of the trials and a signal for the absence of reinforcement on 80 % of the trials (overall 20 % reinforcement), and in which the other alternative led to either of two signals, one on 20 % of the trials and the other on 80 % of the trials, each signaling 50 % reinforcement (overall 50 % reinforcement). Bottom: choice trials in which one alternative led to a signal for 10 pellets of reinforcement on 20 % of the trials or a signal for the absence of reinforcement on 80 % of those trials (overall mean reinforcement of two pellets per trial), and in which the other alternative led to one signal that occurred on 20 % of the trials and another that occurred on 80 % of the trials, but both signals were always followed by three pellets

A better example of this suboptimal-choice effect was found when we manipulated the magnitude of reinforcement (Fig. 1, bottom) rather than the probability of reinforcement (Zentall and Stagner 2011). Specifically, if the pigeons chose one alternative, 20 % of the time they were presented with a green stimulus that was always followed by 10 pellets of food, and 80 % of the time they were presented with a red stimulus that was never followed by food (thus, on average, they received two pellets of food per trial for choice of that alternative). If they chose the other alternative, they were again presented with one of two different colors (yellow or blue), and in either case the colors were always followed by three pellets of food. Again, the duration of the colors was 10 s each, so there was no differential delay to reinforcement between the two alternatives. In this experiment, choice of the second alternative provided 50 % more food than choice of the first alternative. Once again, the pigeons showed a strong preference for the suboptimal alternative that averaged two pellets over the certain three pellets. It should be noted that, in this case, there should be no uncertainty in the choice of the three-pellet alternative (as there might have been in the previous study, in which the better alternative involved an uncertain 50 % reinforcement). That is, the outcome following choice of the three-pellet alternative was certain and should have involved no risk, whereas the outcome following choice of the two-pellet alternative was probabilistic and might have been perceived to have involved some risk.
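The expected payoffs make the comparison explicit. The variance calculation below is our addition, included only as one simple way to quantify the difference in outcome certainty noted above:

```python
# Outcomes (pellets) and their probabilities for each alternative
# in the magnitude version of the procedure (Zentall and Stagner 2011).
suboptimal = [(10, 0.20), (0, 0.80)]   # green -> 10 pellets, red -> none
optimal = [(3, 1.00)]                  # yellow or blue -> always 3 pellets

def mean_and_variance(outcomes):
    mean = sum(x * p for x, p in outcomes)
    var = sum(p * (x - mean) ** 2 for x, p in outcomes)
    return mean, var

for name, outcomes in (("suboptimal", suboptimal), ("optimal", optimal)):
    mean, var = mean_and_variance(outcomes)
    print(f"{name}: mean = {mean:.1f} pellets/trial, variance = {var:.1f}")
# suboptimal: mean = 2.0, variance = 16.0; optimal: mean = 3.0, variance = 0.0
```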



Fig. 2 a Choice trials started with the green key lit. After a number of pecks (10–20) to the green key, it was turned off and b a white light appeared on the center key. A single peck turned the white light off, and c a green light appeared on the left and a red light appeared on the right. Pecks required for reinforcement were always 10 to the red key and 30 minus the initial investment to the green key

Furthermore, once again, any increase in perceived risk due to delay of reinforcement should have been equal for the two alternatives. Surely, this is an example of suboptimal behavior.

We have also examined suboptimal choice, in the absence of risk, with pigeons using an analog of the sunk cost effect. The sunk cost effect is the tendency for humans to complete a task that was started even if the relative future cost, in delay or effort required for reinforcement, would have been less had they abandoned the first task and started a new one (Pattison et al. 2012). In this study, pigeons were trained to peck either a green light 30 times or a red light 10 times for food. They also learned that sometimes pecking the green light would be interrupted by a white light that required a single peck. Following the white light, on some trials they would either have to complete the pecks to the green light for reinforcement or peck the red light 10 times for reinforcement. Finally, on test trials, after being interrupted while pecking the green light (Fig. 2a) and pecking the white light once (Fig. 2b), they were given a choice between completing the pecks to the green light and making 10 pecks to the red light (Fig. 2c). The results indicated that once they had started to peck the green light, all of the pigeons chose to return to the green light rather than switch to the red light, even when as many as 20 additional pecks to the green light were required for reinforcement. Again, there was no actual risk involved in this task, as all of the outcomes were certain (not probabilistic). Furthermore, in this case, the delay to reinforcement was generally greater when the pigeons chose to return to the green light.
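The effort comparison at the choice point can be laid out explicitly. The tabulation below is ours, using the peck requirements described above; an agent that ignored its sunk investment should switch whenever more pecks remain on the green key than the 10 required on the red key:

```python
GREEN_TOTAL = 30   # pecks required on the green key for food
RED_TOTAL = 10     # pecks required on the red key for food

# Investment already made on the green key when the choice is offered (10-20 pecks).
for invested in (10, 15, 20):
    remaining_green = GREEN_TOTAL - invested
    extra_cost_of_staying = remaining_green - RED_TOTAL
    print(f"invested {invested} pecks: {remaining_green} pecks remain on green vs. "
          f"{RED_TOTAL} on red (staying costs {extra_cost_of_staying} extra pecks)")
# With only 10 pecks invested, returning to green requires 20 pecks versus 10 for red,
# yet the pigeons returned to green.
```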



Of course, one could always posit that the pigeons perceived that there was added risk in switching, but if so, we do not think that it would be appropriate to refer to this choice as optimal.

Although in the three studies described here there was no actual risk involved, reward seeking under conditions of actual risk has been studied in rats by Jentsch, Woods, Groman, and Seu (2010) in an analog of the balloon task used with humans (Lejuez et al. 2002). In this task, the number of food pellets a rat can earn on each trial depends on the number of presses it makes on the add lever before responding on the cash-out lever. However, the probability of earning those pellets decreases (by 11.1 %) with each subsequent press after the initial add-lever press. That is, with this procedure, given the risk of losing the earned reward by continuing to respond, and taking that risk into account, the number of presses that would provide the greatest amount of food per trial (what one might consider the optimal number of lever presses) would be five; yet the rats generally were somewhat risk averse and typically pressed only about three times per trial. In this experiment, continued responses to the add lever both increased the risk of losing the already accrued reward and minimally increased the delay to reinforcement (the time taken to make the additional responses). The risk-averse tendency of the rats in this experiment emulates what Pelé and Sueur (2013) suggest occurs subjectively in temporal discounting tasks, but the actual risk in this balloon task is both objective and measurable.
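The optimum of five presses can be recovered from the contingency as described. The sketch below assumes a simple linear reading of it (each add-lever press adds one pellet to the potential payout, and the probability of cashing out successfully drops by 11.1 % with each press after the first); this reading is ours and may not capture every detail of Jentsch et al.'s procedure:

```python
DECREMENT = 0.111  # drop in payout probability per press after the first

def expected_pellets(presses):
    """Expected pellets per trial if the rat makes `presses` add-lever presses
    and then attempts to cash out."""
    p_payout = max(0.0, 1.0 - DECREMENT * (presses - 1))
    return presses * p_payout

values = {n: expected_pellets(n) for n in range(1, 10)}
best = max(values, key=values.get)

for n, v in values.items():
    print(f"{n} presses: expected pellets = {v:.2f}")
print(f"optimal number of presses = {best}")                # 5, as noted above
print(f"typical rat (~3 presses)  = {values[3]:.2f} expected pellets")
```

Under this reading, pressing about three times sacrifices roughly half a pellet per trial relative to the optimum, which is the risk-averse shortfall described above.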


Thus, we would suggest that if the concept of risk is to be useful, it should be used to indicate actual risk that can be objectively assessed and that, together with delay and magnitude of reinforcement, allows for the determination of optimal choice. Although the article by Pelé and Sueur effectively illustrates how the trade-off between speed and accuracy may be moderated by risk, our preference would be to use actual risk (with sufficient experience) as the measure of risk and thus, when actual risk is eliminated or held constant, we would interpret choice that is not rational as being suboptimal.

References

Jentsch JD, Woods JA, Groman SM, Seu E (2010) Behavioral characteristics and neural mechanisms mediating performance in a rodent version of the Balloon Analog Risk Task. Neuropsychopharmacology 35:1797–1806. doi:10.1038/npp.2010.47
Kulldorff M, Davis RL, Kolczak M, Lewis E, Lieu T, Platt R (2011) A maximized sequential probability ratio test for drug and vaccine safety surveillance. Seq Anal 30:58–78
Lejuez CW, Read JP, Kahler CW, Richards JB, Ramsey SE, Stuart GL et al (2002) Evaluation of a behavioral measure of risk taking: the Balloon Analogue Risk Task (BART). J Exp Psychol Appl 8:75–84. doi:10.1037/1076-898X.8.2.75
Myerson J, Green L, Hanson JS, Holt DD, Estle SJ (2003) Discounting delayed and probabilistic rewards: processes and traits. J Econ Psychol 24:619–635. doi:10.1016/S0167-4870(03)00005-9
Nesse RM (2001) The smoke detector principle: natural selection and the regulation of defenses. Ann N Y Acad Sci 935:75–85
Odum AL (2011) Delay discounting: I'm a K, you're a K. J Exp Anal Behav 96:427–439. doi:10.1901/jeab.2011.96-423
Pattison KF, Zentall TR, Watanabe S (2012) Sunk cost: pigeons (Columba livia) too show bias to complete a task rather than shift to another. J Comp Psychol 126:1–9
Pelé M, Sueur C (2013) Decision-making theories: linking the disparate research areas of individual and collective cognition. Anim Cogn 16:543–556. doi:10.1007/s10071-013-0631-1
Ratcliff R, Rouder J (1998) Modeling response times for two-choice decisions. Psychol Sci 9:347–356
Stagner JP, Zentall TR (2010) Suboptimal choice behavior by pigeons. Psychon Bull Rev 17:412–416
Tobin H, Logue AW (1994) Self-control across species (Columba livia, Homo sapiens, and Rattus norvegicus). J Comp Psychol 108:126–133. doi:10.1037/0735-7036.108.2.126
Wald A, Wolfowitz J (1948) Optimum character of the sequential probability ratio test. Ann Math Stat 19:326–339
Zentall TR, Stagner JP (2011) Maladaptive choice behavior by pigeons: an animal analog of gambling (sub-optimal human decision making behavior). Proc R Soc B Biol Sci 278:1203–1208
