Topics in Cognitive Science 6 (2014) 312–330 Copyright © 2014 Cognitive Science Society, Inc. All rights reserved. ISSN:1756-8757 print / 1756-8765 online DOI: 10.1111/tops.12085

An Evolutionary Perspective on Information Processing

Peter C. Trimmer, Alasdair I. Houston
School of Biological Sciences, University of Bristol

Received 1 October 2012; received in revised form 5 November 2013; accepted 30 November 2013

Abstract

Behavioral ecologists often assume that natural selection will produce organisms that make optimal decisions. In the context of information processing, this means that the behavior of animals will be consistent with models from fields such as signal detection theory and Bayesian decision theory. We discuss work that applies such models to animal behavior and use the case of Bayesian updating to make the distinction between a description of behavior at the level of optimal decisions and a mechanistic account of how decisions are made. The idea of ecological rationality is that natural selection shapes an animal’s decision mechanisms to suit its environment. As a result, decision-making mechanisms may not perform well outside the context in which they evolved. Although the assumption of ecological rationality is plausible, we argue that the exact nature of the relationship between ecology and cognitive mechanism may not be obvious.

Keywords: Bayesian decision making; Bias; Ecological rationality; Natural selection; Signal detection

1. Introduction

Correspondence should be sent to Alasdair I. Houston, Modelling Animal Decisions Group, School of Biological Sciences, University of Bristol, Woodland Road, Bristol BS8 1UG, UK. E-mail: a.i.houston@bristol.ac.uk

How we think shows through how we act. —David Joseph Schwartz

Evolution by natural selection is sometimes summarized as “survival of the fittest,” which suggests that there is some currency, “fitness,” which will tend to be maximized over evolutionary time. This mindset has allowed behavioral ecologists to build and test theories of which action an animal will take according to its circumstances. The theories are termed “normative models”; they evaluate options and so suggest what ought to be


done if fitness is to be maximized. In subsequent sections, we summarize various normative models which are applicable in different paradigms.

The hypothesis that the behavior of an animal maximizes its fitness relies upon the species having been in a given environment for long enough for natural selection to have full effect. If behavior is to be approximately optimal in a new environment, then the cognitive mechanisms governing behavior must be sufficiently plastic to allow such behavior to be produced (as opposed to being genetically constrained, for instance); this assumption is referred to as the behavioral gambit (Fawcett, Hamblin, & Giraldeau, 2012). Learning is one of the processes which allows behavior to be fine-tuned to an adaptive solution as information reduces uncertainty about the environment. The information processing relating to learning is therefore a crucial component of fitness in a world where decisions should depend on expectations about the environment.

The fitness measure which is relevant to an individual making a decision (or series of decisions) is termed reproductive value (Fisher, 1930; Houston & McNamara, 1999). The value depends on both the current condition of the individual and his or her strategy (i.e., what action the individual will choose given his or her condition) in the future (McNamara & Houston, 1986). A technique known as Dynamic Programming (Bellman, 1957) can be used to calculate the optimal strategy of an animal and, with it, the reproductive value (Clark & Mangel, 2000; Houston & McNamara, 1999). The technique has been used to explain various effects, such as why birds sing at dawn (McNamara, Mace, & Houston, 1987) and why food choices can appear irrational (Houston, McNamara, & Steer, 2007b).
Although such normative models can be used to identify optimal behavior, this often says little or nothing about how the animal actually reaches those decisions, that is, the mental mechanisms (or neuroscience) involved in such decisions. An alternative approach is to make use of findings from neuroscience to hypothesize particular general mental mechanisms. Modeling can then be used to identify the effects of the mechanism parameters and to make predictions about behavior which can be tested. Although behavior in a lab setting may appear irrational or suboptimal (Houston & McNamara, 1989), it can make sense when considered at a more general level of how the mental mechanisms would respond to stimuli in the natural world. When the behavior can be understood in this way, the animal is said to be behaving in an “ecologically rational” manner.

In Section 2, we explain more fully what is meant by the term “ecological rationality.” The rest of the paper is structured around identifying such behavior and the conditions under which it is (or is not) expected to evolve. In Section 3, we identify some key modeling constructs which allow optimal behavior to be identified in various paradigms. This is reinforced by showing how such work can imply cognitive mechanisms/behavioral styles, using particular examples. In Section 4, we discuss how different questions can be asked about the evolution of cognitive mechanisms. We provide examples where the assumption of particular mechanisms allows the likely parameters of such mechanisms to be established (together with behavioral implications) and consider the more difficult problem of identifying why particular classes of mechanism evolve. We conclude with a discussion of the (often subtle) relationship between ecology and cognitive mechanism.


2. Ecological rationality

Behavioral ecologists do not assume that animals carry out the mathematical calculations that are performed to make predictions about behavior; that is, animals are not literal maximizers (Houston, 2009). Instead, it is assumed that animals follow simple rules that perform well in particular circumstances (McNamara & Houston, 1980; Simon, 1956). In behavioral ecology, these rules are often known as rules of thumb (e.g., Houston & McNamara, 1984; McNamara & Houston, 1980), whereas in psychology, they are called heuristics (Simon, 1990; Tversky & Kahneman, 1974); see Hutchinson and Gigerenzer (2005) for a review. The idea of ecological rationality is that natural selection shapes an animal’s decision mechanisms to suit its environment. As a result, decision-making mechanisms may not perform well outside the context in which they evolved.

An animal’s decision-making process can exploit regularities in the environment. Sulikowski and Burke (2011) show that the searching behavior of noisy miners (Manorina melanocephala) reflects the properties of the resources that they are exploiting. When animals are searching for food, they may concentrate their effort in the area where a food item has been found (e.g., Thomas, 1974). This pattern of behavior makes sense if food items are clumped so that finding an item makes it more likely that other items are in the vicinity. If items are distributed randomly, then finding an item provides no information and area-concentrated search is not expected to evolve (Krakauer & Rodriguez-Girones, 1995).

We use the term “ecological rationality” (Hutchinson & Gigerenzer, 2005) to distinguish the concept from “rationality” as used by economists (see Kacelnik, 2006). Economic rationality is a collection of compelling rules, such as transitivity and regularity, which support utility theory. This is the equivalent of an absolute fitness measure (where each option has a fixed fitness value, irrespective of other options).
Although the concept of absolute fitness may seem intuitive, it is a mistake to assume that options have absolute fitness values. Although utility theory is based upon axioms such as transitivity (if A is better than B, and B is better than C, then A is better than C) and regularity, such properties of choice in animals do not always—and should not always—hold. This is because the value of an option depends on future behavior, which in turn can depend on the options which are currently (and are likely to remain) available (Houston, 1997, 2012; Houston et al., 2007b; Trimmer, 2013). Consequently, economic rationality is not necessarily useful in assessing or predicting the mental mechanisms behind animal behavior.

In this paper, we regard behavior as ecologically rational when it maximizes reproductive value in the animal’s ecological setting. This says nothing about how well an animal will perform in a different setting, such as a lab test, though seemingly irrational behaviors in such tests may be indicative of the cognitive processes which govern decisions in the wild. Ecologically rational behavior will not always evolve; behavior in the wild can also be suboptimal. This is because optimal behavior relies upon mechanisms which are capable of supporting such decisions; we identify cases where such mechanisms will not tend to evolve in an environment with an unchanging fitness landscape.


3. Optimal behavior

Typically, optimal behavior in a given circumstance will involve some kind of trade-off; this may be between immediate probabilities of reward and risk of punishment, between short- and long-term gains, or between speed and accuracy. In some situations, the behavior will be chosen from a continuous range of possibilities (such as how long to spend in a patch, or how much food to eat before drinking), while in others, the behavioral choices will be discrete (e.g., turn left or right to approach food or water, accept or reject a prey item). In non-game theoretic scenarios, if expected reward is to be maximized, the optimal choice will correspond to that predicted by Bayesian decision theory.

3.1. Bayesian updates

A Bayesian update combines previous expectations and new data to reach an updated estimate of the state of the world. For instance, we may be interested in the probability of a particular event, A, occurring. Without any information about the occurrence of other events, our best guess about the probability of event A is P(A). This is known as the prior probability. If another event, B, occurs, then we would like to know the conditional probability of event A given that B has occurred, P(A|B). This is known as the posterior probability. If we know the conditional probability of event B given that A has occurred (known as the likelihood), then Bayes’ theorem allows us to calculate the posterior probability according to:

P(A|B) = P(B|A) P(A) / P(B)

Bayes’ theorem therefore allows us to update our expectations for the future as events occur. McNamara and Houston (1980) provide a general account of Bayesian priors and updates in relation to animal behavior; this is further elucidated by Trimmer et al. (2011).
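As an illustration, a minimal sketch of such an update, in which event A is “a predator is present” and B is “an ambiguous noise is heard”; all the probability values below are invented for illustration, and P(B) is obtained from the law of total probability:

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem; P(B) comes from the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

# Hypothetical numbers: predators are rare (prior 0.01), usually make the
# noise when present (0.9), but harmless causes also produce it (0.1).
p = bayes_posterior(0.01, 0.9, 0.1)  # posterior is roughly 0.083
```

Even with a highly diagnostic cue, the posterior here stays below 10% because predators are rare; the prior matters as much as the likelihood.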
Although Bayesian updates provide the optimal probability distribution for estimating the probability of a particular outcome, the updates can often be difficult or impossible to calculate in a tractable manner, so may say very little about the operation of mental mechanisms which make estimates or decisions in an animal (though Vilares, Howard, Fernandes, Gottfried, & Kording, 2012 indicate that humans have separate neural representations of priors and likelihoods). Thus, we see a divide between what is optimal from an external (normative) perspective on how an animal should behave and how approximations of that optimal behavior are reached (the mechanistic perspective). The Bayesian approach provides the theoretical backbone of many mathematical tools which are useful for identifying optimal behavior, such as signal detection theory, bandit models, Kalman filtering, and drift-diffusion models; the first two of which we shall now describe.


3.2. Signal detection theory

Many species of organism, including humans, have to make decisions on the basis of ambiguous information. For example, an animal might hear a noise that could have been made by a predator or by something harmless. Should the animal behave as if there is a predator present or as if there is not? This question is analyzed by Nesse (2005) and Haselton and Nettle (2006) using signal detection theory (SDT), which is a particular example of Bayesian decision making. An interesting aspect of the application of SDT is that it suggests some general principles about the optimal way to respond to ambiguous information. In this section, we outline SDT in a way that brings out the similarity with techniques that are commonly used in behavioral ecology. We then review attempts to obtain general conclusions from SDT, putting particular emphasis on claims about human decision making.

SDT (Green & Swets, 1966) was developed to model the performance of humans trying to distinguish between signals and noise. In a discrete trial procedure, a subject is presented with a series of stimuli. After each presentation, the subject has to choose between two responses: yes (i.e., the stimulus was a signal) and no (i.e., a signal was not presented). The stimuli vary along a single dimension. For auditory stimuli this could be loudness. High values are likely to be caused by signals, whereas low values are unlikely to be signals. If the stimulus value is denoted by y, the subject’s problem is to find the best critical threshold value yt of y, such that all stimuli above yt are treated as signals and all stimuli below yt are treated as noise. Green and Swets (1966) consider various meanings of “best” in this context. From an evolutionary point of view, the best critical value is the one that maximizes reproductive value.
Given that there are two possible decisions (yes and no) and two possible states of the world (signal present and signal absent), there are four possible outcomes, as shown in Table 1. The probability of these four outcomes is determined by two things: the probability h of deciding yes when the signal is present and the probability f of saying yes when a signal is absent. To specify the optimal behavior, it is necessary to assign a value to each of the four outcomes. We denote the values by vh, vm, vf, and vcr for hit, miss, false positive, and correct rejection.

Table 1
The four possible outcomes in signal detection theory

                     Decision: Yes                    Decision: No
Signal present       Hit                              Miss (false negative)
Signal absent        False alarm (false positive)    Correct rejection

In general, h and f depend on the signal level y, but it is instructive to start with a case in which the subject ignores the signal (or, equivalently, the signal provides no information). In this case, h equals f and we have a 45° line when h is plotted against f, running from (0,0), where the subject always says no, to (1,1), where the subject always says yes. In this case, if the signal is present with probability p, then it is better to say yes than no when:

p vh + (1 − p) vf > p vm + (1 − p) vcr

that is, when

p (vh − vm) > (1 − p) (vcr − vf).

It is of course safe to assume that vh > vm and vcr > vf. It is therefore clear that if p is large (small), it is better to guess that the signal is present (absent), and if the probability is approximately 50%, then with similar payoffs for being correct, it is best to say yes (no) if the payoff for a false alarm is greater (less) than the payoff for a miss. Although the above is obvious, it allows a very simple mental mechanism (in the form of a decision tree) to approximate optimal decision making, as shown in Figure 1.

In general, it is better to take the intensity of the stimulus, y, into account whenever the signal provides information. If L is the likelihood ratio (i.e., the probability of y having been produced by the signal, divided by the probability of y having been produced by background noise), then it is better to say yes whenever (Egan, 1975):

L > [(1 − p)/p] × (vcr − vf)/(vh − vm)    (1)
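As an illustration of Eq. (1), the sketch below assumes (for illustration only) equal-variance Gaussian distributions for signal and noise along the stimulus dimension y; the means, payoff values, and function names are our hypothetical choices, not part of the original analysis:

```python
import math

def likelihood_ratio(y, mu_signal=1.0, mu_noise=0.0, sigma=1.0):
    """P(y | signal) / P(y | noise) for equal-variance Gaussian distributions."""
    num = math.exp(-((y - mu_signal) ** 2) / (2 * sigma ** 2))
    den = math.exp(-((y - mu_noise) ** 2) / (2 * sigma ** 2))
    return num / den

def say_yes(y, p, v_h, v_m, v_f, v_cr):
    """Eq. (1): respond 'yes' when the likelihood ratio exceeds the threshold."""
    threshold = ((1 - p) / p) * (v_cr - v_f) / (v_h - v_m)
    return likelihood_ratio(y) > threshold
```

Making a miss very costly (v_m strongly negative) lowers the threshold, so “yes” responses occur at much weaker stimuli; this is the smoke-detector logic discussed below.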

If an individual has the mental machinery to compute Eq. (1), then it can of course choose optimally. However, if it has the mental machinery shown in Figure 1, then it may be able to approximate the SDT choice reasonably well if it is able to incorporate the stimulus information, y, as a bias—either in its estimate of p or in the expected costs of making errors. Note that although there is only one optimum behavior for a given signal, there are many mechanisms which could implement that behavior, and even more which could approximate it; for example, reversing the order of the first two decision boxes in Figure 1 would produce identical behavior. This is one reason why it is difficult to infer cognitive mechanism from an animal’s ecology.

Fig. 1. A decision-making mechanism that approximates the optimal signal detection theory outcome.

Finally, note that if the animal evolves the ability to discriminate the signal more easily (relative to the background noise), or the environment alters such that the signal is more easily detected, then Eq. (1) shows that the optimal threshold should often be altered somewhat. The direction of change (i.e., toward or against choosing yes) will depend on the four payoff values; in an approximated version, this may simplify or (more likely) increase the complexity of the bias associated with a stimulus.

From a behavioral perspective, Nesse (2005) shows that some effects in animal behavior can be explained by analogy with the design of a smoke detector. When designing a fire alarm, the question of how sensitive to make the system should be resolved on the basis of costs and benefits. The cost of failing to warn of a real fire is much more substantial than the cost of a false alarm. Consequently, rather than setting off the alarm when the sensors indicate that the probability of a fire is 50%, a much lower threshold should be used, leading to many “fire alarms” being false alarms. Nesse compares this principle with the warning systems of animals, such as fear or pain, arguing, for instance, that the startle reflex of birds taking to the wing at the possible approach of a predator should lead to a huge proportion of flights being false alarms. This is backed up by empirical findings; for example, Haftorn (2000) identifies this effect in willow tits (Parus montanus). Further examples are supplied by Haselton and Buss (2000) and Haselton and Nettle (2006). Getty and Krebs (1985) show that great tits (Parus major) in a changing environment act as though they are gradually altering their decision threshold, such that their behavior is equivalent to that of signal detection theory, but with a lag imposed by learning the new (best) parameters.

3.3. Bandit models

A bandit model “represents in a simplified way the general question of how we learn—or should learn—from past experience” (Robbins, 1952).
In reinforcement learning (Sutton & Barto, 1998), the general issue is known as the explore-exploit problem: whether to further explore the world (in the hope of finding better returns) or simply exploit the knowledge which has been gained so far. The term “bandit model” comes from thinking about decisions in terms of choosing between gambling machines (known as one-armed bandits) to maximize expected profit. Fundamentally, there are two styles of bandit problem: one where the size of reward (if successful) is known but the probability of success on each arm is unknown, and one where a reward is always obtained, but the expected size of reward on each arm is unknown. With an unlimited number of trials, the “value” of an arm can be obtained from the Gittins index (Gittins, 1989), with the highest-valued option being the best choice at the current time.

In the animal literature, the choice of arm could represent a behavioral action such as a choice of location. For instance, consider a scenario in which an animal has a choice of two locations at which it could forage. Assume for simplicity that the locations differ only in their predation risk, and that the animal knows the probability of seeing a predator, p, in a time step at one of the locations (e.g., where it reached maturity) but has no indication of the risks elsewhere. We are interested in the behavior which maximizes its probability of survival over a given number of time steps, N (e.g., surviving each day of winter before it can reproduce). How should the animal behave?


McNamara, Trimmer, and Houston (2012a, appendix 3) analyze the scenario assuming that at each time step, the animal may see a predator without being killed but that, having seen a predator, there is some probability, d, that the animal is killed in that time step. The probability of seeing a predator in a given time step is assumed to depend only on the current location (independent of other time steps), and a predator is assumed to be seen before being killed in a given time step. Here, we recap that work and specify the equations in a slightly different way. We then use the model to show effects relating to cognitive mechanisms in Section 3.5.

By assuming that the probability of seeing a predator at the unknown location has a beta distribution with uniform initial prior (so the hyperparameters of the beta distribution, a and b, are initially each set to 1), each update results in a new beta distribution with one of the parameters having been incremented. In the unknown environment, a is incremented each time a predator is seen and b is incremented each time step that no predator is seen. The mean of the distribution, a/(a+b), is the expected probability of seeing a predator in the next time step. Note that if at any point the known location is chosen, no new information is obtained, so there is no reason to switch back to the unknown location.

The optimal strategy for maximizing survival over the N time steps can be formulated as a recurrence relation, where the expected survival chances at the next time step are used at each stage. Here, p is the probability of seeing a predator at the known location, and N is the number of remaining time steps. R is the reproductive value, which we take to be the probability of survival given that decisions are optimal:

R(a, b, p, N) = max{ (1 − pd) R(a, b, p, N−1),
                     [a/(a+b)] (1 − d) R(a+1, b, p, N−1) + [b/(a+b)] R(a, b+1, p, N−1) }

with

R(a, b, p, 1) = max{ 1 − pd, [a(1 − d) + b]/(a + b) }.
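The recurrence lends itself to memoized recursion. The sketch below is our own illustration of it (the value d = 0.1 matches Figure 2; the function and variable names are ours):

```python
from functools import lru_cache

D = 0.1  # probability of being killed in a time step, given a predator is seen

@lru_cache(maxsize=None)
def R(a, b, p, n):
    """Probability of surviving n time steps under optimal patch choice.

    (a, b) are the beta hyperparameters describing the unknown location;
    p is the known probability of seeing a predator at the known location.
    """
    if n == 0:
        return 1.0
    # Known location: survive the step with probability 1 - p*D; no update.
    known = (1 - p * D) * R(a, b, p, n - 1)
    # Unknown location: a predator is seen with probability a/(a+b), after
    # which the animal survives with probability 1 - D; the hyperparameters
    # are updated either way (Bayesian learning).
    unknown = (a / (a + b)) * (1 - D) * R(a + 1, b, p, n - 1) \
            + (b / (a + b)) * R(a, b + 1, p, n - 1)
    return max(known, unknown)
```

With a uniform prior (a = b = 1) and a single remaining step, R(1, 1, p, 1) = max(1 − 0.1p, 0.95), matching the boundary condition; as N grows, the option value of learning pushes the critical probability pc below the prior mean risk of the unknown patch, so the unknown location can be preferred even when its expected risk is higher.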

By identifying the critical probability pc (of seeing a predator at the known location) at which the animal has an equal chance of survival whether it takes the known or unknown patch first, we find that the animal can rationally choose the unknown location even though the initial risk of mortality there is greater than at the known location. The effect increases with N, as shown in Figure 2.

Fig. 2. The critical probability of seeing a predator (pc) at the known location, above which the unknown location should be visited first, in relation to the total number of time steps (N). The solid line relates to the learning situation, in which the animal can switch patches based upon what it learns; the dashed line shows the smaller bias when learning is irrelevant because the first location to be chosen will be repeatedly chosen for the duration [d = 0.1; uniform prior on the unknown location: a = b = 1].

Figure 2 shows that it can be optimal for the animal to choose the unknown location when the expected risk of mortality is greater than at the known location. Fundamentally, this is because the animal is interested in maximizing its overall probability of survival, not just the expected survival probability in a single time step.

Bandit models capture the trade-off between information gain and immediate reward when a series of trials will be carried out. This has obvious implications for animal behavior, some of which have been tested. For instance, Krebs, Kacelnik, and Taylor (1978) argue that great tits (Parus major) are able to approximate the Bayesian optimum over a number of trials when sampling between two foraging patches. However, many rules are able to perform well (Houston, Kacelnik, & McNamara, 1982), so with a relatively low selective pressure on accuracy, the question of which mechanisms will be selected for will often depend on how well the rules generalize across the range of ecological circumstances faced by the animal in question. In order to be able to calculate the optimum with the computing resources available at the time, Krebs et al. (1978) made the simplifying assumption that the optimal strategy was to sample the options equally and then decide to make all the remaining responses on the better option. Unfortunately, this assumption has encouraged the mistaken belief that the optimal strategy has this form (e.g., Lea, McLaren, Dow, & Graft, 2012).

3.4. Foraging for information

Signal detection theory was first used to analyze human performance and then extended to non-human animals. The idea of humans foraging for information reverses this direction: it applies an approach that was developed to study the foraging behavior of animals. Optimal foraging theory assumes that foraging behavior can be understood in terms of the maximization of reproductive value. It is often assumed that rate of energetic gain is a suitable surrogate for reproductive value (Houston & McNamara, 2013; Stephens & Krebs, 1986). In the patch use paradigm, a forager travels


through the environment encountering patches of food. If the forager exploits a patch, its rate of energetic gain decreases because it becomes harder to extract energy as the remaining energy falls. The forager has to decide when to leave the patch if it is to maximize its overall rate of energetic gain. The optimum is given by the marginal value theorem, which states that the forager should leave when the instantaneous rate of gain on the patch falls to the overall rate of gain from the environment (Charnov, 1976). The marginal value approach has been used to predict how humans should behave in order to maximize the information gained from a given time on the Internet (Pirolli, 2007). One problem with this analysis is that the marginal value theorem is based on an animal that has a very long time (essentially an infinite time) available to it. When time is relatively short, the optimum will not necessarily be given by the marginal value theorem. This may not be a major problem; the cost of using the marginal value theorem rather than the true optimum might be small.

3.5. Example: When is it worth learning, and when might learning evolve?

There are various possible costs involved in learning, such as energy requirements and speed of reaction. Here, we consider the cost of learning in a particularly simple manner using the bandit model of Section 3.3, imposing a cost for being able to learn as an increase in P(predation|predator seen). This could represent the additional time required to move to safety (due to a heavier brain, or a slower reaction speed due to additional processing). For instance, let us assume that P(predation|predator seen) = 0.2 when no learning occurs. Figure 3a shows that if an animal which learns (perfectly) has an increased risk of predation when a predator is seen (0.24 rather than 0.2), then in most circumstances the non-learning animal has a higher probability of surviving the time period.
In contrast, Figure 3b shows that if learning increases the risk only slightly (from 0.2 to 0.21), there is a large range of environmental conditions under which it is better to learn. Learning in a scenario such as that of Fig. 3a is unlikely to evolve (even in the shaded region, where learning would be beneficial), because a mechanism would need to evolve which took account of many updates in a near-perfect manner for any benefit to be gained. In the conditions of Fig. 3b, there is more scope for a learning mechanism to evolve, especially if there were a small number of updates, as a suboptimal mechanism may still provide benefit (see Trimmer, McNamara, Houston, & Marshall, 2012). Having evolved a learning mechanism, if the conditions were then gradually to shift to those of the shaded region of Fig. 3a (e.g., the effective number of time steps gradually increased and the speed of predators increased, so the cost of any extra weight in the brain was more significant), it is entirely possible that such learning mechanisms would be retained.

Fig. 3. The shaded regions show the conditions under which it is better to learn, paying the extra predation cost when a predator is present. P(predation|predator seen) = 0.2 when no learning occurs in each case. (a) P(predation|predator seen) = 0.24 with learning. Without sufficient time to make use of learning, it is better to have a fixed response. However, with a great deal of time left to go, it is also better not to pay the cost of learning. (b) P(predation|predator seen) = 0.21 with learning. Below the shaded region, it is always better to stick with the current environment. Above the shaded region, it is better to always switch to the unknown patch without paying the cost for learning.

In summary, we have identified that environmental conditions may need to alter if optimal (or nearly optimal) strategies are to evolve; that is, rather than cognitive mechanisms developing to ideally match the current conditions, the environment (i.e., fitness landscape) needs to have altered over time for the current conditions to be dealt with suitably by cognitive mechanisms.

4. Mechanisms and levels of questions

McNamara and Houston (2009) identify various levels at which questions about mechanisms can be asked. The two of these that we discuss below are as follows:

• For a given type of mechanism or rule, how is evolution expected to have tuned its parameters?
• Why do animals have particular organizational principles?

Each question can lead to different answers about what is expected to evolve. We now provide examples of each type of question, followed by a case where either question is applicable.


4.1. Class of mechanism is assumed

If it is assumed that a particular cognitive mechanism is used for a particular task (or set of tasks), then the best possible set of parameter values for that mechanism can be calculated when the distribution of environmental conditions is known. This is useful because such a mechanism, together with the optimal parameter settings, can then be used to predict behavior in other contexts.

Trimmer et al. (2008) analyze the behavior of a foraging mammal, assuming that the brain has two components which are capable of making decisions. Signal detection theory is used to represent one component, which makes very fast but often inaccurate decisions. The other component is assumed to be more sophisticated, taking in information over time to allow a more accurate, but slower, decision (cf. the fast and slow processes described by Kahneman, 2011). By allowing each component to act in isolation, or allowing different amounts of information to flow between the two systems, it is possible to compare the behavior of the optimal system with that of other systems with suboptimal information linkage. This allows the trade-offs between speed and accuracy to be seen in relation to the mechanisms involved, and it provides an explanation for why the more primitive (less accurate) system has been retained.

Tuning of parameters can also be used to explain why animals appear suboptimal in laboratory tests. For instance, empirical studies show that animals of various taxa value food items according to the state of the animal at the time of taking that food option (Aw, Holbrook, de Perera, & Kacelnik, 2009; Kacelnik & Marsh, 2002); that is, items which are taken when the animal is hungry are valued more highly and are more likely to be taken in the future than other, equivalent, food options even if the animal is no longer hungry.
The effect is known as state-dependent valuation, and it initially appears illogical: one might expect that only the expected value of the food (given the current state) would be relevant to a choice, as is often assumed in state-dependent models (e.g., see Houston & McNamara, 1999). By evolving a parameter which weights the value of food (positively or negatively) according to reserves at the time of taking the option, McNamara, Trimmer, and Houston (2012b) show that it can be better to value options taken at low reserves more highly than those taken at high reserves when the best action depends on the current environment. This is because there is a correlation between the current quality of the environment and the current reserve levels of an animal. Thus, behavior which can seem illogical, such as state-dependent valuation, can be explained as ecologically rational.

There are two ways in which evolved mechanisms may not produce ecologically rational behavior:

1. The mechanism may not be able to support optimal behavior, even with the best possible settings.
2. Evolution may not be able to fine-tune the parameters to the environment. This could be due to (a) the fitness landscape having local maxima which trap the


solution in a suboptimal position; or (b) mutation rates meaning that mechanisms, and thus behavior, are often not optimal.

We now consider the situation where, rather than assuming a mechanism, we attempt to evolve one.

4.2. Evolution of mechanism classes

A particular function may be carried out by different types of mechanism. This raises the question of which class of mechanism we might expect to evolve, and whether such a mechanism would then allow ecologically rational behavior. For instance, there are various possible styles of learning. Bayesian learning combines past information with new information in an optimal manner to reach the best possible estimate. However, empirical studies provide evidence that animals often update value estimates using something more like the Rescorla–Wagner rule (Rescorla & Wagner, 1972), which combines past and current information in a very simple manner. In its simplest form, the rule can be written as:

new estimate = old estimate + α(new value received − old estimate)

Thus, learning only occurs when the value of an option is found to differ from the old estimate, that is, from its expected value.

Trimmer et al. (2012) use genetic programming to consider what learning rules tend to evolve under various environmental conditions. Despite choosing a situation where the optimal rule could be written in a form as simple as the Rescorla–Wagner rule, the latter is more likely to evolve in many situations. This is because the Rescorla–Wagner rule can still perform well with suboptimal parameter settings, whereas a rule structured to calculate the optimal update (if the parameters are set perfectly) often performs very badly with suboptimal parameter settings, as shown in Fig. 4.
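The Rescorla–Wagner update, and its contrast with a Bayesian estimate, can be sketched in a few lines of Python. This is a hypothetical illustration: the reward probability, learning rate, and Beta(1,1) prior are our own choices, not values from Trimmer et al. (2012).

```python
import random

def rescorla_wagner(rewards, alpha, v0=0.5):
    """Rescorla-Wagner: new estimate = old + alpha * (reward - old estimate)."""
    v = v0
    for r in rewards:
        v += alpha * (r - v)  # no change when the reward matches the estimate
    return v

def beta_posterior_mean(rewards):
    """Bayesian estimate of a Bernoulli reward probability, Beta(1,1) prior."""
    return (1 + sum(rewards)) / (2 + len(rewards))

random.seed(0)
rewards = [int(random.random() < 0.7) for _ in range(500)]  # true p = 0.7
print(round(rescorla_wagner(rewards, alpha=0.1), 2))  # fluctuates near 0.7
print(round(beta_posterior_mean(rewards), 2))         # close to 0.7
```

Note that the Rescorla–Wagner estimate tracks recent rewards (which can be useful in changing environments), whereas the Bayesian posterior mean weights all observations equally.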
Consequently, the Rescorla–Wagner rule could be "discovered" by mutations and recombinations far more easily than structures which could perform the optimal update; the parameters of the Rescorla–Wagner rule could then evolve to near-perfect values over additional generations. The evolution of the Rescorla–Wagner rule, rather than a rule which supports the optimal learning update, shows that the class of mechanism which evolves will not necessarily be ecologically rational. Thus, there can be a difference between the behaviors which evolve in a given ecological setting and behaviors which are ecologically rational.

Also related to learning, Lotem and Halpern (2012) consider situations in which it is sensible to have a developmental period during which learning takes place, followed by no learning for the rest of the animal's life. The developmental period and the data acquisition mechanism (e.g., some kind of mental template) should co-evolve according to the ecological conditions. For instance, a newly hatched duckling will "imprint" on anything which matches its template for a "mother," subsequently following that


Fig. 4. The fitness landscape with respect to c for the Rescorla–Wagner rule, ΔV = c(λ − V), and the rule structure which supports the Bayesian optimum, ΔV = c(λ − 0.5). Fitness in this case is 1/(expected error). Note that although the rule structure supporting the optimum reaches a higher maximum fitness peak, the fitness of the Rescorla–Wagner rule is generally higher (with suboptimal parameter settings), meaning that such a rule is more likely to become established before parameters become fine-tuned.

individual without relearning at any point. The evolution of a template and a short learning period is adaptive because, in practice, the individual that best fits the template is almost always the mother. Similarly, wood frogs (Rana sylvatica) do not learn about a particular type of predator (tiger salamanders, Ambystoma tigrinum) after embryonic exposure to their smell (Ferrari & Chivers, 2011); arguably, this is because embryonic exposure to a particular chemical indicates that the chemical is unlikely to be useful as a warning signal.

4.3. Choice of evolutionary questions

The speed-accuracy trade-off was identified in humans, but it is now applied to nonhuman animals (Chittka, Skorupski, & Raine, 2009). We consider a simple case based on bees visiting flowers. The framework we use is the prey choice paradigm from foraging theory (Stephens & Krebs, 1986). There are two prey types (i.e., two types of flower). Each type has an energy content e, a handling time h, and an encounter rate k. Instead of allowing a range of decision times, assume that a predator can either take all items that it encounters or can spend a fixed time s deciding on the type of item. Two strategies need to be considered: specialize (only handle type 1 items) and generalize (handle both types). Only specialists pay the time cost of discriminating between the types. As a result, specialists are slow but have perfect discrimination, whereas generalists are fast but have no discrimination (their choice of item is determined by the encounter rates). The gross gain rates for the two strategies are as follows:


Rsp = k1 e1 / (1 + k1 (h1 + s) + k2 s)

Rgen = (k1 e1 + k2 e2) / (1 + k1 h1 + k2 h2)

(cf. Houston, Krebs, and Erichsen, 1980). Assume that only type 1 items provide energy; that is, type 2 flowers are empty, and so e2 = 0. Then the condition for the slow strategy to have the higher rate is:

k1 < k2 (h2/s − 1)

This condition means that increasing the frequency of empty flowers can result in a switch from an environment in which fast foragers do better to one in which slow foragers do better.

4.3.1. The "class of mechanism" question

Thus, if flowers are often empty, bees with a cognitive ability to assess flowers more thoroughly before approaching them will do better, and so will tend to evolve by natural selection; this is the behavior which would be ecologically rational. Similarly, if flowers are often full, bees which do not take time to assess will do better. Such a finding ignores mechanism almost entirely; it is subsumed into the assumptions about the time taken to reach a decision.

4.3.2. A parametric question

If bees are assumed to have the cognitive ability to assess flowers, and are choosing whether to do so based on the current environment, the question is rather different. Instead of identifying whether it is better for bees to assess flowers or not, the question becomes one of environmental assessment: for example, how many flowers should be assessed before tending to fix on a discriminatory choice for future flowers? This question is one of evolving parameters rather than mechanisms.

Thus, by pitching the questions slightly differently, we see an entirely different type of analysis emerging. Which question is "best" depends on one's perspective of what mechanisms are available to (potentially) perform particular functions. How a particular mental mechanism evolves will often depend on the minute details both of the existing mechanisms and of the environmental conditions (e.g., see Trimmer et al., 2012).
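The rate comparison can be checked numerically. The sketch below uses made-up parameter values (e1, h1, h2, s, and the encounter rates are illustrative choices, not values taken from the bee literature):

```python
def gross_rates(k1, k2, e1, e2, h1, h2, s):
    """Gross gain rates for the specialize (slow, discriminating) and
    generalize (fast, indiscriminate) strategies in the two-type model."""
    r_sp = k1 * e1 / (1 + k1 * (h1 + s) + k2 * s)
    r_gen = (k1 * e1 + k2 * e2) / (1 + k1 * h1 + k2 * h2)
    return r_sp, r_gen

# Type 2 flowers are empty (e2 = 0); vary how common they are.
e1, h1, h2, s, k1 = 1.0, 1.0, 1.0, 0.2, 0.5
for k2 in (0.05, 2.0):  # few vs. many empty flowers
    r_sp, r_gen = gross_rates(k1, k2, e1, 0.0, h1, h2, s)
    slow_better = k1 < k2 * (h2 / s - 1)  # the condition given in the text
    assert slow_better == (r_sp > r_gen)  # condition agrees with the rates
    print(k2, "slow (specialize)" if slow_better else "fast (generalize)")
```

With these numbers, the fast strategy wins when empty flowers are rare (k2 = 0.05) and the slow strategy wins when they are common (k2 = 2.0), matching the switch described above.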
However, it may be possible to identify how a set of interacting mechanisms (e.g., learning of values and value-based decision making) evolve over time as an environment changes. Any such understanding may be most readily achieved by assuming some level of convergent evolution toward producing the optimal behavior given the ecology.


Much work has been done by assuming that optimal behavior evolves in simple situations, but such behavior may require complex mechanisms. McNamara and Houston (2009) argue that it is useful to identify simple rules which perform well in complex environments. One way to do this is to impose a cost for rule complexity, for instance, by assuming an energy cost for memory or processing. As Niven, Anderson, and Laughlin (2007) write, the "law of diminishing returns promotes the evolution of economical structures by severely penalising overcapacity."
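As a toy illustration of such a cost (the numbers and the linear per-parameter penalty are our own assumptions, not a model from the literature), rule complexity can be penalized directly in the fitness function:

```python
def penalized_fitness(performance, n_params, cost_per_param=0.02):
    """Fitness = task performance minus an energetic cost per parameter,
    so overcapacity is penalized (cf. the diminishing-returns argument)."""
    return performance - cost_per_param * n_params

# A slightly less accurate 2-parameter rule beats a 10-parameter rule.
print(penalized_fitness(0.90, 2) > penalized_fitness(0.95, 10))  # True
```

Under such a scheme, selection favors the simpler rule whenever its accuracy shortfall is smaller than the extra cost of maintaining the additional machinery.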

5. Summary

This review serves as a brief account of how theoretical models can help to make sense of animal behavior. It is often assumed, as a first approximation, that natural selection will produce organisms that make optimal decisions in their natural environment. In the wild, decisions which exploit regularities in the environment will typically be made using simple rules that do not necessarily perform well outside this environment. Performance in novel environments, such as lab tests, will depend on the details of the rule. Consequently, behavior in lab experiments may appear irrational (McNamara, Fawcett, & Houston, 2013). When such behavior can be understood from the perspective of an animal's choices in the wild, the behavior is said to be ecologically rational (McNamara et al., 2012b).

Optimal foraging theory typically assumes that animals have the ability to learn the optimal foraging behavior in many laboratory procedures. Such an assumption seems reasonable if animals are presented with foraging options that are similar to those they encounter in the wild. It is not so clear that humans will respond optimally when confronted with computer-based information processing problems.

To understand the details of behavior, including choices which are suboptimal even in the wild, the mechanisms supporting behavior need to be understood. This is still an emerging field, known as evo-mecho (McNamara & Houston, 2009). We have provided examples where (a) mechanisms are assumed and the parameters are evolved; and (b) the mechanisms themselves are evolved, to identify which class of mechanism should tend to emerge under natural selection. In the latter case, we have also shown that the class of mechanism which typically emerges may not support optimal behavior. Consequently, although the assumption of ecological rationality is plausible, the exact nature of the relationship between ecology and cognitive mechanism can be far from obvious.
Often, models of optimal behavior (on their own) tell us little or nothing about the mechanisms which animals use to approximate such behavior. However, when combined with the "errors" which animals make, the models may help to indicate possible proximate mechanisms (and biases in those mechanisms) which underlie behavior.

We have only been able to scratch the surface of the literature in this review. For instance, we have considered optimization of choices in situations where the actions of individuals do not affect others. Game theory, in contrast, deals with strategies where choices should influence, and be influenced by, others (Maynard Smith, 1982). This is a


vast topic, covering predator–prey interactions, parent–offspring conflicts, and many aspects of sexual selection such as the handicap principle. Game theory can also provide insights into mechanisms because, if one mechanism is assumed, such as autonomic blushing when telling a lie, other possible effects can be inferred, such as there being a benefit to self-deception (so as not to blush). McNamara, Gasson, and Houston (1999) also show that the behavioral outcome can differ when evolution acts on the cognitive mechanisms for parental effort rather than simply on the optimal behavior.

The assumption of ideal information processing has allowed many aspects of behavior to be explained from an optimality perspective. However, much work remains if we are to understand how mechanisms really function and how they have evolved. From a theoretical perspective, one obvious route ahead when evolving information processing mechanisms is the incorporation of computational costs, on memory (size), processing (speed), or efficiency (power), rather than regarding the brain as a "black box" with purely behavioral payoffs. Such assumptions may readily lead to the emergence of information processing mechanisms which are used for more than one purpose (rather than being out-performed by a set of task-specific mechanisms), which may in turn help to explain why particular classes of mechanism evolve.

Acknowledgments This work was supported by the European Research Council (Evomech Advanced Grant 250209 to A. I. H.).

References

Aw, J. M., Holbrook, R. I., de Perera, T. B., & Kacelnik, A. (2009). State-dependent valuation learning in fish: Banded tetras prefer stimuli associated with greater past deprivation. Behavioural Processes, 81(2), 333–336.
Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
Charnov, E. L. (1976). Optimal foraging: The marginal value theorem. Theoretical Population Biology, 9(2), 129–136.
Chittka, L., Skorupski, P., & Raine, N. E. (2009). Speed-accuracy tradeoffs in animal decision making. Trends in Ecology & Evolution, 24(7), 400–407.
Clark, C. W., & Mangel, M. (2000). Dynamic state variable models in ecology. Oxford, England: Oxford University Press.
Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic Press.
Fawcett, T. W., Hamblin, S., & Giraldeau, L.-A. (2012). Exposing the behavioural gambit: The evolution of learning and decision rules. Behavioral Ecology, 24(1), 2–11.
Ferrari, M. C. O., & Chivers, D. P. (2011). Learning about non-predators and safe places: The forgotten elements of risk assessment. Animal Cognition, 14, 309–316.
Fisher, R. A. (1930). The genetical theory of natural selection. Oxford, England: Clarendon Press.
Getty, T., & Krebs, J. R. (1985). Lagging partial preferences for cryptic prey: A signal detection analysis of Great Tit foraging. The American Naturalist, 125(1), 39–60.


Gittins, J. C. (1989). Multi-armed bandit allocation indices. Chichester, England: John Wiley & Sons.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Haftorn, S. (2000). Contexts and possible functions of alarm calling in the Willow Tit, Parus montanus; the principle of "better safe than sorry." Behaviour, 137(4), 437–449.
Haselton, M. G., & Buss, D. M. (2000). Error management theory: A new perspective on biases in cross-sex mind reading. Journal of Personality and Social Psychology, 78(1), 81–91.
Haselton, M. G., & Nettle, D. (2006). The paranoid optimist: An integrative evolutionary model of cognitive biases. Personality and Social Psychology Review, 10(1), 47–66.
Houston, A. I. (1997). Natural selection and context-dependent values. Proceedings of the Royal Society B, 264, 1539–1541.
Houston, A. I. (2009). Flying in the face of nature. Behavioural Processes, 80, 295–305.
Houston, A. I. (2012). Natural selection and rational decisions. In S. Okasha & K. Binmore (Eds.), Evolution and rationality: Decisions, cooperation and strategic behaviour (pp. 50–66). Cambridge, England: Cambridge University Press.
Houston, A. I., Kacelnik, A., & McNamara, J. M. (1982). Some learning rules for acquiring information. In D. J. McFarland (Ed.), Functional ontogeny (pp. 140–191). London: Pitman.
Houston, A. I., Krebs, J. R., & Erichsen, J. T. (1980). Optimal prey choice and discrimination time in the great tit Parus major. Behavioral Ecology and Sociobiology, 6, 169–175.
Houston, A. I., & McNamara, J. M. (1984). Imperfectly optimal animals: A correction. Behavioral Ecology and Sociobiology, 15(1), 61–64.
Houston, A. I., & McNamara, J. M. (1989). The value of food: Effects of open and closed economies. Animal Behaviour, 37, 546–562.
Houston, A. I., & McNamara, J. M. (1999). Models of adaptive behaviour: An approach based on state. Cambridge, England: Cambridge University Press.
Houston, A. I., & McNamara, J. M. (2013). Foraging currencies, metabolism and behavioural routines. Journal of Animal Ecology, 83(1), 30–40.
Houston, A. I., McNamara, J. M., & Steer, M. D. (2007b). Violations of transitivity under fitness maximization. Biology Letters, 3(4), 365–367.
Hutchinson, J. M. C., & Gigerenzer, G. (2005). Simple heuristics and rules of thumb: Where psychologists and behavioural biologists might meet. Behavioural Processes, 69, 97–124.
Kacelnik, A. (2006). Meanings of rationality. In S. Hurley & M. Nudds (Eds.), Rational animals? Oxford, England: Oxford University Press.
Kacelnik, A., & Marsh, B. (2002). Cost can increase preference in starlings. Animal Behaviour, 63, 245–250.
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.
Krakauer, D. C., & Rodriguez-Girones, M. A. (1995). Searching and learning in a random environment. Journal of Theoretical Biology, 177, 417–429.
Krebs, J. R., Kacelnik, A., & Taylor, P. (1978). Test of optimal sampling by foraging great tits. Nature, 275(5675), 27–31.
Lea, S. E. G., McLaren, I. P. L., Dow, S. M., & Graft, D. A. (2012). The cognitive mechanisms of optimal sampling. Behavioural Processes, 89, 77–85.
Lotem, A., & Halpern, J. Y. (2012). Co-evolution of learning and data acquisition mechanisms: A model for cognitive evolution. Philosophical Transactions of the Royal Society B, 367, 2686–2694.
Maynard Smith, J. (1982). Evolution and the theory of games. Cambridge, England: Cambridge University Press.
McNamara, J. M., Fawcett, T. W., & Houston, A. I. (2013). An adaptive response to uncertainty generates positive and negative contrast effects. Science, 340, 1084–1086.
McNamara, J. M., Gasson, C. E., & Houston, A. I. (1999). Incorporating rules for responding into evolutionary games. Nature, 401, 368–371.
McNamara, J. M., & Houston, A. I. (1980). The application of statistical decision theory to animal behaviour. Journal of Theoretical Biology, 85, 673–690.


McNamara, J. M., & Houston, A. I. (1986). The common currency for behavioral decisions. The American Naturalist, 127, 358–378.
McNamara, J. M., & Houston, A. I. (2009). Integrating function and mechanism. Trends in Ecology & Evolution, 24, 670–675.
McNamara, J. M., Mace, R. H., & Houston, A. I. (1987). Optimal daily routines of singing and foraging. Behavioral Ecology and Sociobiology, 20, 399–405.
McNamara, J. M., Trimmer, P. C., & Houston, A. I. (2012a). It's optimal to be optimistic about survival. Biology Letters, 8, 516–519.
McNamara, J. M., Trimmer, P. C., & Houston, A. I. (2012b). The ecological rationality of state-dependent valuation. Psychological Review, 119, 114–119.
Nesse, R. M. (2005). Natural selection and the regulation of defences: A signal detection analysis of the smoke detector principle. Evolution and Human Behaviour, 26, 88–105.
Niven, J. E., Anderson, J. C., & Laughlin, S. B. (2007). Fly photoreceptors demonstrate energy-information trade-offs in neural coding. PLoS Biology, 5(4), e116.
Pirolli, P. (2007). Information foraging theory. Oxford, England: Oxford University Press.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64–99). New York: Appleton-Century-Crofts.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527–535.
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138.
Simon, H. A. (1990). Invariants of human behaviour. Annual Review of Psychology, 41, 1–19.
Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.
Sulikowski, D., & Burke, D. (2011). Movement and memory: Different cognitive strategies are used to search for resources with different natural distributions. Behavioral Ecology and Sociobiology, 65, 621–631.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Thomas, G. (1974). Influences of encountering a food object on subsequent searching behaviour in Gasterosteus aculeatus L. Animal Behaviour, 22, 941–952.
Trimmer, P. C. (2013). Optimal behaviour can violate the principle of regularity. Proceedings of the Royal Society B, 280(1763), 20130858.
Trimmer, P. C., Houston, A. I., Marshall, J. A. R., Bogacz, R., Paul, E. S., Mendl, M. T., & McNamara, J. M. (2008). Mammalian choices: Combining fast-but-inaccurate and slow-but-accurate decision-making systems. Proceedings of the Royal Society B, 275(1649), 2353–2361.
Trimmer, P. C., Houston, A. I., Marshall, J. A. R., Mendl, M. T., Paul, E. S., & McNamara, J. M. (2011). Decision-making under uncertainty: Biases and Bayesians. Animal Cognition, 14(4), 465–476.
Trimmer, P. C., McNamara, J. M., Houston, A. I., & Marshall, J. A. R. (2012). Does natural selection favour the Rescorla-Wagner rule? Journal of Theoretical Biology, 302, 39–52.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Vilares, I., Howard, J. D., Fernandes, H. L., Gottfried, J. A., & Kording, K. P. (2012). Differential representations of prior and likelihood uncertainty in the human brain. Current Biology, 22, 1641–1648.
