Cognitive Science 38 (2014) 399–438 Copyright © 2014 Cognitive Science Society, Inc. All rights reserved. ISSN: 0364-0213 print / 1551-6709 online DOI: 10.1111/cogs.12104

Weighing Outcomes by Time or Against Time? Evaluation Rules in Intertemporal Choice

Marc Scholten,^a Daniel Read,^b Adam Sanborn^c

^a European University
^b Warwick Business School
^c Department of Psychology, University of Warwick

Received 6 July 2012; received in revised form 8 May 2013; accepted 22 May 2013

Abstract

Models of intertemporal choice draw on three evaluation rules, which we compare in the restricted domain of choices between smaller sooner and larger later monetary outcomes. The hyperbolic discounting model proposes an alternative-based rule, in which options are evaluated separately. The interval discounting model proposes a hybrid rule, in which the outcomes are evaluated separately, but the delays to those outcomes are evaluated in comparison with one another. The tradeoff model proposes an attribute-based rule, in which both outcomes and delays are evaluated in comparison with one another: People consider both the intervals between the outcomes and the compensations received or paid over those intervals. We compare highly general parametric functional forms of these models by means of a Bayesian analysis, a method of analysis not previously used in intertemporal choice. We find that the hyperbolic discounting model is outperformed by the interval discounting model, which, in turn, is outperformed by the tradeoff model. Our cognitive modeling is among the first to offer quantitative evidence against the conventional view that people make intertemporal choices by discounting the value of future outcomes, and in favor of the view that they directly compare options along the time and outcome attributes.

Keywords: Psychology; Decision making; Human experimentation; Mathematical modeling; Intertemporal choice; Hyperbolic discounting

Correspondence should be sent to Marc Scholten, European University, Quinta do Bom Nome, Estrada da Correia 53, 1500-210 Lisbon, Portugal. E-mail: [email protected]

1. Introduction

Choice models can be arranged on a continuum, with alternative-based models on one end and attribute-based models on the other (Payne, Bettman, & Johnson, 1988). In alternative-based choice models, options are independently assigned an overall value, these overall values are compared, and the option with the highest overall value is chosen. In attribute-based choice models, options are directly compared along their attributes, and the option favored by these comparisons is chosen. The fundamental issue is whether people evaluate options in isolation from one another or in direct comparison with one another. We investigate this issue in the domain of intertemporal choice, in which options differ in the value and timing of their outcomes. Most models in this domain are delay discounting models, which view choice as alternative based. We argue, however, that choice should instead be viewed as attribute based, and we offer quantitative support for this position.

Consider a choice between $100 in 1 month and $200 in 1 year. In delay discounting models (e.g., Ainslie, 1975; Laibson, 1997; Loewenstein & Prelec, 1992; Mazur, 1987), such a choice is made as follows: Values are assigned to the outcomes ($100, $200), these values are discounted as a function of the delays to the outcomes (1 month, 1 year), and the option with the highest discounted value is chosen. Loewenstein and Prelec's (1992) hyperbolic discounting model is the most comprehensive effort to account for intertemporal choice on the assumption that choice is alternative based.

Descriptive models of intertemporal choice, such as the hyperbolic discounting model, accommodate preference patterns that are anomalous to the exponential discounting model, in which delayed outcomes are discounted at a constant rate.1 Imagine a representative agent who is indifferent between $100 today and $200 in 1 year, implying a discount rate of 50% per year. This discount rate would then apply universally, so this agent would also be indifferent between $1,000 today and $2,000 in 1 year, between −$100 today and −$200 in 1 year, and between $100 in 1 year and $200 in 2 years. None of these implied indifferences will actually be observed. First, there will be an absolute magnitude effect (Loewenstein & Prelec, 1992), with discount rates being lower for larger outcomes than for smaller ones (Thaler, 1981): The representative agent will prefer $2,000 in 1 year to $1,000 today.2 Second, there is a gain-loss asymmetry (Loewenstein & Prelec, 1992), with discount rates being lower for losses than for gains (McAlvanah, 2010): The representative agent will prefer −$100 today to −$200 in 1 year. And, third, there is a common difference effect (Loewenstein & Prelec, 1992), with the rate of discounting over a given interval being lower the later the interval begins (e.g., McAlvanah, 2010; Scholten & Read, 2006; but see Sayman & Öncüler, 2009; Read, Frederick, & Airoldi, 2012): The representative agent will prefer $200 in 2 years to $100 in 1 year. These preference patterns are all accommodated by the hyperbolic discounting model, as summarized in Table 1.

There are, however, anomalies that the hyperbolic discounting model, and indeed all delay discounting models, cannot accommodate. These anomalies violate the assumption of additivity in intervals, which is that total discounting over an interval should not depend on whether and how the interval is subdivided. For instance, discounting over 1 year should not depend on whether the year is divided into its four seasons (spring, summer, autumn, and winter) or left undivided. Yet systematic violations of additivity in intervals have been documented.
Some evidence shows subadditivity, or less discounting over the entire period than over its segments (e.g., McAlvanah, 2010; Read, 2001; Read & Roelofsma, 2003; Scholten & Read, 2006). Other evidence shows superadditivity, or more discounting over the entire period than over its segments (Scholten & Read, 2006). Interval effects show that there must be some attribute-based evaluation occurring, because the computation of an interval involves a direct comparison between the delays to both outcomes.

Table 1
The empirical phenomena explained by the candidate models

Phenomenon                                                             Exponential  Hyperbolic  Interval  Trade-off
Basic discounting effects
  Outcomes discounted by a greater amount over longer
    delays than over shorter ones                                           ✓            ✓          ✓         ✓
  Larger outcomes discounted by a greater amount than smaller ones          ✓            ✓          ✓         ✓
Absolute magnitude effect: Smaller outcomes discounted
  at a higher rate than larger ones                                                      ✓          ✓         ✓
Gain-loss asymmetry: Gains discounted at a higher rate than losses                       ✓          ✓         ✓
Common difference effect: Outcomes discounted at a higher
  rate over sooner intervals                                                             ✓          ✓         ✓
Interval effects
  Subadditivity: Less discounting over an undivided interval
    than over a subdivided one                                                                      ✓         ✓
  Superadditivity: More discounting over an undivided interval
    than over a subdivided one                                                                      ✓         ✓
Interval by compensation interaction effect: Relative nonadditivity
  Subadditivity when intervals are long relative to compensations                                             ✓
  Superadditivity when compensations are large relative to intervals                                          ✓

Two recently proposed models introduce attribute-based evaluation to intertemporal choice. These are the interval discounting model (Scholten & Read, 2006) and the tradeoff model (Scholten & Read, 2010). The interval discounting model incorporates attribute-based evaluation by means of a discount function defined over intervals, or differences between delays, rather than delays only. The discount function, which accommodates both subadditivity and superadditivity, is then applied to outcome values, as in delay discounting models. The interval discounting model thus incorporates both attribute-based evaluation (the comparison between delays enters into the discount function) and alternative-based evaluation (choice results from a comparison between the discounted values of the options). Although the interval discounting model is not unique in combining alternative-wise and attribute-wise comparisons (e.g., Mellers & Biagini, 1994; Shafir, Osherson, & Smith, 1993), it is unique in proposing that direct comparisons are made along one attribute (time) but not along the other (outcome).


The tradeoff model (Scholten & Read, 2010) is fully attribute based: Options are directly compared along the time and outcome attributes, and the option favored by these comparisons is chosen. That is, outcomes are not weighted by time, as in discounting models, but are weighted against time, as is also the case in Killeen's (2009) additive-utility model. Because outcomes are weighted against time, time has value, just as outcome does: The value of gaining sooner or losing later is weighted against the value of gaining more or losing less. The tradeoff function, which weighs the time advantage of one option against the outcome advantage of the other, accommodates both subadditivity and superadditivity.

Subadditivity and superadditivity can give rise to intransitivity, in which A is preferred to B, B is preferred to C, and yet C is preferred to A. This can be accommodated by both the interval discounting model and the tradeoff model. However, intransitive intertemporal choice exhibits a more intricate pattern that we call relative nonadditivity. When differences between outcomes (compensations) are large relative to differences between delays (intervals), intransitivity shows a superadditive pattern (Scholten & Read, 2006, 2010), in which the larger gain is chosen over a series of subintervals but the sooner gain is chosen over the undivided interval; conversely, when intervals are long relative to compensations, intransitivity shows a subadditive pattern (Roelofsma & Read, 2000; Scholten & Read, 2010), in which the sooner gain is chosen over subintervals but the larger gain is chosen over the undivided interval. The interval discounting model cannot accommodate relative nonadditivity, because it operates on intervals but not on compensations; the tradeoff model can, because it operates on both intervals and compensations (Table 1).

Our overarching goal is to empirically compare three evaluation rules: alternative based (exponential discounting model, hyperbolic discounting model), hybrid (interval discounting model), and attribute based (tradeoff model). In doing so, we make three contributions to the study of intertemporal choice.

Our first contribution is to develop a full parametric functional form of the hyperbolic discounting model and the interval discounting model. Previously, the value function of both models has been described in terms of its qualitative properties, but neither Loewenstein and Prelec (1992; hyperbolic discounting model) nor Scholten and Read (2006; interval discounting model) developed and applied a parametric specification of a value function that exhibits those properties.

Our second contribution is that we conduct quantitative comparisons of the three models. Unlike qualitative comparisons, which ask whether the models can predict systematic effects in the data, quantitative comparisons ask whether the models can also predict their magnitude. Qualitative support is necessary, but not sufficient, for quantitative support. We previously obtained qualitative support for the tradeoff model (Scholten & Read, 2010), with subadditivity and superadditivity arguing against delay discounting models, and relative nonadditivity arguing against the interval discounting model as well. Our quantitative analysis strengthens the case for the tradeoff model.

Our third contribution is to offer a more parsimonious account of relative nonadditivity. We originally ascribed it to the inseparability of time and outcome (Scholten & Read,


2010). Specifically, there is superadditivity when compensations are large relative to intervals: The subintervals (A–B, B–C) are treated as short, but the undivided interval (A–C) is not, and therefore receives disproportionate weight. Conversely, there is subadditivity when intervals are long relative to compensations: The compensations over the subintervals are treated as small, but the compensation over the undivided interval is not, and therefore receives disproportionate weight. We will show that the tradeoff model can preserve attribute separability and yet accommodate relative nonadditivity, both qualitatively and quantitatively. This is the central piece of evidence against discounting models and in favor of the tradeoff model.

We apply the three fully specified models to data from three studies. We employ a Bayesian analysis to compare the models, a technique that has not previously been used in intertemporal choice. Our cognitive modeling strongly supports the view that intertemporal choice draws on a fully attribute-based evaluation rule. More specifically, people evaluate the options by taking the differences between them along the time and outcome attributes, and then choose by taking the ratio between these differences. The tradeoff model, which combines these evaluation and choice rules, offers an accurate description of choice behavior, whereas the other candidate models do not. In the next section, we formalize the models and show how they arrive at their qualitative predictions.

2. Delay discounting models: An alternative-based evaluation rule

The exponential and hyperbolic discounting models hold that intertemporal choice is governed by an alternative-based evaluation rule: A value is assigned to each outcome, that value is discounted as a function of the delay to the outcome, and then the outcome with the highest discounted value is chosen. The models differ on the properties of the discount function and the value function.

2.1. Exponential discounting model

Let SS be a smaller outcome xS available after a shorter delay tS, and let LL be a larger outcome xL available after a longer delay tL. In the choice between $100 today and $200 in 1 year, for instance, we have xS = 100, tS = 0, xL = 200, and tL = 1. In the exponential discounting model, the decision maker will be indifferent between SS and LL when

\[
\delta^{t_S} x_S = \delta^{t_L} x_L, \tag{1}
\]

where 0 < δ < 1 is a one-period discount factor, and 1 − δ is the proportion by which the outcome is discounted over one unit of time (the "discount rate"). When indifferent between $100 today and $200 in 1 year, the $200 is discounted by 50% over 1 year. More generally, from indifference between outcomes separated by any interval tS → tL, we can solve for the one-period discount factor, which is an empirical measure of discounting over that interval:


\[
\delta = \left( \frac{x_S}{x_L} \right)^{\frac{1}{t_L - t_S}}. \tag{2}
\]
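As a worked instance of Eq. 2 (a check we add here, using the indifference pair from the introduction), indifference between $100 today and $200 in 1 year gives

\[
\delta = \left( \frac{100}{200} \right)^{\frac{1}{1 - 0}} = .50,
\]

that is, a one-period discount factor of .50 and hence a discount rate of 50% per year.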

In the exponential discounting model, the discount function is exponential, that is, d(t) = δ^t, and the value function is linear, that is, v(x) = x. The exponential discounting model predicts two basic discounting effects (Scholten & Read, 2013). First, the longer the delay, the larger the amount by which outcomes are discounted: Someone indifferent between $100 today and $200 in 1 year will be indifferent between $100 today and more than $200 in 2 years. Second, the larger the outcomes, the larger the amount by which they are discounted: Someone indifferent between $100 today and $200 in 1 year will be indifferent between $1,000 today and more than $1,100 in 1 year. The exponential discounting model shares these predictions with all other models (Table 1). However, it predicts no further preference patterns, because multiplying both outcomes by the same constant, reversing the sign of both outcomes, or adding the same constant to both delays does not affect δ. Also, subdividing the interval tS → tL into a sequence of intervals tS → tM → tL does not affect the total discounting over the interval:

\[
\frac{x_S}{x_M} \cdot \frac{x_M}{x_L} = \frac{x_S}{x_L} = \delta^{t_L - t_S} = \delta^{t_L - t_M} \cdot \delta^{t_M - t_S}.
\]

Studies of intertemporal choice conventionally use the exponential discounting model as a benchmark model, deviations from which are called "anomalies" (see also note 2). The hyperbolic discounting model, which we discuss next, modifies the discount function and the value function to address three of these anomalies.

2.2. Hyperbolic discounting model

According to Loewenstein and Prelec's (1992) hyperbolic discounting model, the decision maker will be indifferent between SS and LL when

\[
d(t_S)\,v(x_S) = d(t_L)\,v(x_L), \tag{3}
\]

where v is a reference-dependent value function, and d is a generalized hyperbolic discount function:

\[
d(t) = (1 + \tau t)^{-\beta/\tau}, \tag{4}
\]

where β > 0 is delay discounting, and τ > 0 is the departure from exponential discounting. As τ goes to 0, the generalized hyperbolic discount function reduces to the exponential discount function d(t) = e^{−βt} = δ^t.

The hyperbolic discounting model predicts the common difference effect. To see this, the total discounting over the interval tS → tL can be expressed in the form of a discount fraction:

\[
\frac{v(x_S)}{v(x_L)} = \left( \frac{1 + \tau t_S}{1 + \tau t_L} \right)^{\beta/\tau} = \left( \frac{1 + \tau t_S}{1 + \tau t_S + \tau (t_L - t_S)} \right)^{\beta/\tau}.
\]

Increasing both delays by a common additive constant does not affect the interval tS → tL in the denominator, but it does increase tS in the numerator and denominator. The discount fraction therefore increases toward 1, so that the discounting over the interval decreases. Indifference between $100 today and $200 in 1 year therefore entails preference for $200 in 2 years over $100 in 1 year.

The hyperbolic discounting model cannot accommodate interval effects, because outcome value is discounted only as a function of the delay to the outcomes, and not also as a function of the interval between them. The total discounting over a subdivided interval tS → tM → tL therefore equals the total discounting over the undivided interval tS → tL:

\[
\frac{d(t_M)}{d(t_S)} \cdot \frac{d(t_L)}{d(t_M)} = \frac{d(t_L)}{d(t_S)}.
\]

The value function v has five properties. The first three are taken from prospect theory (Kahneman & Tversky, 1979):

1. Reference dependence. Outcomes are evaluated as gains (x > 0) and losses (x < 0) relative to a neutral reference point, that is, v(0) = 0.
2. Diminishing absolute sensitivity. For constant absolute increases of x > 0 (decreases of x < 0), v(x) increases (decreases) by decreasing absolute amounts.
3. Loss aversion. Losses loom larger than gains, that is, v is steeper for losses than for gains. Tversky and Kahneman (1992) specified prospect theory with constant loss aversion, which is that reversing the sign of an outcome from positive to negative increases the magnitude of its value by a multiplicative constant, that is, v(−x) = −λv(x), where x ≥ 0 and λ > 1.

Two additional properties account for the absolute magnitude effect and the gain-loss asymmetry. Both properties involve the elasticity of the value function, or the ratio between the proportional change in value and the proportional change in amount:

\[
\varepsilon = \frac{v'(x)/v(x)}{1/x}.
\]
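As a one-line check (not in the original), a pure power function v(x) = x^a has constant elasticity equal to its exponent, a fact relied on below:

\[
\varepsilon = \frac{v'(x)/v(x)}{1/x} = \frac{x \cdot a x^{a-1}}{x^{a}} = a.
\]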

The properties are as follows:

4. Increasing elasticity. For constant proportional increases of x > 0 (decreases of x < 0), v(x) increases (decreases) by increasing proportional amounts. Therefore, if outcome magnitude is increased by a common multiplicative constant, the ratio between the values of xS and xL decreases:

\[
\frac{v(m x_S)}{v(m x_L)} < \frac{v(x_S)}{v(x_L)} \quad \text{if } m > 1.
\]

Increasing elasticity accommodates the absolute magnitude effect: Indifference between $100 today and $200 in 1 year entails a preference for $2,000 in 1 year over $1,000 today. When v(0) = 0 (property 1), diminishing absolute sensitivity (property 2) means that elasticity lies between 0 and 1. Thus, elasticity increases, yet never exceeds 1.

5. Loss amplification (Prelec & Loewenstein, 1991). Elasticity is greater for losses than for gains. That is, decreasing x < 0 by a given proportion yields a greater proportional change in value than increasing x > 0 by the same proportion. If outcome sign is reversed from positive to negative, therefore, the ratio between the values of the outcomes decreases:

\[
\frac{v(-x_S)}{v(-x_L)} < \frac{v(x_S)}{v(x_L)} \quad \text{if } x_S, x_L > 0.
\]

Loss amplification accommodates the gain-loss asymmetry: Indifference between $100 today and $200 in 1 year entails a preference for −$100 today over −$200 in 1 year. This is not the same as loss aversion. Loss aversion means that the negative value of a loss is greater than the positive value of an equivalent gain, or −v(−x) > v(x) for all x > 0. Loss amplification, as given by the expression above, is about the relative value of a pair of losses in comparison to the relative value of a pair of equivalent gains.

Loewenstein and Prelec (1992) did not specify a functional form for their value function. Chapman (1996) proposed an additive combination of two power functions, specifically, v(x) = ax^{1/2} + bx, where a, b > 0. Our provisional specification is v(x) = x^a + μx^b, where 0 < a < b < 1 and μ > 0. Because power functions have constant elasticity equal to their exponent, the elasticity of this combination ranges from a (as x goes to 0) to b (as x goes to infinity). Our final specification of the value function relies on two additional considerations:

1. Constant zero value of the zero outcome. The zero outcome is assigned a zero value, that is, v(0) = 0, regardless of whether the decision maker is sensitive to outcomes or not. This can be achieved by letting v(x) = ax^a + μbx^b.
2. Symmetric bounds on elasticity. The elasticity of the value function increases from 1 − a (as x goes to 0), with 0 < 1 − a ≤ ½, to a (as x goes to infinity), with ½ ≤ a < 1. This can be achieved by letting v(x) = (1 − a)x^{1−a} + μax^{a}, where ½ < a < 1.

Drawing on these considerations, and considering the domain of losses as well, we specify the value function as follows:

\[
v(x) =
\begin{cases}
(1 - \gamma)\,x^{1-\gamma} + \mu\gamma\, x^{\gamma} & \text{if } x \geq 0 \\
-\lambda \left[ (1 - \gamma)\,(-x)^{1-\gamma} + (\mu + \rho)\gamma\,(-x)^{\gamma} \right] & \text{if } x < 0,
\end{cases} \tag{5}
\]

where ½ < γ < 1 is diminishing absolute sensitivity to outcomes, μ > 0 is increasing elasticity, λ > 1 is loss aversion, and ρ > 0 is loss amplification. In this value function, elasticity increases from 1 − γ to γ, and a zero outcome is assigned a zero value, regardless of whether the decision maker is sensitive to outcomes (½ < γ < 1 and μ ≥ 0, or ½ < γ ≤ 1 and μ > 0) or not (1 − γ = μ = 0).3 As discussed in Appendix A, the Loewenstein and Prelec (1992) value function raises issues with conventional estimation methods, which the above specification does not address. We advance a modification of the value function in Eq. 5 to deal with those issues.

The precise role of the four parameters is as follows. First, the diminishing sensitivity parameter γ scales the bounds on elasticity: The greater its departure from ½, the more the elasticity of the value function increases over the entire range of outcomes. Furthermore, the increasing elasticity parameter μ scales the contribution of the second, more elastic power function relative to the first, less elastic one: The greater its departure from 0, the more the elasticity of the value function increases over the range of small outcomes, and the less it increases over the range of large outcomes. Finally, the loss-aversion parameter λ yields a steeper value function in losses than in gains, that is, −v(−x) > v(x) for x > 0, whereas the loss-amplification parameter ρ yields a steeper and more elastic value function in losses than in gains. In the absence of loss amplification (ρ = 0), we obtain constant loss aversion, that is, v(−mx)/v(mx) = v(−x)/v(x) for m > 1 and x > 0, but, in its presence (ρ > 0), we obtain increasing loss aversion, that is, v(−mx)/v(mx) > v(−x)/v(x).

While Loewenstein and Prelec's (1992) discount function is widely applied in studies of delay discounting, their value function is not. Instead, the value function is typically assumed to be linear, as it is in the exponential discounting model. Having developed a value function that exhibits all five properties proposed by Loewenstein and Prelec (1992), we will be able to apply the full formulation of the hyperbolic discounting model. However, the hyperbolic discounting model, like any delay discounting model, cannot account for interval effects. We next discuss the interval discounting model, which preserves the value function in Eq. 5, but modifies the discount function to accommodate interval effects.
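To make the fully specified model concrete, here is a minimal Python sketch of Eqs. 4 and 5 (not part of the original paper). The parameter defaults are the hyperbolic MAP estimates later reported in Table 4, delays are assumed to be in months (the Study 1 unit), and the loss-aversion value is an arbitrary placeholder:

```python
import math

def v(x, gamma=0.7522, mu=0.0403, rho=0.4578, lam=1.5):
    """Value function of Eq. 5: two power functions for gains, and a
    loss-averse, loss-amplified mirror image for losses."""
    if x >= 0:
        return (1 - gamma) * x ** (1 - gamma) + mu * gamma * x ** gamma
    m = -x  # magnitude of the loss
    return -lam * ((1 - gamma) * m ** (1 - gamma) + (mu + rho) * gamma * m ** gamma)

def d(t, beta=0.0920, tau=0.1446):
    """Generalized hyperbolic discount function of Eq. 4."""
    return (1 + tau * t) ** (-beta / tau)

def discounted_value(x, t):
    """Alternative-based rule: each option gets a discounted value,
    and the option with the higher discounted value is chosen."""
    return d(t) * v(x)

# Example: $100 in 1 month versus $200 in 12 months.
print(discounted_value(100, 1), discounted_value(200, 12))
```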

3. The interval discounting model: A hybrid evaluation rule

The interval discounting model (Scholten & Read, 2006) holds that intertemporal choice is governed by a hybrid evaluation rule. It is alternative based in that each outcome receives a discounted value, but it is attribute based in that the values of the outcomes are discounted not only as a function of their delays but also as a function of the interval between them. In the interval discounting model, the decision maker will be indifferent between SS and LL when


\[
D(0, t_S)\,v(x_S) = D(0, t_S)\,D(t_S, t_L)\,v(x_L),
\]

where v is Loewenstein and Prelec's (1992) value function and D is an interval discount function. As can be seen, LL is discounted over two consecutive intervals, 0 → tS and tS → tL. Because the first interval is shared with SS, it drops out of the equation, and the indifference point becomes

\[
v(x_S) = D(t_S, t_L)\,v(x_L). \tag{6}
\]

The interval discount function is motivated by three results from Scholten and Read (2006), illustrated in Fig. 1. The first result, also predicted by the hyperbolic discounting model, is the effect of interval onset, or the common difference effect: Less discounting (higher δ) the later an interval of given length begins. There were, in addition, both possible effects of interval length: Superadditivity over short intervals, that is, more per-period discounting (lower δ) as short subintervals were merged into undivided intervals, and subadditivity over longer intervals, that is, less per-period discounting (higher δ) as longer subintervals were themselves merged into undivided intervals.

Fig. 1. The three anomalies addressed by the interval discount function: The common difference effect (interval onset), superadditivity (short intervals), and subadditivity (longer intervals).

The interval discount function below accommodates these three results:

\[
D(t_S, t_L) = \left( 1 + \alpha \left( \frac{w(t_L) - w(t_S)}{\vartheta} \right)^{\vartheta} \right)^{-\beta/\alpha}, \tag{7}
\]

where α > 0 is subadditive discounting, and ϑ > 1 is superadditive discounting. The argument of the interval discount function is the difference between weighted delays, w(tL) − w(tS), or the effective interval (Scholten & Read, 2010). The time-weighing function w is

\[
w(t) = \frac{1}{\tau} \log(1 + \tau t), \tag{8}
\]

where τ > 0 is diminishing absolute sensitivity: For constant absolute increases of t, w(t) increases by decreasing absolute amounts. That is, if delay length is increased by a common additive constant, the effective interval decreases:

\[
w(a + t_L) - w(a + t_S) < w(t_L) - w(t_S) \quad \text{if } a > 0.
\]

Diminishing absolute sensitivity to delays therefore accommodates the common difference effect.

While the time-weighing function is a concave function over raw delays, the interval discount function is an inverse S-shaped function over differences between weighted delays, or effective intervals. This accommodates the progression from superadditivity to subadditivity over intervals of increasing length. For instance, if nonadditivity gives rise to intransitivity, one could choose the larger later gain over the subintervals tS → tM and tM → tL but choose the smaller sooner gain over the undivided interval tS → tL (superadditivity), and one could choose the smaller sooner gain over the subintervals tS → tL and tL → tX but choose the larger later gain over the undivided interval tS → tX (subadditivity).

The interval discount function in Eq. 7 and the time-weighing function in Eq. 8 differ from their original specifications (Scholten & Read, 2006). As discussed in Appendix B, they introduce two improvements. First, the interval discount function in Eq. 7 allows for superadditivity over any range of intervals before it reverses into subadditivity. Second, in combination with the time-weighing function in Eq. 8, the interval discount function satisfies a formal requirement that we call alpha-tau equivalence: When the sooner outcome is immediate (tS = 0) and when superadditivity vanishes (ϑ = 1), it reduces to the same functional form regardless of whether subadditivity or diminishing absolute sensitivity to delays vanishes next (i.e., either α or τ going to 0). In each of these cases, the interval discount function reduces to the generalized hyperbolic discount function in Eq. 4.

The interval discounting model accounts for interval effects as well as the anomalies addressed by the hyperbolic discounting model by means of a hybrid evaluation rule. In the next section, we discuss the tradeoff model, which replaces the hybrid rule with a fully attribute-based rule, and we will show that it accounts more parsimoniously for all anomalies addressed by the interval discounting model. We will later show how the tradeoff model also accounts for relative nonadditivity, where the relative magnitude of compensations and intervals determines whether superadditivity or subadditivity occurs.
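A matching sketch of the interval machinery (Eqs. 7 and 8; again not from the original, with the interval-model MAP estimates from Table 4 as defaults and ϑ fixed at 1, as in Study 1) makes the nonadditivity concrete:

```python
import math

def w(t, tau=0.1523):
    """Time-weighing function of Eq. 8."""
    return math.log(1 + tau * t) / tau

def D(tS, tL, beta=0.1925, alpha=1.0802, theta=1.0):
    """Interval discount function of Eq. 7, applied to the
    effective interval w(tL) - w(tS)."""
    z = w(tL) - w(tS)
    return (1 + alpha * (z / theta) ** theta) ** (-beta / alpha)

# Additivity check: with alpha > 0 and theta = 1, the undivided
# interval is discounted less than its two segments combined
# (subadditivity); the two quantities coincide only as alpha -> 0.
print(D(0, 12))            # undivided interval (delays in months)
print(D(0, 6) * D(6, 12))  # same interval, subdivided
```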

4. The tradeoff model: An attribute-based evaluation rule

The tradeoff model (Scholten & Read, 2010) holds that intertemporal choice is governed by an attribute-based evaluation rule, in which the outcome advantage of one


option is weighted against the time advantage of the other, and the option with the greatest advantage is chosen. When the outcomes are positive (e.g., receiving $100 today or $200 in 1 year), SS has the time advantage (receiving 1 year sooner), while LL has the outcome advantage (receiving $100 more). When the outcomes are negative (e.g., paying $100 today or $150 in 1 year), the situation is reversed, with SS having the outcome advantage (paying $50 less), and LL having the time advantage (paying 1 year later). These advantages are differences between weighted delays, w(tL) − w(tS), or the effective interval (as in the interval discounting model), and differences between valued outcomes, v(xL) − v(xS) for xL > xS > 0 and v(xS) − v(xL) for xL < xS < 0, or the effective compensation.4 The effective interval and the effective compensation are weighted against one another by a tradeoff function Q. The decision maker will be indifferent between SS and LL when

\[
Q(w(t_L) - w(t_S)) =
\begin{cases}
v(x_L) - v(x_S) & \text{if } x_L > x_S > 0, \\
v(x_S) - v(x_L) & \text{if } 0 > x_S > x_L.
\end{cases} \tag{9}
\]

The time-weighing function w weighs delays against one another, while the value function v weighs outcomes against one another. They share the properties of reference dependence, diminishing absolute sensitivity, and augmenting proportional sensitivity. As in the interval discounting model, diminishing absolute sensitivity to delays accommodates the common difference effect, because increasing both delays by a common additive constant decreases the effective interval. Augmenting proportional sensitivity is that, for constant proportional increases of t, w(t) increases by increasing absolute amounts, and, for constant proportional increases of x > 0 (decreases of x < 0), v(x) increases (decreases) by increasing absolute amounts. That is, if outcome magnitude is increased by a common multiplicative constant, the effective compensation increases:

\[
v(m x_L) - v(m x_S) > v(x_L) - v(x_S) \quad \text{if } x_S, x_L > 0 \text{ and } m > 1.
\]

Augmenting proportional sensitivity therefore accommodates the absolute magnitude effect. A fourth property of the value function is constant loss aversion, by which reversing the sign of the outcomes from positive to negative increases the effective compensation:

\[
v(-x_S) - v(-x_L) = -\lambda v(x_S) - [-\lambda v(x_L)] = \lambda [v(x_L) - v(x_S)] > v(x_L) - v(x_S) \quad \text{if } x_S, x_L > 0 \text{ and } \lambda > 1.
\]

Constant loss aversion therefore accommodates the gain-loss asymmetry. The tradeoff model is more parsimonious than the interval discounting model because it does not need the two elasticity properties of the value function to accommodate the absolute magnitude effect and the gain-loss asymmetry. One might argue that the tradeoff model, instead, needs augmenting proportional sensitivity to account for the absolute


magnitude effect, but the interval discounting model implicitly invokes this property as well when invoking increasing elasticity, because a value function that exhibits increasing elasticity also exhibits augmenting proportional sensitivity, whereas the converse is not necessarily true (Scholten & Read, 2010; Appendix B).

The tradeoff model has the same time-weighing function as the interval discounting model, given in Eq. 8. The generalized hyperbolic discount function in Eq. 4 is isomorphic to this normalized logarithmic time-weighing function, in that it results from exponential discounting over weighted delays, that is, d(t) = e^{−βw(t)} (Takahashi, 2005; Takahashi, Oono, & Radford, 2008). Analogous to the time-weighing function, the tradeoff model has a normalized logarithmic value function:

\[
v(x) =
\begin{cases}
\dfrac{1}{\gamma}\log(1 + \gamma x) & \text{if } x \geq 0, \\[4pt]
-\dfrac{\lambda}{\gamma}\log(1 + \gamma(-x)) & \text{if } x < 0,
\end{cases} \tag{10}
\]

where γ > 0 is diminishing absolute sensitivity to outcomes. Like the normalized exponential value function discussed by Köbberling and Wakker (2005), the value function in Eq. 10 approaches constant sensitivity (unit elasticity) as γ goes to 0 (linear value function), approaches insensitivity (zero elasticity) as γ goes to infinity (zero value function), and is decreasingly elastic otherwise, spanning the entire range from unit elasticity (as x goes to 0) to zero elasticity (as x goes to infinity).

The tradeoff function below weighs effective intervals against effective compensations:

\[
Q(t_S, t_L) = \frac{\kappa}{\alpha} \log\left( 1 + \alpha \left( \frac{w(t_L) - w(t_S)}{\vartheta} \right)^{\vartheta} \right), \tag{11}
\]

where κ > 0 is time sensitivity (with greater sensitivity being analogous to more discounting). The tradeoff function is an S-shaped function over effective intervals, which accommodates a progression from superadditive to subadditive discounting over intervals of increasing length. Like the interval discount function in Eq. 7, it satisfies alpha-tau equivalence: When the sooner outcome is immediate (tS = 0) and when superadditivity vanishes (ϑ = 1), the tradeoff function reduces to the same functional form regardless of whether subadditivity or diminishing absolute sensitivity to delays vanishes next (i.e., either α or τ going to 0). The interval discount function (Eq. 7) is isomorphic to the tradeoff function (Eq. 11), in that it results from exponential discounting over weighted intervals, that is,

\[
D(t_S, t_L) = e^{-(\beta/\kappa)\,Q(t_S, t_L)}.
\]

This concludes our formalization of the candidate models. They are isomorphic in the weighing of delays and intervals but differ in the evaluation of outcomes, and, of course, in the evaluation rule. We will compare the models on two types of data. The first type, indifference data, is obtained from choice-based matching, in which the delays and one outcome are fixed, and then the other outcome is adjusted as a function of the choices of the participant until the point of indifference is reached. The second type, preference data, is obtained from choice, in which the participants choose between the options, and the choice probability across participants is taken as the strength of preference for one option over the other. We next offer an overview of our studies.
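A minimal sketch of the attribute-based rule for gains (Eqs. 8–11; not from the original, with the Study 1 MAP estimates from Table 4 as illustrative defaults and ϑ = 1):

```python
import math

def w(t, tau=0.1294):
    """Time-weighing function of Eq. 8."""
    return math.log(1 + tau * t) / tau

def v(x, gamma=0.0842, lam=1.5188):
    """Normalized logarithmic value function of Eq. 10."""
    if x >= 0:
        return math.log(1 + gamma * x) / gamma
    return -lam * math.log(1 + gamma * (-x)) / gamma

def Q(tS, tL, kappa=3.4989, alpha=0.8434, theta=1.0):
    """Tradeoff function of Eq. 11 over the effective interval."""
    z = w(tL) - w(tS)
    return (kappa / alpha) * math.log(1 + alpha * (z / theta) ** theta)

def prefers_LL(xS, tS, xL, tL):
    """Attribute-based rule for gains: LL is chosen when the effective
    compensation exceeds the weighted effective interval (Eq. 9)."""
    return v(xL) - v(xS) > Q(tS, tL)

print(prefers_LL(100, 1, 200, 12))  # $100 in 1 month vs. $200 in 12 months
```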

5. Overview of the studies

We will apply the rival models (the tradeoff model, the interval discounting model, the hyperbolic discounting model, and the exponential discounting model) to data from three studies. All studies differentiate between the models, in that some models can offer a qualitative account of the preference patterns in the data (in which case we examine whether they can also offer a quantitative account), whereas other models cannot. However, the studies differentiate between the models in different ways.

Study 1, a choice-based matching study, was designed to produce the absolute magnitude effect, the gain-loss asymmetry, the common difference effect, and subadditivity. This allowed us to differentiate between, on one hand, the tradeoff model and the interval discounting model, and, on the other hand, the delay discounting models: the hyperbolic discounting model, which cannot accommodate subadditivity, and the exponential discounting model, which cannot deal with any of the preference patterns (Table 1). With the design being hospitable to both the tradeoff model and the interval discounting model, these models had the opportunity to display their best performance, or, as phrased by Scheibehenne, Rieskamp, and González-Vallejo (2009), to perform "in top gear." This was done to determine the viability of our parametric functional form of the interval discounting model and to draw a preliminary comparison with the tradeoff model. The tradeoff model outperformed the interval discounting model, marginally in terms of goodness of fit but convincingly in terms of Bayes factors. The interval discounting model outperformed the hyperbolic discounting model, and convincingly so by both criteria.

Study 2, a choice study, was designed to produce the absolute magnitude effect, the common difference effect, subadditivity, superadditivity, and relative nonadditivity in gains. This allowed us to differentiate between the tradeoff model and all three discounting models: the exponential discounting model, which cannot deal with any of the preference patterns; the hyperbolic discounting model, which cannot accommodate nonadditivity; and the interval discounting model, which cannot account for relative nonadditivity (Table 1). The tradeoff model outperformed all competing models, with the interval discounting model coming out as a distant second.

Study 3, another choice study, used a questionnaire that has become popular in studies of delay discounting (Kirby, Petry, & Bickel, 1999). This questionnaire involves choices between an immediate gain and a delayed one, and it manipulates both rates of return and outcome magnitude. A greater willingness to wait with higher rates of return is accommodated by all four models, while the absolute magnitude effect differentiates between the exponential discounting model, which fails to deal with this preference pattern, and the other three models (Table 1).5 The tradeoff model again outperformed all competing models, with the interval discounting model coming out as a distant second.


6. Study 1: Indifference data from choice-based matching

6.1. Method

6.1.1. Participants
The participants were 52 students (24 females and 28 males) from the London School of Economics in London, and from ISPA University Institute in Lisbon. The participants were paid $10 in their local currency.

6.1.2. Design
The design included three within-participant factors: (1) the delay to the outcomes (standard [3 and 9 months], additively increased [23 and 29 months], and multiplicatively increased [12 and 36 months]); (2) the magnitude of the outcomes (small [20 and 40] and large [200 and 400]); and (3) the sign of the outcomes (positive and negative). The orthogonal manipulation of the three factors yielded 12 option pairs.

6.1.3. Procedure
For each option pair, we conducted a choice-based matching procedure, as described by Scholten and Read (2006), to close in on the indifference point. The two delays were fixed, as was the outcome rejected on the first trial. The outcome accepted on the first trial was then adjusted over a series of subsequent trials. To prevent participants from aiming at an infinitely large gain or a zero loss, the adjustment on the second trial always made the variable outcome worse, either by decreasing the magnitude of the gain or by increasing the magnitude of the loss. On subsequent trials, the direction of the adjustment depended on the participants' choices. For instance, if SS was chosen on trial 1, and then again on trial 2, the adjustment on trial 3 went downward, but, if SS was chosen on trial 1, and LL was chosen on trial 2, the adjustment on trial 3 went upward. This procedure continued until the largest amount that yielded preference for one option differed by 1 monetary unit or less from the smallest amount that yielded preference for the other option. The midpoint between the two amounts was then taken as the indifference point (see the sketch after Section 6.1.4).

6.1.4. Independent and dependent variables
For each option pair, we computed the geometric means of outcomes xS and xL and the one-period discount factor δ (Eq. 2). Taking the geometric mean of δ is the same as taking the geometric means of xS and xL and computing δ from there, which means that the functional relation between outcomes and one-period discount factors is preserved in the aggregate data (Scholten & Read, 2013). A full description of the 12 option pairs is given in Table 2. In our quantitative analyses, outcome xS and delays tS and tL are the independent variables, whereas outcome xL and the log transform of the one-period discount factor, log(δ), are the dependent variables.
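The following is a simplified bisection sketch of such a matching staircase (illustrative only: the study's actual adjustment schedule, described in Section 6.1.3, always worsened the variable outcome on trial 2 and adapted thereafter; `choose_variable` is a hypothetical callback standing in for a participant's choice on a trial):

```python
def matching_staircase(choose_variable, lo, hi, unit=1.0):
    """Close in on the indifference amount for a variable gain.

    `choose_variable(amount)` returns True if the option carrying the
    variable outcome is chosen at that amount. `lo` and `hi` bracket
    amounts at which it is rejected and accepted, respectively; the
    bracket is halved until it is no wider than one monetary unit,
    and the midpoint is returned as the indifference point.
    """
    while hi - lo > unit:
        mid = (lo + hi) / 2.0
        if choose_variable(mid):
            hi = mid  # still accepted: make the variable gain worse
        else:
            lo = mid  # rejected: make it better again
    return (lo + hi) / 2.0

# Example with a simulated chooser who accepts the variable gain
# whenever it exceeds $250 (purely illustrative):
print(matching_staircase(lambda x: x > 250, lo=100.0, hi=1000.0))
```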


Table 2
Indifference data from choice-based matching (Study 1): Delays, adjusted outcomes, and one-period discount factors (N = 52)

tS^a   tL^a    xS^b      xL^b      δ^c
 3      9      15.94     33.11    .885
 3      9      29.27     44.45    .933
 3      9     184.72    296.61    .924
 3      9     301.87    430.45    .943
23     29      19.19     29.00    .934
23     29      32.27     40.62    .962
23     29     191.52    261.15    .950
23     29     345.80    400.95    .976
12     36      13.55     36.46    .960
12     36      28.79     44.26    .982
12     36     176.18    331.57    .974
12     36     282.65    445.06    .981

^a Delays in months.
^b Geometric means of outcomes, in pounds or euros.
^c Geometric means.

6.1.5. Model specification and evaluation
We evaluated how well the four candidate models were supported by the data using a Bayesian analysis. The virtue of this approach is that it allows us to make coherent, consistent, and complete inferences about the models and parameters from the data. We use this approach both to measure the relative ability of the models to produce the experimental data and to show that the parameters used in the models are necessary.

To evaluate the relative ability of the models to produce the experimental data, we used Bayes factors (Kass & Raftery, 1995). Bayes factors quantify the relative evidence in favor of a pair of models using likelihood ratios:

\[
\mathrm{BF} = \frac{p(D \mid M_1)}{p(D \mid M_2)},
\]

where p(D|M) is the probability of the data given model M. Prior beliefs about the plausibility of the models may be used as follows to determine what the relative beliefs about the models should be after seeing the data:

\[
\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \mathrm{BF} \cdot \frac{p(M_1)}{p(M_2)}.
\]

If the models were considered equally likely beforehand, that is, p(M1) = p(M2), then the relative posterior belief is equal to the Bayes factor:

\[
\mathrm{BF} = \frac{p(D \mid M_1)}{p(D \mid M_2)} = \frac{p(M_1 \mid D)}{p(M_2 \mid D)}. \tag{12}
\]
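For instance, anticipating the Study 1 results reported in Table 3, the posterior probabilities of the tradeoff model and the interval discounting model stand in the ratio

\[
\mathrm{BF} = \frac{.991881}{.007612} \approx 130.30,
\]

which is the Bayes factor by which the tradeoff model outperforms the interval discounting model in that study.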

Computing the Bayes factor requires determining the probability of the data marginalized over all free parameters in model M:

\[
p(D \mid M) = \int p(D \mid M, \theta)\, p(\theta \mid M)\, d\theta,
\]

where θ is a set of parameters for model M. These were given simple prior distributions. The parameters β, κ, τ, α, μ, and ρ, along with the γ parameter of the tradeoff model, were given a lognormal prior distribution with a mean of zero and a standard deviation of one. The parameters ϑ and λ were given the same lognormal prior distribution, but with the distribution translated so that it began at one instead of zero. The lognormal distribution was chosen because it does not allow for negative values, it is heavy tailed, and it puts little weight on values extremely close to zero.6 For the parameters with both an upper and a lower bound on their possible values, δ and the γ parameter of the discounting models, the prior distribution was set to be uniform over all possible values.

The above integral is normally intractable, as it appears to be for our models, so we solved it using an approximation method. We chose to use the Candidate's formula for computing the marginal likelihood (Besag, 1989; Chib, 1995):

\[
p(D \mid M) = \frac{p(D \mid M, \theta^{*})\, p(\theta^{*} \mid M)}{p(\theta^{*} \mid D, M)},
\]

for a particular value of θ*, in our case the posterior mode.7 The Candidate's formula can be implemented in an efficient way by using samples drawn from the posterior distribution with the standard technique of Markov Chain Monte Carlo (MCMC; Chib & Jeliazkov, 2001). For more details on our estimation procedure, see Appendix C.

We will assume that the prior probabilities p(M) are equal for all models. Consequently, p(D|M) as given by the Candidate's formula is equal to the posterior probability p(M|D). We normalize the posterior probabilities across models to sum to 1, so as to evaluate the relative posterior belief in the models. For any pair of models, the ratio between posterior probabilities is the Bayes factor BF, as in Eq. 12.

In Study 1, the data D are 12 values of log(δ) corresponding to the 12 option pairs. We assume a Gaussian likelihood distribution over log(δ), with, for each model M, a standard deviation ω. (This parameter only affects the goodness of fit of the model to the data, not the predictions made by the model. We allowed ω to vary, setting a prior distribution over ω that was uniform from 0 to 200.) The exponential discounting model, in Eq. 1, predicts a constant value of log(δ), whereas the other models predict that it will vary across the 12 option pairs.

To obtain the predicted values of log(δ) from the tradeoff model, we solved Eq. 9 for outcome xL, and inserted it, along with outcome xS and delays tS and tL, into Eq. 2, to obtain the predicted values of δ, subsequently transformed into log(δ). To obtain the predicted values of log(δ) from the hyperbolic discounting model and the interval discounting model, which cannot be solved for xL (see Appendix A), we first derived, for a particular set of parameter values θ* and for each option pair i, the value of outcome xL satisfying Eqs. 3 and 6, and then inserted that value, along with outcome xS and delays tS and tL, into Eq. 2, to obtain the predicted value of δ for option pair i, subsequently transformed into log(δ). For each model, the single set of parameter values with the highest posterior probability (the maximum a posteriori [MAP] parameters) was used for producing the predictions that we report.
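As a sketch of this prediction step for the models that lack a closed-form solution for xL (a root-finding approach; SciPy is an assumed dependency, and the bracketing endpoints are illustrative):

```python
from scipy.optimize import brentq

def predicted_delta(xS, tS, tL, indifference_gap):
    """Find the xL at which a model is indifferent between SS and LL,
    then convert it to a one-period discount factor via Eq. 2.

    `indifference_gap(xL)` returns the signed difference between the
    two sides of the model's indifference condition (Eq. 3 or Eq. 6);
    it is positive at xL = xS and crosses zero at the indifference point.
    """
    xL = brentq(indifference_gap, xS, xS * 1e6)  # illustrative bracket
    return (xS / xL) ** (1.0 / (tL - tS))

# For the hyperbolic model, with d and v defined as in Section 2.2:
# gap = lambda xL: d(tS) * v(xS) - d(tL) * v(xL)
# delta = predicted_delta(xS, tS, tL, gap)
```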


6.2. Results
We conducted a repeated measures ANOVA on log(δ) for the 3 (delay length) × 2 (outcome magnitude) × 2 (outcome sign) design.8 As expected, δ was higher for the large outcomes than for the small ones, F(1, 51) = 25.00, p < .005 (the absolute magnitude effect), and higher for losses than for gains, F(1, 51) = 29.06, p < .005 (the gain-loss asymmetry). In addition, δ varied as a function of delay length, F(2, 102) = 42.51, p < .005. Planned contrasts showed that δ was higher for the additively increased delays than for the standard ones, t(51) = 4.87, p < .005 (the common difference effect), and higher for the multiplicatively increased delays than for the additively increased ones, t(51) = 5.29, p < .005.

To interpret this result, we consult Fig. 2, in which the interval spanned by the multiplicatively increased delays (M) is divided into an early interval (E), an intermediate interval spanned by the additively increased delays (A), and a late interval (L). Because E is longer than L, diminishing absolute sensitivity to delays implies that δ will be higher for A than for M.9 Conversely, because M is longer than A, subadditivity implies that δ will be higher for M than for A. We obtained the latter result, showing that subadditivity outweighed diminishing absolute sensitivity to delays.

Fig. 2. The six delays used in choice-based matching (Study 1): standard delays (3 and 9), additively increased delays (23 and 29), and multiplicatively increased delays (12 and 36). The interval spanned by the multiplicatively increased delays, denoted M, is divided into an early interval E, an intermediate interval spanned by the additively increased delays, denoted A, and a late interval L.


Because the design did not include intervals E and L, any role of superadditivity could not be identified. The tradeoff model and the interval discounting model therefore assumed that there was none (ϑ = 1).

Table 3 shows the posterior probabilities, taken to be proportional to the marginal likelihoods, upon evaluating the candidate models on log(δ). The tradeoff model outperformed the interval discounting model by a factor of 130.30 (the Bayes factor). The marginal likelihoods are a reflection of both a reward for fitting the data well and a penalty for complexity. To get a sense of why the tradeoff model outperformed the interval discounting model, we can look at the single best parameters from a Bayesian perspective, which are the MAP parameters. The MAP parameters of the tradeoff model and the interval discounting model both fit the data well: They accounted for 98.72% and 98.62% of the variance in log(δ), respectively. Indeed, for these parameters alone, ignoring priors, the tradeoff model was favored by only a factor of 1.53. Thus, the main driver of the disadvantage that the interval discounting model has in terms of Bayes factors is the complexity penalty, which, on inspection, appears to result because the interval discounting model fits the data well only for very small values of μ.

The interval discounting model outperformed the hyperbolic discounting model by a factor of 15.20, but here the difference between the MAP parameters was much greater: For these parameters alone, ignoring priors, the interval discounting model was favored by a factor of about 2.6 million, and the MAP parameters of the hyperbolic discounting model accounted for only 83.79% of the variance in log(δ). Thus, in terms of Bayes factors, the interval discounting model dominated the hyperbolic discounting model much less than it was dominated by the tradeoff model, whereas the reverse was true in terms of MAP parameters. The ability of the hyperbolic discounting model to produce good, but not great, fits to the data for a wide range of parameters explains why this is so. Finally, the hyperbolic discounting model outperformed the exponential discounting model by a factor of 80.05. The MAP parameters of the exponential discounting model by definition account for 0% of the variance in log(δ) and were dominated by the MAP parameters of the hyperbolic discounting model, with a likelihood ratio of 55,134.

Table 4 shows the maximum a posteriori parameters. Recall that γ differs structurally between the discounting models (½ < γ < 1, in an additive combination of two power functions) and the tradeoff model (γ > 0, in a normalized logarithmic function). The tradeoff model offers a measure of loss aversion in intertemporal choice, and λ falls within the range of estimates from risky choice (Booij, van Praag, & van de Kuilen, 2010).

Table 3
Indifference data from choice-based matching (Study 1): Posterior probabilities upon evaluating the candidate models on log(δ)

Exponential discounting model    .000006
Hyperbolic discounting model     .000501
Interval discounting model       .007612
Trade-off model                  .991881

Table 4
Indifference data from choice-based matching (Study 1): Maximum a posteriori parameters^a

Model          Parameter   Estimate
Exponential    δ^b         0.9498
               ω           0.0292
Hyperbolic     β           0.0920
               τ           0.1446
               γ           0.7522
               μ           0.0403
               ρ           0.4578
               ω           0.0117
Interval       β           0.1925
               τ           0.1523
               α           1.0802
               γ           0.7814
               μ           0.0169
               ρ           0.1356
               ω           0.0034
Trade-off      κ           3.4989
               τ           0.1294
               α           0.8434
               γ           0.0842
               λ           1.5188
               ω           0.0033

Note. ^a The models were estimated on 12 data points collected from 52 participants. ^b δ = e^{−β} solves for β = −log(δ) = .0514.

Fig. 3 shows the observed values of δ and the values predicted by the candidate models along a logarithmic scale. The interval discounting model and the tradeoff model reproduce the quantitative as well as the qualitative patterns in the data. In contrast, the hyperbolic discounting model overpredicts the effect of additively increasing the delays to the outcomes (i.e., the common difference effect) and underpredicts the effect of multiplicatively increasing the delays to the outcomes. It even predicts that the former effect is stronger than the latter, contrary to what is observed. It arrives at this prediction because it accommodates the common difference effect, but not subadditivity, which outweighed the common difference effect. Overall, the hyperbolic discounting model succeeded qualitatively (it did predict the effects) but failed quantitatively (it did not accurately predict the magnitude of the effects).

Purely in terms of goodness of fit, the interval discounting model and the tradeoff model performed about equally well in an environment hospitable to both, suggesting that there are no failures in the parametric functional form of the interval discounting model.

Study 2 introduces several improvements over the choice-based matching study: (1) more data points, (2) estimation of the superadditivity parameter, and (3) a design that produces relative nonadditivity. The interval discounting model does not predict relative nonadditivity, whereas the tradeoff model does. We next discuss why.

[Figure 3 appears here: four panels (Data, Hyperbolic discounting model, Interval discounting model, Tradeoff model), each plotting δ for small and large gains and losses under the three delay conditions.]

Fig. 3. Indifference data from choice-based matching (Study 1): The effect of outcome magnitude and outcome sign for standard delays (t), additively increased delays (a + t), and multiplicatively increased delays (m × t). Observed and predicted values of δ (Eq. 2) displayed along a logarithmic scale. The horizontal line is the prediction of the exponential discounting model, which does not account for any variation in δ.

7. The product rule: Weighing outcomes by time versus weighing outcomes against time

Models that satisfy additivity in intervals will satisfy the product rule (Luce, 1959). Letting p be a choice probability and Ω = p/(1 − p) be the choice odds, the product rule

states that the odds of choosing LL over SS should equal the odds of choosing LL over MM and the odds of choosing MM over SS combined:

\[
\Omega_{LS} = \Omega_{LM} \cdot \Omega_{MS}.
\]

Under the ratio interpretation of Luce's (1959) choice axiom (Stott, 2006), we have, for the hyperbolic discounting model, the following equality:

\[
\frac{d(t_L)\,v(x_L)}{d(t_S)\,v(x_S)} = \frac{d(t_L)\,v(x_L)}{d(t_M)\,v(x_M)} \cdot \frac{d(t_M)\,v(x_M)}{d(t_S)\,v(x_S)}.
\]

The product rule holds because the discounted value of MM cancels out from the equation. Both the interval discounting model and the tradeoff model predict violations of the product rule. The difference is, however, that, in the interval discounting model, the violations are driven entirely by the nonadditivity of the interval discount function, whereas, in the tradeoff model, they are driven not only by the nonadditivity of the tradeoff function but also by the tradeoff rule itself, that is, by model structure. In the interval discounting model, additivity in intervals would require the following equality:


\[
\frac{D(t_S, t_L)\,v(x_L)}{v(x_S)} = \frac{D(t_M, t_L)\,v(x_L)}{v(x_M)} \cdot \frac{D(t_S, t_M)\,v(x_M)}{v(x_S)},
\]

or

\[
D(t_S, t_L) = D(t_M, t_L) \cdot D(t_S, t_M),
\]

which holds if both subadditivity and superadditivity vanish from the interval discount function D, in which case we have

\[
e^{-\beta(w(t_L) - w(t_S))} = e^{-\beta(w(t_L) - w(t_M))} \cdot e^{-\beta(w(t_M) - w(t_S))}.
\]

For the tradeoff model, it is different. If subadditivity and superadditivity vanish from the tradeoff function, additivity in intervals requires the following equality:

\[
\frac{v(x_L) - v(x_S)}{\kappa(w(t_L) - w(t_S))} = \frac{v(x_L) - v(x_M)}{\kappa(w(t_L) - w(t_M))} \cdot \frac{v(x_M) - v(x_S)}{\kappa(w(t_M) - w(t_S))},
\]

which, when the valued outcomes and weighted delays are given, holds for only one value of the tradeoff parameter κ, as can be seen by rearranging the above equality:

\[
\kappa\,\frac{v(x_L) - v(x_S)}{w(t_L) - w(t_S)} = \frac{v(x_L) - v(x_M)}{w(t_L) - w(t_M)} \cdot \frac{v(x_M) - v(x_S)}{w(t_M) - w(t_S)}.
\]

In general, however, relatively large attribute differences outweigh relatively small attribute differences by a wider margin over a set of multiple subranges than over a single undivided range. Thus, we have superadditivity in intervals when the outcome differences are relatively large. For instance,

$$\frac{300 - 100}{25(3 - 1)} = 4 < 16 = \frac{300 - 200}{25(3 - 2)} \times \frac{200 - 100}{25(2 - 1)}.$$

Conversely, we have superadditivity in compensations, or subadditivity in intervals, when the outcome differences are relatively small. For instance,

$$\frac{30 - 10}{25(3 - 1)} = 0.4 > 0.16 = \frac{30 - 20}{25(3 - 2)} \times \frac{20 - 10}{25(2 - 1)}.$$

Relative nonadditivity is this changing pattern of nonadditivity in intervals with the changing magnitude of the compensations. Thus, to accommodate relative nonadditivity, the tradeoff model does not need inseparability (Scholten & Read, 2010): Relative nonadditivity follows immediately from the combination of the tradeoff rule and the ratio choice rule.
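The two numerical examples can be verified with a few lines of code. In this minimal sketch, purely for illustration, the value and time-weighing functions are taken to be identity functions and κ = 25, which reproduces the reversal shown above:

```python
# Tradeoff rule combined with the ratio choice rule: outcome differences
# weighed against (kappa-scaled) time differences.
kappa = 25

def odds(x_later, t_later, x_sooner, t_sooner):
    return (x_later - x_sooner) / (kappa * (t_later - t_sooner))

# Large outcome differences: superadditivity in intervals (prints 4.0 and 16.0).
print(odds(300, 3, 100, 1), odds(300, 3, 200, 2) * odds(200, 2, 100, 1))

# Small outcome differences: subadditivity in intervals (prints 0.4 and 0.16).
print(odds(30, 3, 10, 1), odds(30, 3, 20, 2) * odds(20, 2, 10, 1))
```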


To accommodate relative nonadditivity, the tradeoff model relies critically on the assumption that, as in Restle's (1961) choice model, choice odds derive from the ratio between attribute differences. If choice odds derived from the difference between attribute differences, analogously to the logistic choice rule, or the difference interpretation of Luce's (1959) choice axiom (see also Stott, 2006), the tradeoff model would, when subadditivity and superadditivity vanish from the tradeoff function, predict additivity in intervals, and it would therefore not predict relative nonadditivity. Thus, if relative nonadditivity is accurately predicted by the tradeoff model, it validates the combination of the tradeoff rule and the ratio choice rule. In sum, relative nonadditivity distinguishes a model in which outcomes are weighed against time (the tradeoff model) from models in which outcomes are weighted by time (the discounting models, including the interval discounting model): Relative nonadditivity only obtains when differences are computed along both the time attribute and the outcome attribute. We next apply the discounting models and the tradeoff model to preference data from choice.

8. Study 2: Preference data from choice

8.1. Method

8.1.1. Participants
The participants were 378 online workers on Mechanical Turk, 52% male, and averaging 46 years of age. Most had completed high school (40%) or held an academic degree (58%), and most were employed (58%) or students (18%). They were paid 40 cents for 5 min of work.

8.1.2. Design and procedure
The design included four sextuples of option pairs, with each sextuple comprising four options, denoted SS, MM, LL, and XX. Three sextuples had "large" outcomes (larger than $500), and either "short" intervals spanned by "short" delays (1, 2, 3, and 4 weeks), "short" intervals spanned by "long" delays (7, 8, 9, and 10 weeks), or "long" intervals spanned by delays of 1, 4, 7, and 10 weeks. The fourth sextuple had "small" outcomes (smaller than $150) and "long" intervals spanned by delays of 1, 4, 7, and 10 weeks. The 24 option pairs are given in Table 5. For each option pair, the interest rate offered was 6%, compounded weekly.10 The outcomes were rounded to the nearest $5. The order of the sextuples and the order of the option pairs within each sextuple were randomized across participants.

8.1.3. Model specification and evaluation
The models predict, and are evaluated on, the raw choices made by all participants among all option pairs (378 × 24 = 9,072 in total). If we take the right-hand side (RHS) and the left-hand side (LHS) of the model equations for indifference (Eqs. 1, 3, 6, and 9), the model specification for the strength of preference for LL over SS is

$$\hat{p} = \mathrm{RHS}^{1/\varepsilon} / (\mathrm{RHS}^{1/\varepsilon} + \mathrm{LHS}^{1/\varepsilon}),$$


Table 5
Preference data from choice (Study 2): Delays, outcomes, and choice odds (N = 378)

Large outcomes, short intervals, short delays
 tS   tL    xS    xL      Ω
  1    2   515   530   1.377
  2    3   530   545   1.319
  3    4   545   560   1.363
  1    3   515   545   0.899
  2    4   530   560   0.791
  1    4   515   560   0.688

Large outcomes, short intervals, long delays
 tS   tL    xS    xL      Ω
  7    8   605   620   1.821
  8    9   620   635   1.625
  9   10   635   650   2.203
  7    9   605   635   1.185
  8   10   620   650   1.377
  7   10   605   650   1.277

Large outcomes, long intervals
 tS   tL    xS    xL      Ω
  1    4   515   560   0.680
  4    7   560   605   0.775
  7   10   605   650   1.021
  1    7   515   605   0.602
  4   10   560   650   0.750
  1   10   515   650   0.602

Small outcomes, long intervals
 tS   tL    xS    xL      Ω
  1    4   115   125   0.317
  4    7   125   135   0.273
  7   10   135   145   0.336
  1    7   115   135   0.273
  4   10   125   145   0.308
  1   10   115   145   0.290

Note. Delays in weeks; outcomes in dollars.

where p̂ is the predicted value of p, and ε > 0 is a "noise parameter" (Andersen, Harrison, Lau, & Rutström, 2010): If ε approaches 0, there is no noise, and choice favors the best option for certain; if ε approaches infinity, there is only noise, and choice is a coin flip. We set a lognormal prior on ε for each model, with a mean of zero and a standard deviation of one. The predicted value of p, which is the same for all participants, is then evaluated against all raw choices using a binomial distribution B(N, p̂), where N is the number of participants. We thus compute the binomial probability with which n out of N participants would choose LL instead of SS, assuming that, for each participant, the probability of making that choice is p̂.
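For concreteness, the following minimal sketch implements this specification; the RHS and LHS values, the noise level, and the counts are hypothetical placeholders, not quantities from our analyses:

```python
import numpy as np
from scipy.stats import binom

def choice_probability(rhs, lhs, eps):
    """Predicted probability of choosing LL, as specified above:
    p = RHS**(1/eps) / (RHS**(1/eps) + LHS**(1/eps))."""
    r, l = rhs ** (1.0 / eps), lhs ** (1.0 / eps)
    return r / (r + l)

def log_likelihood(n_LL, N, rhs, lhs, eps):
    """Binomial log likelihood of n_LL choices of LL among N participants."""
    p_hat = choice_probability(rhs, lhs, eps)
    return binom.logpmf(n_LL, N, p_hat)

# Illustrative numbers only: RHS and LHS would come from a model's equations.
print(log_likelihood(n_LL=220, N=378, rhs=1.05, lhs=1.00, eps=0.2))
```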


8.2. Results
We submitted choices of LL to two t-tests. Choice of LL among "large" outcomes was more likely for "long" delays than for "short" ones, t(377) = 6.87, p < .005 (common difference effect), and choice of LL over "long" intervals was more likely for "large" outcomes than for "small" ones, t(377) = 12.32, p < .005 (absolute magnitude effect). The odds of choosing LL are given in Table 5.

For each sextuple, there were 2^6 = 64 unique choice patterns. Of these choice patterns, 24 were transitive (two of which are single-choice patterns), 16 subadditive, 16 superadditive, and 8 both subadditive and superadditive. By chance alone, the expected incidence rates would be 3.125% (single-choice patterns), 34.375% (remaining transitive patterns), 25% (subadditive patterns), 25% (superadditive patterns), and 12.5% (subadditive and superadditive patterns). The observed incidence rates, averaged across sextuples, were 26.5%, 54.5%, 5%, 13%, and 1%. Thus, transitive patterns, and in particular single-choice patterns, were overrepresented, whereas intransitive patterns were underrepresented. Even so, the intransitive patterns represented one-fifth of all observed patterns. Superadditive patterns were between two and three times as common as subadditive patterns.

The posterior probabilities of the candidate models were conclusive: They were nearly 0 for all discounting models and nearly 1 for the tradeoff model. The Bayes factor in favor of the tradeoff model over its closest competitor, the interval discounting model, was 9.73 × 10^26. Table 6 shows the maximum a posteriori parameters. The tradeoff model identifies nonadditivity in the tradeoff function (α and ϑ) over and above the nonadditivity implied by the combination of the tradeoff rule and the ratio choice rule.

Fig. 4 shows the observed values of Ω and the values predicted by the candidate models. The top panel shows the predictions of the exponential discounting model and the hyperbolic discounting model. The exponential discounting model predicts uniformly higher Ω for shorter intervals than for longer ones, while the hyperbolic discounting model adds to this a uniformly higher Ω for intervals that begin later than for those that begin sooner. The bottom panel shows the predictions of the interval discounting model and the tradeoff model. The tradeoff model closely reproduces the quantitative as well as the qualitative patterns in the data, including relative nonadditivity: a reversal from superadditivity for the large outcomes and short intervals to subadditivity for the small outcomes and long intervals, with an irregular pattern for the large outcomes and long intervals in between. The interval discounting model, in contrast, does not produce accurate predictions. It does reproduce the superadditive patterns for the large outcomes and short intervals, both qualitatively and quantitatively, but it reproduces neither the irregular pattern for the large outcomes and long intervals nor the subadditive pattern for the small outcomes and long intervals. Furthermore, it underpredicts the common difference effect: Although it accurately predicts Ω for the short delays, it underpredicts Ω for the long ones. Therefore, it succeeds qualitatively, but it fails quantitatively.

The tradeoff model performed very well in both analyses reported so far. Our quantitative analyses show that the differences in predictive accuracy between the candidate models are far from marginal, with the tradeoff model convincingly winning out over the discounting models.

Table 6
Preference data from choice (Study 2): Maximum a posteriori parameters (models estimated on 24 data points collected from 378 participants)

Exponential discounting:  δ = 0.8708 (δ = e^−β, so β = −log δ = .1384), ε = 1.0511
Hyperbolic discounting:   β = 0.0214, τ = 0.0446, γ = 0.7526, µ = 0.0527, ε = 0.0116
Interval discounting:     β = 0.1093, τ = 0.0501, α = 2.7328, ϑ = 2.3151, γ = 0.7923, µ = 0.0152, ε = 0.0218
Tradeoff:                 κ = 1.0597, τ = 0.0387, α = 0.0529, ϑ = 1.1520, γ = 0.0315, ε = 0.1869


Fig. 4. Preference data from choice (Study 2): Values of Ω for subdivided intervals, for example, LM · MS = Ω_LM × Ω_MS, and undivided intervals, for example, LS = Ω_LS. For "short" intervals, the length of S → M, M → L, and L → X was 1 week, and, for "long" intervals, the length of these intervals was 3 weeks. Observed and predicted values of Ω displayed along a logarithmic scale.


One may wonder, however, how the candidate models would compare if we did not stack the deck against the discounting models. In our final analysis, we compare the models in an environment that is hospitable to all.

9. Study 3: Preference data from choice

As in Study 2, we obtain preference data from a choice study. The difference is that, in Study 2, the experimental interest rate was constant across option pairs, whereas in this study, it varies. We use the questionnaire of Kirby et al. (1999), in which simple interest rates steadily increase from 0.00016 to 0.25 per day (strongly favoring SS and LL, respectively), to determine at which point the decision maker switches from SS to LL. This questionnaire therefore promotes choices that seem consistent with a model in which experimental interest rates are compared with an individual interest rate. However, apart from interest rates, the questionnaire manipulates outcome magnitude, the effect of which (i.e., the absolute magnitude effect) is inconsistent with such a model. Kirby et al. (1999) applied their questionnaire to a small sample of 60 participants, in which four choice proportions reached the lower asymptote of 0 and three reached the upper asymptote of 1. Because these choice proportions would drop out in our computation of log odds, we applied the questionnaire to a much larger sample, in the expectation that asymptotes would be avoided, and so they were: Choice proportions ranged from .037 to .950.

9.1. Method

9.1.1. Participants
The participants were 518 members of the Yale School of Management virtual laboratory (eLab), 39% male, and averaging 36 years of age. Most had completed high school (35%) or held an academic degree (63%), and most were employed (62%) or students (14%). They responded to an online questionnaire related to several studies on intertemporal choice. As a reward, they were entered in a lottery offering a 1 in 50 chance to win a $50 Amazon.com gift certificate.

9.1.2. Questionnaire
The questionnaire designed by Kirby et al. (1999) contains nine triplets. In each triplet, the experimental interest rate is constant, and outcome magnitude varies (small, medium, and large). However, outcome magnitude varies very little, because the outcomes in the entire questionnaire range from $11 to $85. Across triplets, the experimental interest rate is inversely related to delay length. The 27 option pairs are given in Table 7; a sketch of the interest-rate comparison that the questionnaire promotes follows the table.


Table 7
Preference data from choice (Study 3): Delays, outcomes, and choice odds (N = 518)

 tL   xS   xL      Ω   |   tL   xS   xL      Ω   |   tL   xS   xL       Ω
186   34   35   0.028  |   80   25   30   0.191  |   19   14   25    2.674
117   54   55   0.040  |   89   49   60   0.408  |   21   27   50    3.933
162   78   80   0.042  |   91   69   85   0.546  |   20   41   75    4.886
179   28   30   0.038  |   53   19   25   0.570  |   13   15   35    5.907
160   47   50   0.075  |   62   40   55   0.614  |   14   25   60   13.800
157   80   85   0.066  |   61   55   75   1.262  |   14   33   80   13.389
136   22   25   0.070  |   29   24   35   1.114  |    7   11   30   11.333
111   54   60   0.107  |   30   34   50   1.800  |    7   20   55   15.710
119   67   75   0.161  |   30   54   80   2.866  |    7   31   85   18.923

Note. Delays in days; outcomes in dollars.
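The following minimal sketch illustrates the interest-rate comparison that the questionnaire promotes. It is our illustration, not a model we endorse, and it assumes, as in Kirby et al.'s (1999) procedure, that the smaller outcome is available immediately:

```python
def pair_rate(x_s, x_l, t_l):
    """Simple daily interest rate offered by an option pair with an
    immediate SS: (x_l - x_s) / (x_s * t_l)."""
    return (x_l - x_s) / (x_s * t_l)

def choose_LL(x_s, x_l, t_l, k):
    # A decision maker with individual rate k takes LL whenever the
    # pair offers more than k per day.
    return pair_rate(x_s, x_l, t_l) > k

# First and last pairs of Table 7: rates near 0.00016 and 0.25 per day.
print(pair_rate(34, 35, 186))   # ~0.00016: almost everyone takes SS
print(pair_rate(31, 85, 7))     # ~0.25:    almost everyone takes LL
```

Note that the absolute magnitude effect is precisely what such a threshold model cannot produce: Within a triplet, the pair rate is held constant, yet choice of LL increases with outcome magnitude.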

9.1.3. Discount functions, discounting models, and the tradeoff model
We again apply the hyperbolic discounting model, the interval discounting model, and the tradeoff model. However, because the questionnaire promotes comparisons between experimental and individual interest rates, we also apply three discount functions: the exponential discount function (previously referred to as the "exponential discounting model"), which assumes that participants use a continuously compounded interest rate; Mazur's (1987) hyperbolic discount function, which assumes that participants use a simple interest rate; and Loewenstein and Prelec's (1992) generalized hyperbolic discount function, which includes the other discount functions as special cases. The exponential discount function is $e^{-\beta t} = \delta^t$, where β > 0 is the continuously compounded interest rate, and the hyperbolic discount function is $d(t) = 1/(1 + \tau t)$, where τ > 0 is the simple interest rate. The discount functions are applied to raw outcomes, whereas in the "discounting models," the generalized hyperbolic discount function and the interval discount function are applied to valued outcomes.
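The nesting of the three discount functions can be checked numerically; in this sketch, the parameter values and delays are arbitrary illustrations:

```python
import numpy as np

def exponential(t, beta):
    return np.exp(-beta * t)                 # continuously compounded rate

def hyperbolic(t, tau):
    return 1.0 / (1.0 + tau * t)             # simple interest rate (Mazur)

def generalized_hyperbolic(t, beta, tau):
    return (1.0 + tau * t) ** (-beta / tau)  # Loewenstein and Prelec

t = np.array([7.0, 30.0, 186.0])
# tau -> 0 recovers the exponential discount function ...
print(generalized_hyperbolic(t, beta=0.01, tau=1e-9), exponential(t, beta=0.01))
# ... and beta = tau recovers the hyperbolic discount function.
print(generalized_hyperbolic(t, beta=0.01, tau=0.01), hyperbolic(t, tau=0.01))
```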


9.2. Results
We conducted a repeated measures ANOVA on choices of LL for the 9 (interest rate) × 3 (outcome magnitude) design. Choice of LL increased, following an S-shaped curve, with interest rate, F(8, 4,136) = 1213.54, p < .005, and increased with outcome magnitude, F(2, 1,034) = 252.23, p < .005. The odds of choosing LL are given in Table 7.

As in Study 2, the posterior probabilities of the candidate models were conclusive: They were nearly 0 for all discount functions and discounting models and nearly 1 for the tradeoff model. The Bayes factor in favor of the tradeoff model over its closest competitor, the interval discounting model, was 6.72 × 10^16. Table 8 shows the maximum a posteriori parameters. The discounting parameter τ of the hyperbolic discount function is placed between the discounting parameters β and τ of the generalized hyperbolic discount function, because it captures both discounting and the departure from exponential discounting.

The top panel of Fig. 5 displays observed and predicted values of Ω as a function of interest rate. The tradeoff model underpredicts Ω for the lowest interest rate (0.00016), while all other formulations overpredict it, such that they do not predict an increase of Ω as the interest rate increases from 0.00016 to the second lowest interest rate (0.0004). The bottom panel of Fig. 5 shows observed and predicted values of Ω as a function of outcome magnitude. The discount functions fail to accommodate the absolute magnitude effect, and the tradeoff model provides the closest approximation to it.

Table 8
Preference data from choice (Study 3): Maximum a posteriori parameters (models estimated on 27 data points collected from 518 participants)

Exponential discount function:            δ = 0.9933 (δ = e^−β, so β = −log δ = .0068), ε = 0.3454
Hyperbolic discount function:             τ = 0.0091, ε = 0.3162
Generalized hyperbolic discount function: β = 0.0076, τ = 0.0029, ε = 0.3341
Hyperbolic discounting model:             β = 0.0033, τ = 0.0056, γ = 0.9159, µ = 0.0026, ε = 0.1222
Interval discounting model:               β = 0.5413, τ = 1.3042, α = 0.1965, ϑ = 4.4612, γ = 0.9167, µ = 0.0020, ε = 0.1130
Tradeoff model:                           κ = 2.1439, τ = 0.9106, α = 0.0594, ϑ = 2.3781, γ = 0.0216, ε = 0.7827

Fig. 5. Preference data from choice (Study 3). Values of Ω as a function of simple interest rate (top panel) and outcome magnitude (bottom panel). Observed and predicted values of Ω displayed along a logarithmic scale. EDF, exponential discount function; HDF, hyperbolic discount function; GHDF, generalized hyperbolic discount function; HDM, hyperbolic discounting model; IDM, interval discounting model; TM, tradeoff model.

10. General discussion

We compared three evaluation rules in the domain of intertemporal choice. Models representing the three rules were applied to indifference data, to which cognitive models are customarily applied in this domain, and preference data, which deserve more attention in the cognitive modeling of intertemporal choice. Our quantitative analyses strongly favored a fully attribute-based rule (the tradeoff model) over either a hybrid rule (the interval discounting model) or a fully alternative-based rule (the hyperbolic discounting model). The tradeoff model offered an accurate account of the data and won out over the competing models, which fail to account for subadditivity (the hyperbolic discounting model) and relative nonadditivity, or reversals from subadditivity to superadditivity when compensations change from small to large (the hyperbolic discounting model and the interval discounting model). Moreover, even when the experimental design was not specifically intended to produce nonadditivity, as in Study 3, the tradeoff model still greatly outperformed its rivals. This strongly suggests that intertemporal choice, at least in elementary choices between a smaller sooner and a larger later outcome, is governed by direct comparisons along the attributes of time and outcome.

In this discussion, we consider some issues raised by our work. We begin by asking why hyperbolic discounting performed poorly, when it has previously been seen to provide such a good fit to data. We then say a few words about two theoretical issues raised by our specific modeling. First, what is the role played by time in models of choice: Is it merely a decision weight, like probability, or is it a bearer of value in itself? Second, what is the nature of the "value function," and is the value function of the tradeoff model applicable outside the domain of intertemporal choice? We conclude with a brief afterword on the relationship between our work and the cognitive processes of intertemporal choice.

10.1. The hyperbolic discounting model

The hyperbolic discounting model performed poorly, when it has previously been shown to perform well. There are at least four reasons. First, the hyperbolic discounting model is usually applied to data in which the effect of the delay to the larger later outcome (diminishing absolute sensitivity to delays, or hyperbolic discounting) is confounded with the effect of the interval between the outcomes (subadditivity in intervals). This confound is the delay/interval effect discussed in Appendix B. When the experimental design produces both the common difference effect and subadditivity in intervals, so

that the delay/interval effect is unconfounded, the hyperbolic discounting model performs poorly, because it cannot deal with nonadditivity in intervals. Second, the hyperbolic discounting model has not previously been applied in its original form. As formulated by Loewenstein and Prelec (1992), and as formalized by us, the value function of the hyperbolic discounting model exhibits increasing elasticity to accommodate the absolute magnitude effect. However, the typical approach is (1) to assume hyperbolic discounting of raw outcomes, and not valued outcomes; (2) to apply the hyperbolic discounting model within, and not across, outcomes of different magnitude; and (3) to accommodate the absolute magnitude effect with multiple, outcome-dependent discounting parameters (e.g., Green, Fry, & Myerson, 1994), or with a single discounting parameter that is scaled as a function of outcome magnitude (Kirby, 1997; Kirby & Maraković, 1996; Noor, 2011), and not with a single value function over all outcomes. Third, the hyperbolic discounting model is usually estimated on the outcome that yields indifference between the options, not on the one-period discount factor derived from the indifference point (Eq. 2). Outcomes include basic discounting effects (Table 1), which any model, including the exponential discounting model, predicts. The "predictive accuracy" of a model is therefore grossly overestimated when using outcomes as the dependent variable, with R² measures going well into the nineties, even for the exponential discounting model (Green et al., 1994; Kirby, 1997).


Fourth, model evaluation usually involves fitting a model to data and then computing an R² measure to evaluate the fit of the model to those data. This practice evaluates goodness of fit, not generalizability, which should be the criterion for cognitive modeling (Pitt, Myung, & Zhang, 2002). Goodness of fit may provide valuable information, but only if one controls for overfitting, which is done by evaluating the fit of the model to data to which it was not fitted. To address overfitting, Keller and Strazzera (2002) evaluated hyperbolic and exponential discounting models with the leave-one-out method. We used Bayes factors, which, to our knowledge, had not been done before in quantitative analyses of intertemporal choice.

10.2. Time has value

The tradeoff model abandons the view that outcome is weighted by time and replaces it with the view that outcome is weighed against time. Because outcome is weighed against time, time has value, just as outcome does. However, outcomes carry value by themselves; time delays do not. Time differences do carry value, just as outcome differences do, but they only acquire value from something good or bad occurring sooner or later. Although time is fundamentally different from outcome, treating time as something that has value as well does seem to improve our understanding of intertemporal choice.

10.3. Value functions

For the tradeoff model, we proposed a normalized logarithmic value function, which may be relevant for other domains as well, such as risky choice. Stott (2006) compared a wide range of value functions, probability-weighing functions, and choice rules for cumulative prospect theory, and reported that a power value function yielded a better fit to the data than a logarithmic value function. However, he did not compare maximally comparable specifications of these value functions. Specifically, he compared $v(x) = x^{\gamma}$ with $v(x) = \log(\gamma + x)$, while we would have compared $v(x) = \frac{1}{1+\gamma}\,x^{1/(1+\gamma)}$ with $v(x) = \frac{1}{\gamma}\log(1 + \gamma x)$. These normalized value functions are "maximally comparable," in that both range from identity functions, that is, v(x) = x (constant sensitivity, as γ goes to 0), to zero functions, that is, v(x) = 0 (insensitivity, as γ goes to infinity). The comparison between the value functions can then focus on the real issues. A difference between the two is that the power function exhibits constant elasticity, whereas the logarithmic function exhibits decreasing elasticity. Decreasing elasticity is one requirement for prospect theory to produce Markowitz's (1952) fourfold pattern of risk preferences (Scholten & Read, 2014), in which, as the magnitude of the outcomes increases by increasing multiplicative constants, preferences shift from risk seeking to risk aversion in gains, and from risk aversion to risk seeking in losses. A priori, then, the logarithmic function would seem a better candidate for operationalizing prospect theory than the power function.
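The contrast between constant and decreasing elasticity can be illustrated numerically. The sketch below, with an arbitrary γ and a numerical derivative, is our illustration rather than Stott's (2006) procedure:

```python
import numpy as np

def normalized_power(x, gamma):
    # v(x) = (1 / (1 + gamma)) * x**(1 / (1 + gamma)): constant elasticity
    return x ** (1.0 / (1.0 + gamma)) / (1.0 + gamma)

def normalized_log(x, gamma):
    # v(x) = (1 / gamma) * log(1 + gamma * x): decreasing elasticity
    return np.log(1.0 + gamma * x) / gamma

def elasticity(v, x, gamma, h=1e-6):
    # Point elasticity (dv/dx) * (x / v), approximated numerically.
    dv = (v(x + h, gamma) - v(x - h, gamma)) / (2 * h)
    return dv * x / v(x, gamma)

for x in (10.0, 100.0, 1000.0):
    print(elasticity(normalized_power, x, 0.5),   # constant at 1/(1 + gamma)
          elasticity(normalized_log, x, 0.5))     # falls as x grows
```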


10.4. Afterword: The tradeoff model and the decision-making process

The tradeoff model was designed to be what Wakker (2010) called a homeomorphic model of decision making. He drew the distinction between paramorphic and homeomorphic models. A model is paramorphic if it describes the empirical phenomena of interest correctly, but the processes underlying the empirical phenomena are not matched by processes in the model. A model is homeomorphic if not only its empirical phenomena match reality but also its underlying processes do so. Wakker's example of a homeomorphic model is cumulative prospect theory.

The tradeoff model was intended to capture what we viewed as a strong and plausible intuition of how people make intertemporal choices. Rather than putting an independent value on options, computing net present values as an accountant would when evaluating the potential profitability of different projects, we believed that decision makers are likely to approach intertemporal choices by simply considering how much time they have to wait, and how much more they will gain or lose by waiting. We did not seek to collect process data, but recently Arieli, Ben-Ami, and Rubinstein (2011) described eye-tracking studies of information acquisition for both risky choice and intertemporal choice, designed expressly to determine whether they are governed by alternative-based or attribute-based choice.11 For intertemporal choice, they make an unequivocal claim:

… it is hard to imagine that any of the participants made a "present-value-like" computation [alternative-based] which would have involved vertical eye movements. Indeed, we found that 2/3 of eye movements were horizontal. Thus, participants clearly used a C-procedure [attribute-based]; in other words, they based their decisions on comparing sums of money and delivery dates separately. (p. 73)

By "horizontal," Arieli et al. (2011) mean that respondents almost always made intra-attribute comparisons, comparing delays to delays, and gains to gains. This is what we would expect if people are computing differences between delays and differences between outcomes, which are then compared. While Arieli et al. were not attempting to validate the details of the tradeoff model, their results give strong support to the view that its structure is essentially the right one to use as a starting point for subsequent research into intertemporal choice.

One limitation of our results is that we restricted our attention to elementary choices, in which a single smaller sooner outcome was pitted against a single larger later one. The tradeoff model is easily adapted to such contexts, but when the options are temporal sequences, further issues arise concerning how a sequence of delays and a sequence of outcomes can be weighed against one another. As a first step in this direction, we recently extended the tradeoff model to choices involving elementary sequences of two outcomes (Read & Scholten, 2012). The extended version of the tradeoff model offers a unified account of all recently discovered anomalies in this restricted domain. Further extension is left as a challenge for future research.


Acknowledgments

This work was supported by the Fundação para a Ciência e a Tecnologia (project POCI 2010; grant numbers PTDC/PSI-PCO/101447/2008 and PTDC/MHC-PCN/3805/2012), by the Economic and Social Research Council (grant number ES/K002201/1), and by the Leverhulme Trust (grant number RP2012-V-022). We thank Joaquim Júdice (Universidade de Coimbra), João Patrício (Instituto Politécnico de Tomar), and Peter Wakker (Erasmus University, Econometric Institute, Rotterdam, the Netherlands) for methodological and theoretical advice.

Notes

1. The exponential discounting model is commonly and erroneously referred to as a normative model. A normative model states that, in a perfect capital market, a rational agent will have a single discount rate for money, corresponding to the opportunity cost of capital, and not merely, or even necessarily, that the discount rate be constant. Thus, a normative model entails a specific discount rate, while the exponential discounting model allows the single discount rate to take on any value. Nonetheless, the major anomalies in intertemporal choice are defined as being anomalous to the exponential discounting model (for a recent discussion of the rational model, see Read, Frederick, & Scholten, 2013).
2. When we discuss empirical results, we use the term "discounting" as a description of an empirical regularity, not as something that is necessarily produced by discounting per se.
3. The value function exhibits (1) constant elasticity when γ = ½, so that $v(x) = \tfrac{1}{2}(1 + \mu)x^{1/2}$ and e = ½, or when µ = 0, so that $v(x) = (1 - \gamma)x^{1-\gamma}$ and e = 1 − γ; (2) unit elasticity (i.e., constant sensitivity) when γ = 1, so that v(x) = µx; and (3) zero elasticity (i.e., insensitivity) when γ = 1 and µ = 0, so that v(x) = 0.
4. The designation "weighted delays" should be taken to mean "delays as submitted to a time-weighing function." In the tradeoff model, the weighing of delays actually amounts to a valuation of delays, because, by weighing outcomes against delays, delays have value. More on this in Section 10.
5. Responsiveness to interest rates is tantamount to discounting. In the tradeoff model, it requires diminishing absolute sensitivity to outcomes.
6. We also tried other choices for the prior distribution, including exponential distributions and Pareto distributions. These alternative prior distributions do not change the conclusions in any of the experiments.
7. There are many different methods available for estimating the marginal likelihood. We chose the candidate's formula because simpler methods, such as importance sampling (Hammersley & Handscomb, 1964) and the harmonic mean estimator (Newton & Raftery, 1994), have been criticized when applied to problems where the posterior distribution is concentrated (or "narrow") relative to the prior (Raftery, 1996; discussion by R. M. Neal in Newton & Raftery, 1994), which is the case in our data. Another recommended method, the Savage–Dickey estimator (Dickey, 1971; Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010), applies only to nested models.
8. In a separate ANOVA, with country (United Kingdom vs. Portugal) included among the variables, there was no interaction between this variable and the variables in the stimulus design (all Fs < 1).
9. Most models that account for the common difference effect imply that δ will be higher for A than for M even if E is as long as L, because δ increases by a greater proportion from E to A than from A to L, that is, $\delta_A/\delta_E > \delta_L/\delta_A$, or $\delta_A > \sqrt{\delta_E \delta_L}$.
10. In the choice-based matching study reported by Scholten and Read (2006), the participants demanded on average an interest rate of 6.1% per week. In the choice-based matching study reported by us as Study 1, the participants demanded on average an interest rate of 6.7% per month. These weekly and monthly interest rates correspond to annual interest rates of 2173.7% and 217.8%, respectively. In the choice study reported by Scholten and Read (2010, Anomaly 2), participants demanded about 2% per year. These results show how much the design of a study, such as stating the delays in weeks, months, or years, unintentionally frames intertemporal choice.
11. Arieli et al. (2011) use the terms "Holistic," or H-procedures, and "Component," or C-procedures, where we use, respectively, alternative based and attribute based.
12. Consider the indifference point between a smaller immediate amount and a larger later one: $[\mu_1 v_1(x_S) + \mu_2 v_2(x_S)]/d(t_L) = \mu_1 v_1(x_L) + \mu_2 v_2(x_L)$. From this equation, $x_L$ cannot be isolated, because we have $v_2^{-1}\bigl([\mu_1 v_1(x_S) + \mu_2 v_2(x_S)]/[\mu_2 d(t_L)] - \mu_1 v_1(x_L)/\mu_2\bigr) = x_L$, where $x_L$ appears on both the left-hand and the right-hand side.

References

Ainslie, G. (1975). Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin, 82, 463–496.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2010). Behavioral econometrics for psychologists. Journal of Economic Psychology, 31, 553–576.
Arieli, A., Ben-Ami, Y., & Rubinstein, A. (2011). Tracking decision makers under uncertainty. American Economic Journal: Microeconomics, 3, 68–76.
Besag, J. (1989). A candidate's formula: A curious result in Bayesian prediction. Biometrika, 76, 183.
Booij, A. S., van Praag, B. M. S., & van de Kuilen, G. (2010). A parametric analysis of prospect theory's functionals for the general population. Theory and Decision, 68, 115–148.
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455.
Chapman, G. B. (1996). Temporal discounting and utility for health and money. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 771–791.
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313–1321.
Chib, S., & Jeliazkov, I. (2001). Marginal likelihood from the Metropolis–Hastings output. Journal of the American Statistical Association, 96, 270–281.
Dickey, J. (1971). The weighted likelihood ratio, linear hypotheses on normal location parameters. The Annals of Mathematical Statistics, 42, 204–223.
Green, L., Fry, A., & Myerson, J. (1994). Discounting of delayed rewards: A life-span comparison. Psychological Science, 5, 33–36.
Hammersley, J. M., & Handscomb, D. C. (1964). Monte Carlo methods. London: Methuen & Co.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Keller, L. R., & Strazzera, E. (2002). Examining predictive accuracy among discounting models. Journal of Risk and Uncertainty, 24, 143–160.
Killeen, P. R. (2009). An additive-utility model of delay discounting. Psychological Review, 116, 602–619.
Kirby, K. N. (1997). Bidding on the future: Evidence against normative discounting of delayed rewards. Journal of Experimental Psychology: General, 126, 54–70.
Kirby, K. N., & Maraković, N. N. (1996). Delay-discounting probabilistic rewards: Rates decrease as amounts increase. Psychonomic Bulletin & Review, 3, 100–104.
Kirby, K. N., Petry, N. M., & Bickel, W. K. (1999). Heroin addicts have higher discount rates for delayed rewards than non-drug-using controls. Journal of Experimental Psychology: General, 128, 78–87.
Köbberling, V., & Wakker, P. P. (2005). An index of loss aversion. Journal of Economic Theory, 122, 119–131.
Laibson, D. (1997). Golden eggs and hyperbolic discounting. Quarterly Journal of Economics, 112, 443–477.
Loewenstein, G., & Prelec, D. (1992). Anomalies in intertemporal choice: Evidence and an interpretation. Quarterly Journal of Economics, 107, 573–597.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior V: The effect of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale, NJ: Lawrence Erlbaum.
McAlvanah, P. (2010). Subadditivity, patience, and utility: The effects of dividing time intervals. Journal of Economic Behavior and Organization, 76, 325–337.
Mellers, B. A., & Biagini, K. (1994). Similarity and choice. Psychological Review, 101, 505–518.
Newton, M., & Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56, 3–48.
Noor, J. (2011). Intertemporal choice and the magnitude effect. Games and Economic Behavior, 72, 255–270.
al-Nowaihi, A., & Dhami, S. (2009). A value function that explains magnitude and sign effects. Economics Letters, 105, 224–229.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 534–552.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Prelec, D., & Loewenstein, G. (1991). Decision making over time and under uncertainty: A common approach. Management Science, 37, 770–786.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, D. J. Spiegelhalter, & S. Richardson (Eds.), Markov chain Monte Carlo in practice (pp. 163–188). London: Chapman and Hall.
Read, D. (2001). Is time-discounting hyperbolic or subadditive? Journal of Risk and Uncertainty, 23, 5–32.
Read, D., Frederick, S., & Airoldi, M. (2012). Four days later in Cincinnati: Longitudinal tests of intertemporal preference reversals due to hyperbolic discounting. Acta Psychologica, 140, 177–185.
Read, D., Frederick, S., & Scholten, M. (2013). DRIFT: An analysis of outcome framing in intertemporal choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 573–588.
Read, D., & Roelofsma, P. H. M. P. (2003). Subadditive versus hyperbolic discounting: A comparison of choice and matching. Organizational Behavior and Human Decision Processes, 91, 140–153.
Read, D., & Scholten, M. (2012). Tradeoffs between sequences: Weighing accumulated outcomes against outcome-adjusted delays. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 1675–1688.
Restle, F. (1961). Psychology of judgment and choice: A theoretical essay. New York: John Wiley and Sons.
Roelofsma, P. H. M. P., & Read, D. (2000). Intransitive intertemporal choice. Journal of Behavioral Decision Making, 13, 161–177.
Rosenthal, J. (2011). Optimal proposal distributions and adaptive MCMC. In S. Brooks, A. Gelman, G. Jones, & X.-L. Meng (Eds.), Handbook of Markov chain Monte Carlo (pp. 93–112). London: Chapman and Hall.
Sayman, S., & Öncüler, A. (2009). An investigation of time inconsistency. Management Science, 55, 470–482.
Scheibehenne, B., Rieskamp, J., & González-Vallejo, C. (2009). Cognitive models of choice: Comparing decision field theory to the proportional difference model. Cognitive Science, 33, 911–939.
Scholten, M., & Read, D. (2006). Discounting by intervals: A generalized model of intertemporal choice. Management Science, 52, 1426–1438.
Scholten, M., & Read, D. (2010). The psychology of intertemporal tradeoffs. Psychological Review, 117, 925–944.
Scholten, M., & Read, D. (2013). Time and outcome framing in intertemporal tradeoffs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1192–1212.
Scholten, M., & Read, D. (2014). Prospect theory and the "forgotten" fourfold pattern of risk preferences. Journal of Risk and Uncertainty.
Shafir, E. B., Osherson, D. N., & Smith, E. E. (1993). The advantage model: A comparative theory of evaluation and choice under risk. Organizational Behavior and Human Decision Processes, 55, 325–378.
Stott, H. P. (2006). Cumulative prospect theory's functional menagerie. Journal of Risk and Uncertainty, 32, 101–130.
Takahashi, T. (2005). Loss of self-control in intertemporal choice may be attributable to logarithmic time-perception. Medical Hypotheses, 65, 691–693.
Takahashi, T., Oono, H., & Radford, M. H. B. (2008). Psychophysics of time perception and intertemporal choice models. Physica A, 387, 2066–2074.
Thaler, R. (1981). Some empirical evidence on dynamic inconsistency. Economics Letters, 8, 201–207.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60, 158–189.
Wakker, P. P. (2010). Prospect theory for risk and ambiguity. Cambridge, UK: Cambridge University Press.

Appendix A: Loewenstein and Prelec's (1992) value function

Suppose that, for several option pairs, we obtain the point of indifference between SS and LL by adjusting $x_L$. In the hyperbolic discounting model, the indifference point is described by Eq. 3. The conventional way of estimating a model is to solve the model equation for $x_L$, and to minimize the sum of squared deviations between observed and predicted values of $x_L$ across option pairs. This is explicit estimation. Given the value function in Eq. 5, however, the model equation cannot be solved for $x_L$.12 With $x_L$ trapped inside the model, we must minimize the overall discrepancy between the left-hand side and the right-hand side of the model equation; that is, we must resort to implicit estimation.


This, however, exposes us to degenerate solutions, or parameter configurations for which the model equation is true regardless of the data. Given the value function in Eq. 5, the parameter search routine can reach a degenerate solution by setting 1 − γ = µ = 0. With these parameter settings, the model equation is $d(t_S) \times 0 = d(t_L) \times 0$, which is true regardless of the data. To reduce exposure to degenerate solutions, the value function may be modified as follows:

$$v(x) = \begin{cases} \dfrac{1}{1+\gamma}\,x^{\frac{1}{1+\gamma}} + \mu\,\dfrac{\gamma}{1+\gamma}\,x^{1+\gamma} & \text{if } x \geq 0 \\[2ex] -\lambda\left[\dfrac{1}{1+\gamma}\,(-x)^{\frac{1}{1+\gamma}} + (\mu+\sigma)\,\dfrac{\gamma}{1+\gamma}\,(-x)^{1+\gamma}\right] & \text{if } x < 0, \end{cases}$$

where γ > −1 is diminishing absolute sensitivity to outcomes. The parameter search routine can then only approach a degenerate solution by setting µ = 0 and letting γ approach infinity. This reduces, but does not eliminate, exposure to degenerate solutions.

The above value function is an additive combination of two power functions. A multiplicative combination was proposed by al-Nowaihi and Dhami (2009):

$$v(x) = \begin{cases} x^{\gamma_1}(m + \mu x)^{\gamma_2 - \gamma_1} & \text{if } x \geq 0 \\[1ex] -\lambda(-x)^{\gamma_1}\bigl(m - (\mu + \sigma)x\bigr)^{\gamma_2 - \gamma_1} & \text{if } x < 0, \end{cases}$$

where 0 < γ₁ < γ₂ ≤ 1, and m, µ, σ > 0. Given this value function, the parameter search routine can reach a degenerate solution by setting β = γ₁ = γ₂ = 0, where β is the discounting parameter of Loewenstein and Prelec's (1992) generalized hyperbolic discount function in Eq. 4. With these parameter settings, the model equation is 1 × 1 = 1 × 1, which is true regardless of the data.
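A numerical illustration of the remaining degenerate limit, using the gain branch as reconstructed above (so the exact functional form should be read as our reconstruction of the modified value function, not as verified code):

```python
def v(x, gamma, mu):
    # Gain branch of the modified value function (reconstruction above).
    c = 1.0 + gamma
    return x ** (1.0 / c) / c + mu * (gamma / c) * x ** c

# With mu = 0 and gamma large, v collapses toward the zero function, so the
# implicit model equation d(tS) * v(xS) = d(tL) * v(xL) tends to 0 = 0
# regardless of the data: a degenerate "solution" for the search routine.
for gamma in (1.0, 10.0, 1000.0):
    print(v(100.0, gamma, mu=0.0))  # shrinks toward zero as gamma grows
```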

Appendix B: The interval discount function and the time-weighing function

The interval discount function in Eq. 7 and the time-weighing function in Eq. 8 differ from their original specifications (Scholten & Read, 2006). The interval discount function first divides the effective interval by ϑ and then raises the result to the power ϑ. Without the prior division, the inflection point of the inverse S-shaped discount function is bounded from above by 1, so that superadditivity can occur only over a very restricted range of intervals. With the prior division, the inflection point is bounded from above by ϑ, so that superadditivity can occur over any range of intervals before it reverses into subadditivity. In addition, the time-weighing function is a normalized logarithmic function rather than a power function. This is to satisfy a formal requirement that we call alpha-tau equivalence. When the smaller outcome is immediate ($t_S = 0$) and superadditivity vanishes (ϑ = 1), subadditivity (α) and diminishing absolute sensitivity to delays (τ) are confounded into a single delay/interval effect:


When the delay to the larger outcome ($t_L$), and therefore the interval between the outcomes ($t_L - t_S$), increases, per-period discounting may decrease either because of diminishing absolute sensitivity to delays or because of subadditivity, and these cannot be disentangled. Alpha-tau equivalence requires that the discount function be the same regardless of whether the delay/interval effect is attributed to α or τ. This requirement is satisfied when the interval discount function in Eq. 7 is combined with the normalized logarithmic time-weighing function in Eq. 8. When subadditivity (α) vanishes,

$$D(0, t_L) = (1 + \alpha w(t_L))^{-\beta/\alpha}$$

and α → 0 yields

$$e^{-\beta w(t_L)} = e^{-(\beta/\tau)\log(1 + \tau t_L)} = (1 + \tau t_L)^{-\beta/\tau} = d(t_L).$$

When diminishing absolute sensitivity to delays (τ) vanishes,

$$D(0, t_L) = (1 + \alpha w(t_L))^{-\beta/\alpha} = \bigl(1 + (\alpha/\tau)\log(1 + \tau t_L)\bigr)^{-\beta/\alpha}$$

and τ → 0 yields

$$(1 + \alpha t_L)^{-\beta/\alpha} = d(t_L).$$

Thus, in both cases, the interval discount function reduces to Loewenstein and Prelec's (1992) generalized hyperbolic discount function. Alpha-tau equivalence is not satisfied with other time-weighing functions, for example, the power function $w(t) = t^{\tau}$, where 0 < τ < 1 is diminishing absolute sensitivity to delays (as adopted by Scholten & Read, 2006). When subadditivity (α) vanishes,

$$D(0, t_L) = (1 + \alpha w(t_L))^{-\beta/\alpha}$$

and α → 0 yields

$$e^{-\beta w(t_L)} = e^{-\beta t_L^{\tau}} \neq d(t_L),$$

meaning that the interval discount function does not reduce to the generalized hyperbolic discount function.
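Alpha-tau equivalence can be checked numerically. In the sketch below, the parameter values are arbitrary, and the limits α → 0 and τ → 0 are approximated by very small values; both attributions recover the same generalized hyperbolic discount factor:

```python
import numpy as np

def w(t, tau):
    return np.log(1.0 + tau * t) / tau          # normalized log time weighing

def D(t, alpha, beta, tau):
    return (1.0 + alpha * w(t, tau)) ** (-beta / alpha)  # D(0, t), theta = 1

t, beta = 12.0, 0.05
ghdf = (1.0 + 0.3 * t) ** (-beta / 0.3)  # generalized hyperbolic, rate 0.3

# alpha -> 0 with tau = 0.3: attribute the delay/interval effect to tau.
print(D(t, alpha=1e-9, beta=beta, tau=0.3), ghdf)
# tau -> 0 with alpha = 0.3: attribute it to alpha; same discount function.
print(D(t, alpha=0.3, beta=beta, tau=1e-9), ghdf)
```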

Appendix C: Details on the estimation procedure

To collect samples from the posterior distribution, we ran the Metropolis–Hastings version of the MCMC sampler twice, for two million samples during each run, for each model. Each MCMC run started at a random sample from the prior distribution. The first million samples were used as a burn-in period to allow the sampler to "forget" its start state and begin to sample from the posterior distribution; these burn-in samples were then discarded. The burn-in period was also used to calibrate the proposal distribution used by the Metropolis–Hastings algorithm to select potential new states for the Markov chain, because a good choice of proposal distribution can make a large difference in the efficiency of the sampler. We used an adaptive proposal distribution: setting the covariance of the Gaussian proposal distribution to be a constant multiple of the covariance of the already collected samples, plus a small additive diagonal factor to ensure that the covariance matrix did not collapse to zero (Rosenthal, 2011).


Convergence of the sampler was evaluated by comparing the variance between and within the two runs of the sampler using the R̂ statistic (Brooks & Gelman, 1998). The values of R̂ indicated that the post-burn-in samples had converged. The marginal likelihoods were then estimated, as described above, separately for each of the two runs, and then averaged to produce a final estimate.
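A simplified sketch of this procedure follows. This is not our estimation code: It compresses the two runs, the calibration schedule, and the convergence check into a toy example, and the target posterior is a stand-in:

```python
import numpy as np

def adaptive_mh(log_post, x0, n_samples, n_burn, jitter=1e-6, scale=2.38):
    """Minimal adaptive Metropolis-Hastings in the spirit of Appendix C:
    during burn-in, the Gaussian proposal covariance tracks the covariance
    of the samples collected so far, plus a small diagonal term."""
    d = len(x0)
    x, lp = np.array(x0, float), log_post(x0)
    cov = np.eye(d)
    samples = []
    for i in range(n_samples):
        prop = np.random.multivariate_normal(x, (scale ** 2 / d) * cov)
        lp_prop = log_post(prop)
        if np.log(np.random.rand()) < lp_prop - lp:   # accept or reject
            x, lp = prop, lp_prop
        samples.append(x.copy())
        if i < n_burn and i > 2 * d:                  # adapt during burn-in
            cov = np.cov(np.array(samples).T) + jitter * np.eye(d)
    return np.array(samples[n_burn:])                 # discard burn-in

# Toy target: a correlated bivariate Gaussian posterior.
def log_post(x):
    return -0.5 * (x[0] ** 2 + (x[1] - 0.9 * x[0]) ** 2)

draws = adaptive_mh(log_post, [0.0, 0.0], n_samples=20000, n_burn=10000)
print(draws.mean(axis=0), np.cov(draws.T))
```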

400KB Sizes 0 Downloads 0 Views