Psychopharmacology
DOI 10.1007/s00213-015-3904-3

ORIGINAL INVESTIGATION

Noradrenergic modulation of risk/reward decision making

David R. Montes · Colin M. Stopper · Stan B. Floresco

Received: 10 November 2014 / Accepted: 23 February 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract

Rationale: Catecholamine transmission modulates numerous cognitive and reward-related processes that can subserve more complex functions such as cost/benefit decision making. Dopamine has been shown to play an integral role in decisions involving reward uncertainty, yet there is a paucity of research investigating the contributions of noradrenaline (NA) transmission to these functions.

Objectives: The present study was designed to elucidate the contribution of NA to risk/reward decision making in rats, assessed with a probabilistic discounting task.

Methods: We examined the effects on probabilistic discounting of reducing noradrenergic transmission with the α2 agonist clonidine (10–100 μg/kg), increasing activity at α2A receptor sites with the agonist guanfacine (0.1–1 mg/kg), blocking α2 receptors with the antagonist yohimbine (1–3 mg/kg), and inhibiting the noradrenaline transporter (NET) with atomoxetine (0.3–3 mg/kg). Rats chose between a small/certain reward and a larger/risky reward, wherein the probability of obtaining the larger reward either decreased (100–12.5 %) or increased (12.5–100 %) over a session.

Results: In well-trained rats, clonidine reduced risky choice by decreasing reward sensitivity, whereas guanfacine did not affect choice behavior. Yohimbine impaired adjustments in decision biases as reward probability changed within a session by altering negative feedback sensitivity. In a subset of rats that displayed prominent discounting of probabilistic rewards, the lowest dose of atomoxetine increased preference for the large/risky reward when this option had greater long-term utility.

D. R. Montes · C. M. Stopper · S. B. Floresco (*)
Department of Psychology and Brain Research Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
e-mail: [email protected]

Conclusions: These data highlight an important and previously uncharacterized role for noradrenergic transmission in mediating different aspects of risk/reward decision making, including sensitivity to reward and to negative feedback.

Keywords: Decision making · Noradrenaline · Risk · Probabilistic discounting · α2 receptors · Rat

Introduction

Impairments in risk/reward decision making are central to a variety of psychiatric disorders, yet the neurochemical underpinnings of these types of evaluations have only recently been studied in detail. Recent investigations have focused primarily on dopamine, mapping out the receptors and neural circuits through which it may act to mediate these functions. Pharmacological manipulations of dopamine transmission can exert a marked impact on the direction of choice assessed with a probabilistic discounting task, in which rats choose between smaller, certain rewards and larger, uncertain ones. Systemic treatment with D1 or D2 antagonists reduces risky choice, with the effects of D1 antagonism mediated through actions within the medial prefrontal cortex (mPFC) and nucleus accumbens (NAc) (St. Onge and Floresco 2009; St. Onge et al. 2010, 2011; Stopper et al. 2013). Low striatal D1 expression results in overweighting of low probabilities and underweighting of high probabilities, and increased risk-taking is associated with increased NAc shell D1 expression (Takahashi et al. 2010; Simon et al. 2011). Accordingly, stimulating NAc D1 receptors optimizes risk/reward decision making by biasing choice toward the option of greater utility as probabilities change (Stopper et al. 2013). The contribution of D2 receptors to risk/reward decision making is less conclusive, with variable findings
from different brain regions and tasks (Ghods-Sharifi et al. 2009; Zeeb et al. 2009; St. Onge et al. 2011; Simon et al. 2011; Stopper et al. 2013). Alternatively, stimulation of D3 receptors, presumably via autoreceptor activation, decreases risk-taking and sensitivity to unexpected reward by reducing striatal and midbrain activity (Riba et al. 2008; St. Onge and Floresco 2009; Stopper et al. 2013). Increasing dopamine activity with drugs such as amphetamine impairs adjustments in choice biases when reward probabilities change. These treatments increase risky choice when the likelihood of obtaining reward decreases over time and have the opposite effect when reward probabilities are initially low and then increase, in a manner similar to the effects of inactivation of the mPFC (St. Onge and Floresco 2009, 2010; St. Onge et al. 2011).

In comparison, the involvement of noradrenaline (NA) in risk/reward decision making has received little attention. Contemporary theory of noradrenergic function posits that NA plays a key role in balancing exploitation of familiar options with exploration of novel alternatives (Aston-Jones and Cohen 2005). Decisions involving reward uncertainty often entail refinement of choice biases through exploration of different options to ascertain the likelihood of obtaining rewards and the long-term value associated with different courses of action. It is plausible, therefore, that NA may also aid in modifying choice biases during risk/reward decision making. Various noradrenergic medications are commonly prescribed for the treatment of attention deficit/hyperactivity disorder (ADHD) and other behavioral disorders (Michelson et al. 2003; Croxtall 2011; Bukstein and Head 2012). Although they differ in their precise actions, the noradrenaline transporter (NET) inhibitor atomoxetine and the α2A agonist guanfacine decrease impulsive choice and impulsive action, respectively (Robinson et al. 2008; Fernando et al. 2012; Sun et al. 2012).
Conversely, the α2 agonist clonidine makes rats more impulsive and reduces preference for larger, delayed rewards (van Gaalen et al. 2006; Sun et al. 2010), whereas the α2 antagonist yohimbine increases impulsive action and induces perseverative responding during delay discounting (Sun et al. 2010; Schwager et al. 2014). Only a few studies have examined the role of NA in risky choice (Kim et al. 2012; Baarendse et al. 2013). In monkeys, guanfacine increased choice of larger, delayed rewards without influencing risk preference (Kim et al. 2012). Atomoxetine impaired performance on a rat gambling task only when co-administered with a dopamine reuptake inhibitor (Baarendse et al. 2013). However, in these studies, delivery of particular rewards was not only probabilistic but was also delayed for some time after animals made a choice, or incorporated a time-out punishment. Thus, it is difficult to disambiguate from these findings how NA transmission may regulate cost/benefit judgments involving reward uncertainty concomitant with delays, which have been proposed to influence the subjective value of rewards (Green and Myerson 2004).

The present study was designed to explore the contribution of noradrenergic transmission to risk/reward decision making, using a probabilistic discounting task known to be dependent on dopaminergic transmission within prefrontal and ventral striatal networks (St. Onge et al. 2011; Stopper et al. 2013). In so doing, we targeted the α2 autoreceptor with the agonist clonidine and the antagonist yohimbine, in an attempt to induce broad-based decreases and increases, respectively, in noradrenergic tone. We also tested the effects of the selective α2A receptor agonist guanfacine. Unlike clonidine, which has pronounced autoreceptor effects that decrease NA release, guanfacine is believed to act preferentially on postsynaptic receptors in the PFC to enhance cognitive functioning. We also compared these α2 receptor compounds to the selective noradrenaline reuptake blocker atomoxetine. Both guanfacine and atomoxetine have been prescribed for use in humans and have been shown to exert a variety of pro-cognitive effects in animals (Arnsten et al. 1988; Robinson et al. 2008; Newman et al. 2008; Arnsten and Jin 2012; Robinson 2012).

Materials and methods

Animals

Male Long-Evans rats (Charles River Laboratories, Montreal, Canada) weighing 250–300 g at the beginning of training were used. Upon arrival, rats were given 1 week to acclimatize to the colony room, followed by a week of food restriction before any operant training. Feeding occurred in the rats' home cages at the end of the experimental day, wherein rats were provided a ration of 20–25 g of food that allowed for modest weight gain appropriate for proper development; body weights were monitored daily. All testing was conducted in accordance with the guidelines of the Canadian Council on Animal Care and the Animal Care Committee of the University of British Columbia.

Apparatus

Behavioral testing was conducted in 20 operant chambers (30.5 × 24 × 21 cm; Med Associates, St. Albans, VT, USA) enclosed in sound-attenuating boxes. The boxes were equipped with a fan that provided ventilation and masked extraneous noise. Each chamber was fitted with two retractable levers, one located on each side of a central food receptacle where sucrose food reinforcement (45 mg; Bioserv, Frenchtown, NJ, USA) was delivered by a pellet dispenser. The chambers were illuminated by a single 100-mA house light located in the top center of the wall opposite the levers. Four infrared photobeams were mounted on the sides of each chamber, and the number of photobeam breaks that occurred during a session provided a measure of locomotor activity. All experimental
data were collected using personal computers connected to the chambers through an interface.

Lever pressing training

Before training on the full task, rats received 5–7 days of lever press training, in a manner identical to that used by St. Onge and Floresco (2009). Briefly, rats were trained to press each of the two levers on an FR-1 schedule and then received retractable lever training (90 trials per session), requiring them to press one of the two levers within 10 s of its insertion for reinforcement delivered with a 50 % probability. This procedure familiarized them with the association between lever pressing and food delivery as well as with the probabilistic nature of the discounting task. Immediately after the last day of retractable lever training, rats that were to be trained on the discounting task were tested for their side bias, using procedures we have described elsewhere (Stopper et al. 2013). This procedure was used because, in our experience, accounting for a rat's innate side bias when designating the lever associated with the larger reward considerably reduces the number of training sessions required to observe prominent discounting. The side bias session resembled pretraining, except that both levers were inserted into the chamber simultaneously. On the first trial, a food pellet was delivered after a response on either lever. Upon subsequent lever insertion, food was delivered only if the rat responded on the lever opposite to the one chosen initially. If the rat chose the same lever as its initial choice, no food was delivered and the house light was extinguished. This continued until the rat chose the lever opposite to the one chosen initially. After choosing both levers, a new trial commenced. Thus, a single trial of the side bias procedure consisted of at least one response on each lever. Rats received seven such trials and typically required 13–15 responses to complete side bias testing.
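To make the bookkeeping of this procedure concrete, the side-bias session and the classification rule applied to it can be sketched in Python. This is purely illustrative: the function names and the simulated rat's switching policy (always starting on a preferred lever, with an optional perseveration probability) are assumptions, not part of the published protocol.

```python
import random

def simulate_side_bias_session(preferred="left", p_persevere=0.0, n_trials=7, seed=0):
    """Simulate the seven-trial side-bias procedure.

    Each trial: the first press on either lever is rewarded; food is then
    delivered only for the opposite lever, and the trial ends once the rat
    has pressed both levers. Returns (initial_choices, total_presses).
    """
    rng = random.Random(seed)
    initial_choices, total_presses = [], 0
    for _ in range(n_trials):
        initial_choices.append(preferred)  # simulated rat starts on its preferred lever
        total_presses += 1                 # the first press of a trial is always rewarded
        # keep pressing until the rat samples the opposite lever
        while True:
            total_presses += 1
            if rng.random() >= p_persevere:
                break  # switched to the opposite lever; trial complete

    return initial_choices, total_presses

def classify_side_bias(initial_choices, left_presses, right_presses):
    """Classification rule: a >2:1 ratio of total presses on one lever
    overrides the majority (>= 4 of 7) of initial choices."""
    if left_presses > 2 * right_presses:
        return "left"
    if right_presses > 2 * left_presses:
        return "right"
    return "left" if initial_choices.count("left") >= 4 else "right"
```

With a rat that switches immediately after a non-rewarded press (p_persevere = 0), each trial takes exactly two presses, giving 14 responses over seven trials, in line with the 13–15 responses noted above.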
The lever (right or left) that a rat responded on first during the initial choice of a trial was recorded and counted toward its side bias. If the total numbers of responses on the left and right levers were comparable, the lever that a rat chose first on four or more of the seven trials was considered its side bias. However, if a rat made a disproportionate number of responses on one lever over the entire session (i.e., a >2:1 ratio for the total number of presses), that lever was considered its side bias. On the following day, rats commenced training on the decision-making task.

Probabilistic discounting task

The primary task used in these studies was the probabilistic discounting procedure that has been described previously (St. Onge and Floresco 2009; Stopper et al. 2014), originally modified from that described by Cardinal and Howes (2005). Rats received daily sessions consisting of 72 trials, separated

into four blocks of 18 trials. The entire session took 48 min to complete, and animals were trained 5–7 days per week. Each session began in darkness with both levers retracted (the intertrial state). A trial began every 40 s with the illumination of the house light and the insertion of one or both levers into the chamber. One lever was designated the large/risky lever and the other the small/certain lever; these assignments remained consistent throughout training. For each rat, the large/risky lever was set opposite to its side bias. If the rat did not respond within 10 s of lever presentation, the chamber was reset to the intertrial state until the next trial (omission). When a lever was chosen, both levers retracted. Choice of the small/certain lever always delivered one pellet with 100 % probability; choice of the large/risky lever delivered four pellets, but with a particular probability that changed over the four trial blocks (as described below). After a response was made and food delivered, the house light remained on for another 4 s, after which the chamber reverted to the intertrial state until the next trial. Multiple pellets were delivered 0.5 s apart. Each of the four blocks began with eight forced-choice trials in which only one lever was presented (four trials for each lever, randomized in pairs), permitting animals to learn the amount of food associated with each lever and the respective probability of receiving reinforcement over each block. These were followed by 10 free-choice trials, in which both levers were presented and the animal had to decide whether to choose the small/certain or the large/risky lever. Drugs such as amphetamine or nicotine, or inactivation of the mPFC, induce differential effects on probabilistic discounting depending on whether reward probabilities are initially high and then decrease over a session or vice versa (St. Onge et al. 2010; St. Onge and Floresco 2010; Mendez et al. 2012).
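Because the small/certain lever always pays one pellet and the large/risky lever pays four pellets with probability p, the long-run (expected) value of the risky option exceeds that of the certain option exactly when 4p > 1, i.e., when p > 25 %. A minimal Python sketch of this arithmetic, using the block probabilities described below:

```python
def expected_value(pellets, probability):
    """Long-run pellets per trial for an option."""
    return pellets * probability

SMALL_CERTAIN = expected_value(1, 1.0)  # certain lever: always one pellet

# probability of the large (four-pellet) reward in each trial block
for p in (1.0, 0.5, 0.25, 0.125):
    risky = expected_value(4, p)
    if risky > SMALL_CERTAIN:
        better = "large/risky"
    elif risky < SMALL_CERTAIN:
        better = "small/certain"
    else:
        better = "equal"
    print(f"p = {p:>5.3f}: risky EV = {risky:.2f} vs certain EV = {SMALL_CERTAIN:.2f} -> {better}")
```

At p = 50 % the risky lever yields two pellets per trial on average, at 25 % the two options are equivalent, and at 12.5 % the certain lever has greater long-term utility — the crossover that choice biases must track as probabilities shift across blocks.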
Accordingly, in each experiment, separate groups of rats were trained on variants in which the reward probability on the large/risky lever systematically decreased (100, 50, 25, 12.5 %) or increased (12.5, 25, 50, 100 %) across trial blocks. For each session and trial block, the probability of receiving the large reward was drawn from a set probability distribution. Therefore, on any given day, the probabilities during the forced- and free-choice portions of each block may have varied, but averaged across many training days, the actual probability experienced by the rat approximated the set value. In the three probabilistic trial blocks of this task, selection of the larger reward option carried with it an inherent "risk" of not obtaining any reward on a given trial. Squads of rats (typically 8–16) were trained on the task until, as a group, they (1) chose the large/risky lever during the 100 % trial block on at least 90 % of trials, (2) chose the large/risky lever during the 12.5 % trial block on fewer than 60 % of trials, and (3) demonstrated stable baseline levels of choice. Across all experiments, rats required 21–27 days of training before stable performance was displayed and drug testing commenced. Intraperitoneal injections were
administered after a group of rats displayed stable patterns of choice for 3 consecutive days, assessed using a procedure described by St. Onge and Floresco (2009). In brief, data from three consecutive sessions were analyzed with a repeated-measures ANOVA with two within-subjects factors (day and trial block). If the effect of trial block was significant (p < 0.05) but there was no main effect of day and no day × block interaction (both at p > 0.1), animals were judged to have achieved stable baseline levels of choice behavior.
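The stability criterion can be illustrated with a simplified numerical sketch. The published criterion uses a two-way (day × trial block) repeated-measures ANOVA; the Python sketch below implements only a one-factor repeated-measures ANOVA for the block effect, on hypothetical choice proportions, to show the underlying computation. A full stability check would also test the day factor and the day × block interaction.

```python
import numpy as np

def rm_anova_1way(data):
    """One-factor repeated-measures ANOVA.

    data: (n_subjects, n_levels) array of choice proportions.
    Returns (F, df_effect, df_error) for the within-subjects factor.
    """
    n_subj, n_lev = data.shape
    grand = data.mean()
    # partition total variability into effect, subject, and error terms
    ss_effect = n_subj * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subject = n_lev * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_error = ((data - grand) ** 2).sum() - ss_effect - ss_subject
    df_effect = n_lev - 1
    df_error = (n_subj - 1) * (n_lev - 1)
    F = (ss_effect / df_effect) / (ss_error / df_error)
    return F, df_effect, df_error

# hypothetical choice proportions: 4 rats x 4 probability blocks,
# showing the expected discounting profile across a session
choice = np.array([
    [0.90, 0.70, 0.50, 0.30],
    [0.92, 0.68, 0.52, 0.28],
    [0.88, 0.72, 0.48, 0.32],
    [0.91, 0.69, 0.51, 0.29],
])
F, df1, df2 = rm_anova_1way(choice)
print(f"block effect: F({df1},{df2}) = {F:.1f}")
```

For the hypothetical data above, the block effect is very large, as expected when choice proportions shift systematically across probability blocks; a p-value would be obtained from the F distribution with the returned degrees of freedom.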

Reward magnitude discrimination

As in previous studies (Stopper et al. 2013; Stopper and Floresco 2014), we determined a priori that if any drug treatment reduced preference for the large/risky option on both variants of the probabilistic discounting task, we would assess how the most effective dose of that compound altered reward magnitude discrimination. This was done to determine whether the reduced preference for the risky option reflected a general reduction in preference for larger rewards or other non-specific motivational or discrimination deficits. Separate groups of animals were trained and tested on an abbreviated task consisting of 48 trials divided into four blocks, each consisting of 2 forced- and 10 free-choice trials. As with the discounting task, choices were between a large (four-pellet) option and a small (one-pellet) option; however, the probability of reinforcement for both options was held constant at 100 % across blocks.

Drugs and injection protocol

A within-subjects design was used for all drug tests. Each test consisted of a 2-day sequence in which animals received an intraperitoneal vehicle injection (day 1) and then a drug injection (day 2) 30 min before a daily training session. These repeated vehicle tests compensated for any drift in baseline levels of choice that may have occurred over training. Following a drug test day, rats were retrained until they again displayed stable patterns of choice, after which subsequent drug tests were administered (at least 4 days later). This procedure was repeated until rats in a group had received each of their designated treatments. The following drugs were used: the α2 agonist clonidine (10, 30, and 100 μg/kg), the α2A-selective agonist guanfacine (0.1, 0.3, and 1 mg/kg), the α2 antagonist yohimbine (1 and 3 mg/kg), and the NET inhibitor atomoxetine (0.3, 1, and 3 mg/kg), all obtained from Tocris Bioscience (Bristol, UK). All drugs were dissolved in 0.9 % saline, with the exception of yohimbine, which was dissolved in sterile H2O. All drugs were sonicated until dissolved and protected from light. Drugs were injected intraperitoneally at a volume of 1 ml/kg.

Data analysis

The primary dependent measure of interest was the proportion of choices directed toward the large reward lever for each block of free-choice trials, factoring out trial omissions. For each block, this was calculated by dividing the number of choices of the large reward lever by the total number of successful trials (i.e., those on which the rat made a choice). Choice data were analyzed using three-way mixed-design ANOVAs, with treatment and trial block as two within-subjects factors and task variant (descending or ascending shifts in large/risky reward probabilities) as a between-subjects factor. Data from each vehicle treatment test day were averaged. The effect of trial block was always significant (p < 0.001) for the probabilistic discounting task and will not be discussed further. Whenever there was no significant main effect of, or interaction with, the task factor, data from both groups were pooled for subsequent comparisons. Response latencies, locomotor activity (i.e., photobeam breaks), and the number of trial omissions were analyzed with one-way repeated-measures ANOVAs. Multiple comparisons were made using Dunnett's test where appropriate. Whenever a significant main effect of a drug treatment on probabilistic discounting was observed, we conducted a supplementary analysis to further clarify whether changes in choice biases were due to alterations in sensitivity to reward (win-stay performance) or to negative feedback (lose-shift performance) (Bari et al. 2010; Stopper and Floresco 2011). Animals' choices during the task were analyzed according to the outcome of each preceding trial (reward or non-reward) and expressed as a ratio. The proportion of win-stay trials was calculated from the number of times a rat chose the large/risky lever after choosing the risky option on the preceding trial and obtaining the large reward (a win), divided by the total number of free-choice trials on which the rat obtained the larger reward.
Conversely, lose-shift performance was calculated from the number of times a rat shifted choice to the small/certain lever after choosing the risky option on the preceding trial and not being rewarded (a loss), divided by the total number of free-choice trials resulting in a loss. This analysis was conducted over all trials across the four blocks; a block-by-block analysis of these data was not possible because there were many instances in which rats either did not select the large/risky lever or did not obtain the large reward at all during the latter blocks. Changes in win-stay performance were used as an index of reward sensitivity, whereas changes in lose-shift performance served as an index of negative feedback sensitivity (Stopper et al. 2013).
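Given a per-trial record of choices and outcomes, the two ratios described above can be computed directly. A Python sketch (the trial-record format and field names are assumptions for illustration):

```python
def winstay_loseshift(trials):
    """Compute win-stay and lose-shift ratios from ordered free-choice trials.

    trials: list of dicts with keys
      'choice'   -- 'risky' or 'certain'
      'rewarded' -- True if the large reward was delivered (risky choices)
    A win-stay is a risky choice following a rewarded risky choice; a
    lose-shift is a certain choice following an unrewarded risky choice.
    """
    wins = stays = losses = shifts = 0
    for prev, curr in zip(trials, trials[1:]):
        if prev["choice"] != "risky":
            continue  # both ratios condition on a preceding risky choice
        if prev["rewarded"]:
            wins += 1
            stays += curr["choice"] == "risky"
        else:
            losses += 1
            shifts += curr["choice"] == "certain"
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift
```

Note that both denominators condition on the preceding trial being a risky choice: wins count rewarded risky choices and losses count unrewarded ones, matching the definitions above.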

Results

Clonidine

Rats (n = 32, 16 trained on the descending variant) were trained on the probabilistic discounting task for an average of
25 days before the commencement of drug testing. The analysis of the choice data revealed no significant main effect of task variant (descending vs. ascending) or any interactions with the treatment factor (all F values
