Journal of Abnormal Psychology 2015, Vol. 124, No. 3, 697–708

© 2015 American Psychological Association 0021-843X/15/$12.00 http://dx.doi.org/10.1037/abn0000039

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Predicting Psychosis Across Diagnostic Boundaries: Behavioral and Computational Modeling Evidence for Impaired Reinforcement Learning in Schizophrenia and Bipolar Disorder With a History of Psychosis

Gregory P. Strauss, State University of New York at Binghamton

Nicholas S. Thaler, University of Nevada, Las Vegas

Tatyana M. Matveeva, University of Minnesota

Sally J. Vogel, Griffin P. Sutton, Bern G. Lee, and Daniel N. Allen, University of Nevada, Las Vegas

There is increasing evidence that schizophrenia (SZ) and bipolar disorder (BD) share a number of cognitive, neurobiological, and genetic markers. Shared features may be most prevalent among SZ and BD with a history of psychosis. This study extended this literature by examining reinforcement learning (RL) performance in individuals with SZ (n = 29), BD with a history of psychosis (BD+; n = 24), BD without a history of psychosis (BD−; n = 23), and healthy controls (HC; n = 24). RL was assessed through a probabilistic stimulus selection task with acquisition and test phases. Computational modeling evaluated competing accounts of the data. Each participant’s trial-by-trial decision-making behavior was fit to 3 computational models of RL: (a) a standard actor–critic model simulating pure basal ganglia-dependent learning, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value, and (c) a hybrid model where an actor–critic is “augmented” by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum. The SZ group demonstrated greater reinforcement learning impairments at acquisition and test phases than the BD+, BD−, and HC groups. The BD+ and BD− groups displayed comparable performance at acquisition and test phases. Collapsing across diagnostic categories, greater severity of current psychosis was associated with poorer acquisition of the most rewarding stimuli as well as poor go/no-go learning at test. Model fits revealed that reinforcement learning in SZ was best characterized by a pure actor–critic model where learning is driven by prediction error signaling alone. In contrast, BD−, BD+, and HC were best fit by a hybrid model where prediction errors are influenced by top-down expected value representations that guide decision making. These findings suggest that abnormalities in the reward system are more prominent in SZ than BD; however, current psychotic symptoms may be associated with reinforcement learning deficits regardless of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; American Psychiatric Association, 2013) diagnosis.

Keywords: psychosis, reward, reinforcement learning, aberrant salience

Supplemental materials: http://dx.doi.org/10.1037/abn0000039.supp

There has been increased interest in examining overlap between schizophrenia (SZ) and bipolar disorder (BD; Berrettini, 2003a, 2003b; Greene, 2007; Heckers, 2008; Hill, Harris, Herbener, Pavuluri, & Sweeney, 2008). Recent empirical evidence suggests similarities between these two disorders in terms of neuroanatomical substrates (Yu et al., 2010), neuropsychological and social–cognitive impairments (Allen et al., 2010; Glahn et al., 2006; Thaler, Allen, Sutton, Vertinski, & Ringdahl, 2013; Thaler, Strauss, et al., 2013), genetics (Craddock, O’Donovan, & Owen, 2005, 2007; Potash et al., 2003), and response to psychotropic medication (Frye et al., 1998; Post et al., 1998). These commonalities may be greatest among people with SZ and individuals with BD who have psychotic features (up to 60% of people with BD; Keshavan et al., 2011; Seidman et al., 2002). Although reward system abnormalities have been identified in both SZ and BD (Linke, Sonnekes, & Wessa, 2011; Pizzagalli, Goetz, Ostacher, Iosifescu, & Perlis, 2008; Roiser et al., 2009; Singh et al., 2013), it is currently unclear whether these deficits are of similar magnitude and whether they are associated with psychosis in both disorders. Recent neurobiological theories of psychosis have proposed a central role for reward system dysfunction in the formation of delusions. Specifically, excessive dopamine

This article was published Online First April 20, 2015. Gregory P. Strauss, Department of Psychology, State University of New York at Binghamton; Nicholas S. Thaler, Department of Psychology, University of Nevada, Las Vegas; Tatyana M. Matveeva, Department of Psychology, University of Minnesota; Sally J. Vogel, Griffin P. Sutton, Bern G. Lee, and Daniel N. Allen, Department of Psychology, University of Nevada, Las Vegas. Correspondence concerning this article should be addressed to Daniel N. Allen, Department of Psychology, University of Nevada, Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154. E-mail: [email protected]


STRAUSS ET AL.

cell firing may cause stimuli, thoughts, and percepts to acquire aberrant salience and promote atypical associative learning processes that result in psychotic symptoms (Corlett, Frith, & Fletcher, 2009; Corlett, Murray, et al., 2007). This theory is grounded in work on dopamine and reinforcement learning in behaving primates, which indicates that dopamine cell firing codes prediction error signals that are important for associative learning (Schultz, 1998). Prediction errors are mismatches between expected and obtained outcomes, which are accompanied by transient increases (positive prediction error) or decreases (negative prediction error) in dopamine cell firing in the striatum. These prediction errors serve as “teaching signals” to either repeat behaviors that result in better-than-expected outcomes or avoid behaviors that result in worse-than-expected outcomes. Two key findings support the role of aberrant prediction error signaling in psychosis. First, prediction error signaling has been shown to be disrupted by the administration of ketamine (Corlett, Honey, & Fletcher, 2007), a drug that induces transient psychotic states. Second, aberrant neurophysiological response during prediction error signaling has been associated with greater delusional severity in chronic SZ (Corlett, Murray, et al., 2007). One interpretation of these findings is that basal-ganglia driven prediction error signaling may influence how reinforcement outcomes are linked to actions and stimuli. Specifically, the fidelity of reward prediction error signals may be reduced for individuals experiencing psychosis, irrespective of a Diagnostic and Statistical Manual of Mental Disorders (5th Edition; DSM–5; American Psychiatric Association, 2013) diagnosis, because prediction errors are scaled on the absolute difference between tonic and phasic dopamine levels (Frank, 2008). 
Elevated tonic dopamine, which is associated with greater severity of psychosis, may dampen the magnitude of prediction error signals, thus reducing sensitivity to positive and negative outcomes and increasing stochasticity (i.e., random exploratory behavior). In turn, greater stochasticity may result in slower integration of reward statistics over time and consequently, an impaired learning rate. A second potential mechanism linking aberrant reinforcement learning and psychosis involves the prefrontal cortex. In particular, disrupted dopamine signaling within the prefrontal cortex may lead the prefrontal cortex to respond to physiological noise as if it were motivationally relevant, resulting in misattribution of salience to irrelevant environmental cues and a blunting of response to validly salient cues (Corlett et al., 2009). This interpretation is supported by studies implicating the prefrontal cortex, but not the striatum, in aberrant prediction error signaling and its association with psychosis severity (Corlett, Murray et al., 2007). Abnormalities in the orbitofrontal cortex, which have been reported in SZ (Nakamura et al., 2008) and BD (Stanfield et al., 2009), may be particularly important, resulting in difficulty representing the value of outcomes and associating them with stimuli. This may lead to misattribution of salience, causing stimuli of differing value to seem similar regardless of their reinforcing properties. Alternatively, reinforcement learning and psychosis may be linked through dysfunctional interactions between the basal ganglia and orbitofrontal cortex, which result in problems with both prediction error signaling and value representation. It is possible that aberrant basal ganglia based prediction error signaling drives attention toward irrelevant cues, causing new associative learning processes to take place and inappropriate updating of beliefs

related to those stimuli because of reduced top-down influence of the orbitofrontal cortex on value representations. Thus, reinforcement learning abnormalities may result from several processes, including impaired value representation, impaired prediction error signaling, and dysfunctional interactions between these two processes. It is currently unclear whether abnormalities in these aspects of reinforcement learning are common to both SZ and BD, and whether dysfunctional reinforcement learning processes are associated with greater severity of psychosis across diagnostic categories. In this study, we used a combination of behavioral performance and computational modeling of reinforcement learning to explore competing hypotheses regarding reinforcement learning deficits in individuals with schizophrenia (SZ), bipolar disorder with (BD+) or without (BD−) a history of psychosis, and healthy controls (HC). Each participant’s trial-by-trial decision-making behavior on a probabilistic reinforcement learning task (Frank, Seeberger, & O’Reilly, 2004) was fit to three computational models of reinforcement learning: (a) a standard actor–critic model simulating pure basal ganglia-dependent learning driven by prediction errors, (b) a pure Q-learning model simulating action selection as a function of learned expected reward value and contributions of the orbitofrontal cortex, and (c) a hybrid model where an actor–critic is “augmented” by a Q-learning component, meant to capture the top-down influence of orbitofrontal cortex value representations on the striatum and prediction error signaling. These computational models evaluate potential mechanisms behind impaired behavioral reinforcement learning performance (Montague, 1999; Sutton & Barto, 1998). The first computational model, the standard “actor–critic” framework, is used to evaluate contributions of basal ganglia driven prediction errors on decision making (Joel, Niv, & Ruppin, 2002). 
In this model, the critic evaluates the reward value of particular states, and the actor selects a response in relation to learned stimulus–response weights. Mismatches between observed and expected outcomes (prediction errors) are used to adjust learning in the critic itself to improve future accuracy of reward prediction, as well as to update response weights in the actor. Action selection in the actor–critic framework is therefore driven by prediction error signaling alone, independently of actual outcome value representations (Joel et al., 2002). In contrast, the Q-learning model directly considers the reward value of each action separately (Sutton & Barto, 1998); there is consistent evidence that such value representations are driven by the orbitofrontal cortex (Frank & Claus, 2006; Furuyashiki & Gallagher, 2007; Plassmann, O’Doherty, & Rangel, 2010; Roesch & Olson, 2007). The expected value of each action is learned by comparing possible values of potential decisions and selecting the one with the highest predicted expected value. Prediction errors are computed in relation to the expected quality (“Q value”) of the action selected, and these prediction errors are used to directly adjust expected action value. Thus, while the actor–critic guides action selection solely on the basis of prediction errors relative to a state without regard to the value of separate actions, the Q-learning model relies on specific action value representations to guide behavior. The hybrid model examines whether learning driven by prediction errors in the striatum is impacted by expected action value representations formalized as inputs from the Q-learning model (orbitofrontal cortex). When evaluated together, these three models can be used


REWARD AND PSYCHOSIS

in conjunction with behavioral data to evaluate competing hypotheses regarding processes associated with reinforcement learning in BD and SZ. We hypothesized that HC and BD− would evidence better behavioral performance at acquisition and test phases compared with SZ and BD+ and that BD+ and SZ would display comparable impairments at acquisition and test phases. Furthermore, based on prior work (Gold et al., 2012), we predicted that a hybrid model would best fit the behavioral performance of HC, as well as BD−. In contrast, we predicted that the pure actor–critic model would best fit SZ and BD+ given evidence for an association between psychosis and poor prediction error signaling (Corlett, Murray, et al., 2007), as well as prefrontal cortex driven deficits in value representation (Strauss, Robinson, et al., 2011).


Method

Participants

Participants included 47 euthymic patients with BD, 29 patients with SZ, and 24 healthy controls (HC) with no personal or family history of a mood or psychotic disorder. BD and SZ participants were recruited through advertising at local community health centers and college campuses, were clinically stable during evaluation, and met DSM–IV–TR criteria for SZ or BD. Twenty-four (51%) BD participants met criteria for bipolar disorder with psychotic features (BD+), whereas the remaining 23 met criteria for bipolar disorder without psychotic features (BD−). Diagnoses were determined via the Structured Clinical Interview for DSM–IV (SCID-IV; First, Spitzer, Gibbon, & Williams, 2001) and available medical records. Participants were classified as BD+ if they had previously been diagnosed with BD and had experienced delusions or hallucinations during at least one mood episode, as determined by the SCID-IV. HC participants were recruited through online advertisements and advertising on local college campuses. All HC underwent a SCID-IV interview to rule out lifetime history of major depressive disorder, BD, SZ, posttraumatic stress disorder, and psychosis. Participants denied current substance abuse or substance dependence in the last month, history of significant neurological conditions, or family history of affective or psychotic disorder based on a structured interview. See Table 1 for demographic information. Groups did not differ in ethnicity. The SZ group was significantly older, had fewer years of education and lower estimated IQ (Ringe, Saine, Lacritz, Hynan, & Cullum, 2002; Wechsler, 1997), and had more male participants than the other groups. Symptom ratings were obtained with the Scale for the Assessment of Positive Symptoms (SAPS; Andreasen, 1984) and the Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1984). All participants provided written informed consent for a protocol approved by a university institutional review board.

Probabilistic Selection Task

Participants were administered a computerized probabilistic selection task (PST; Frank et al., 2004) that had acquisition and test phases. During acquisition, participants were required to choose one stimulus from each of three stimulus pairs (AB, CD, EF), which were associated with predetermined probabilistic feedback (80:20%, 70:30%, 60:40%). Stimuli were the Japanese hiragana characters used in the original PST (Frank et al., 2004). Feedback was provided after each trial to inform participants whether the stimulus they selected was correct or incorrect. Acquisition blocks consisted of 60 trials, and stimulus pairs were presented in pseudorandom order. The acquisition phase was terminated either when participants achieved a criterion, defined as 65% correct in the AB (80:20) condition, 60% correct in the CD (70:30) condition, and 50% correct in the EF (60:40) condition, or after 360 trials were completed. This criterion was intended to prevent overlearning of the contingencies prior to the test phase (Waltz et al., 2007).
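The termination rule for the acquisition phase can be illustrated with a short Python sketch. This is not the task software used in the study; the function names and the data layout are hypothetical, and the sketch assumes that the criterion is evaluated on the responses from a single 60-trial block, which the text does not state explicitly.

```python
from typing import Dict, List

# Accuracy thresholds per stimulus pair, as given in the task description.
CRITERION = {"AB": 0.65, "CD": 0.60, "EF": 0.50}
MAX_TRIALS = 360  # acquisition ends here regardless of accuracy


def meets_criterion(block: Dict[str, List[bool]]) -> bool:
    """True if accuracy in this block reaches the threshold for every pair.

    `block` maps a pair label to a list of booleans (True = chose the more
    frequently rewarded stimulus). Assumed layout, for illustration only.
    """
    for pair, threshold in CRITERION.items():
        responses = block.get(pair, [])
        if not responses:
            return False  # no data for this pair, so criterion cannot be met
        if sum(responses) / len(responses) < threshold:
            return False
    return True


def acquisition_finished(block: Dict[str, List[bool]], trials_completed: int) -> bool:
    """Stop when the criterion is met or the trial cap has been reached."""
    return meets_criterion(block) or trials_completed >= MAX_TRIALS
```

Under these assumptions, a participant who never reaches criterion completes six blocks (360 trials) before moving on to the test phase.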

Table 1
Demographic and Clinical Characteristics of Patients and Healthy Controls

| Variable | BD+ (n = 24) | BD− (n = 23) | SZ (n = 29) | HC (n = 24) | p | LSD |
|---|---|---|---|---|---|---|
| Age | 37.6 (14.3) | 33.9 (12.3) | 47.2 (11.2) | 36.1 (13.4) | < .01 | BD+, BD−, HC < SZ |
| Education | 13.3 (2.1) | 14.3 (1.7) | 11.4 (2.1) | 14.0 (2.2) | < .001 | SZ < BD+, BD−, HC |
| Estimated IQ | 98.1 (12.7) | 102.8 (11.9) | 84.6 (12.7) | 102.9 (12.8) | < .001 | SZ < BD+, BD−, HC |
| Male (%) | 25.0 | 39.1 | 65.5 | 45.8 | = .03 | |
| Ethnicity | | | | | = .21 | |
| White (%) | 75.0 | 87.0 | 58.6 | 62.5 | | |
| African American (%) | 12.5 | 4.3 | 24.1 | 12.5 | | |
| Hispanic/Latino (%) | 12.5 | 0.0 | 6.9 | 8.3 | | |
| Other (%) | 0.0 | 8.7 | 10.3 | 16.7 | | |
| Medication | | | | | | |
| Antipsychotic (%) | 58.3 | 34.8 | 96.0 | | | |
| Anticonvulsant (%) | 54.2 | 30.4 | 24.0 | | | |
| Antidepressant (%) | 41.7 | 39.1 | 48.0 | | | |
| No Medication (%) | 12.5 | 34.8 | 0.0 | | | |
| Symptom ratings | | | | | | |
| SAPS | 7.4 (9.1) | 4.7 (6.5) | 23.4 (15.3) | 1.2 (4.2) | < .001 | HC, BD− < BD+ < SZ |
| SANS | 22.2 (17.0) | 14.4 (13.8) | 23.1 (14.2) | 4.1 (8.1) | < .001 | HC < BD− < SZ, BD+ |

Note. BD+ = bipolar disorder with psychotic features; BD− = BD without psychotic features; SZ = schizophrenia; HC = healthy control; SAPS = Scale for the Assessment of Positive Symptoms; SANS = Scale for the Assessment of Negative Symptoms; LSD = least significant difference. Prescribed antipsychotic medications included olanzapine (n = 13), aripiprazole (n = 12), quetiapine (n = 11), risperidone (n = 9), ziprasidone (n = 8), clozapine (n = 5), paliperidone (n = 3), haloperidol (n = 2), fluphenazine (n = 2), asenapine (n = 2), and chlorpromazine (n = 1).


Following acquisition, participants completed a transfer phase involving 60 combinations of paired stimuli, of which 12 consisted of prior pairings (AB, CD, EF) and 48 involved novel combinations. No feedback was provided during this phase. As in previous studies (Waltz, Frank, Robinson, & Gold, 2007), we analyzed acquisition of previously learned test pairings (AB, CD, EF) and transfer performance, as measured by responses to all novel pairings involving either an A or a B stimulus. Selection of the A stimulus over any other stimulus presented in a novel pairing served as a measure of go learning. Avoidance of the B stimulus when it was paired in a novel combination served as a measure of no-go learning.
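The go and no-go measures described above can be sketched in Python as follows. The trial representation (a tuple of the two stimuli shown and the stimulus chosen) and the function name are assumptions made for illustration; the study's own scoring code is not reproduced here.

```python
from typing import List, Optional, Tuple

# One test trial: (first stimulus, second stimulus, chosen stimulus).
Trial = Tuple[str, str, str]


def go_nogo_scores(trials: List[Trial]) -> Tuple[Optional[float], Optional[float]]:
    """Return (go, no-go) accuracy over novel test pairings.

    go    = proportion of novel pairings containing A on which A was chosen
    no-go = proportion of novel pairings containing B on which B was avoided
    The trained AB pairing is excluded because it is not a novel combination.
    """
    chose_a = a_trials = avoided_b = b_trials = 0
    for first, second, chosen in trials:
        shown = {first, second}
        if shown == {"A", "B"}:
            continue  # previously learned pairing, scored separately
        if "A" in shown:
            a_trials += 1
            chose_a += int(chosen == "A")
        elif "B" in shown:
            b_trials += 1
            avoided_b += int(chosen != "B")
    go = chose_a / a_trials if a_trials else None
    nogo = avoided_b / b_trials if b_trials else None
    return go, nogo
```

A participant who learned mainly from positive feedback would be expected to score higher on the go measure, and one who learned mainly from negative feedback higher on the no-go measure.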

Statistical Analysis

Analyses first examined group differences in behavioral data among BD+, BD−, SZ, and HC groups. To compare the acquisition of contingencies among groups, a Group × Reward Contingency two-way mixed-model analysis of covariance (ANCOVA) controlling for age was performed, followed by post hoc tests examining participants’ proportions of correct responses across the blocks. Success of learning probabilistic contingencies was assessed during the postacquisition phase. As only four trials per learned contingency pairing were available for analysis, likelihood ratios were used rather than parametric statistics to analyze the proportion of participants who responded correctly to 0%, 25%, 50%, 75%, and 100% of the pairings. Transfer performance was analyzed by examining participants’ cumulative test scores for the four novel pairs involving A (go; AC, AD, AE, AF) and the four novel pairs involving B (no-go; BC, BD, BE, BF) using one-way analyses of variance (ANOVAs) and orthogonal contrasts anticipating SZ < BD+ < BD− < HC. We analyzed transfer performance first with all participants and a second time including only participants who successfully learned the AB contingency, defined as passing 75% (three out of four) of previously learned test trials during the postacquisition phase (Waltz et al., 2007). Correlations were used to examine relationships between PST performance and the SANS and SAPS. To evaluate the effects of antipsychotic medications, we categorized participants into low- or high-potency D2 blocking antipsychotic groups (Strauss, Frank, et al., 2011), and repeated-measures ANOVA evaluated differences in acquisition and test phase performance.

Computational Model

All three computational models were fitted to each group’s mean performance in the acquisition and test phases: (a) a pure actor–critic model; (b) a pure Q-learning model; (c) a hybrid model where the actor–critic is augmented by Q-learning (Sutton & Barto, 1998). In addition, we fitted each model to participants’ trial-by-trial performance. The advantage of this approach was greater specificity in capturing each individual’s sequence of choices at every trial and describing each participant’s behavior above chance. The goal was to uncover the free parameters that maximize the likelihood of producing each participant’s choice sequence throughout the task. Following prior work by Gold et al. (2012), we used a standard likelihood procedure to fit all models to individuals’ data. We used the Akaike information criterion (AIC) for complexity penalization. In accordance with prior work (Gold et al., 2012), we corrected for the number of additional parameters in each model (three for the actor–critic model, two for the Q-learning model, and five for the hybrid model).
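As an illustration of the complexity penalty, the sketch below computes the log-likelihood of a choice sequence and the corresponding AIC in Python. It assumes the standard definition AIC = 2k − 2 ln L, which the text does not spell out; the function names and the dictionary of parameter counts are illustrative, with the counts taken from the paragraph above.

```python
import math
from typing import Iterable

# Free parameters per model, as listed in the text.
N_PARAMS = {"actor_critic": 3, "q_learning": 2, "hybrid": 5}


def log_likelihood(choice_probs: Iterable[float]) -> float:
    """Sum of log probabilities that a model assigned to the observed choices."""
    return sum(math.log(p) for p in choice_probs)


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion (standard form); lower values are preferred."""
    return 2 * n_params - 2 * log_lik


def best_model(log_liks: dict) -> str:
    """Name of the model with the lowest AIC, given per-model log-likelihoods."""
    return min(log_liks, key=lambda name: aic(log_liks[name], N_PARAMS[name]))
```

With this penalty, the five-parameter hybrid model is preferred over the three-parameter actor–critic model only if it improves the log-likelihood by more than 2 units, that is, by more than the difference in parameter count.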

Actor–Critic (Basal Ganglia) Model

The actor–critic model calculates the expected value $V(s, t)$ of a state context $s$ at each trial $t$ as a function of prediction errors. A state here is defined by a pair of stimuli appearing together. Accordingly, each state predicts the likelihood that a positive or negative outcome (correct or incorrect) will follow. Values are updated on every trial using a simple delta rule:

$$V(s, t+1) = V(s, t) + \alpha_C \, \delta(t),$$

where $\alpha_C$ is the learning rate for the critic, reflecting the extent to which values are updated at each trial, and $\delta(t)$ stands for the reward prediction error and captures the difference between the expected value $V$ of the present state $s$ and the outcome experienced after action selection. This discrepancy between expected and actual outcome is calculated using

$$\delta(t) = \mathrm{outcome}(t) - V(s, t).$$

In the actor, $\delta(t)$ influences the updating of the weight of a stimulus–response pair following action selection. Thus, once the outcome of an action has been observed, the stimulus–response weight for that action is adjusted using

$$w(s, a, t+1) = w(s, a, t) + \alpha_A \, \delta(t),$$

where $w(s, a, t)$ is the weight associated with the stimulus–response pair for the choice made at the current trial, $\delta(t)$ is the prediction error rendered by that choice, and $\alpha_A$ is the learning rate for the actor and reflects how rapidly the weights are updated. The learning rates for the actor and the critic have values within the range [0, 1], with greater values indicating faster learning. Following the convention implemented in our prior modeling work (Gold et al., 2012), we normalize the weights in the actor via

$$w(s, a_1, t) = \frac{w(s, a_1, t)}{|w(s, a_1, t)| + |w(s, a_2, t)|}.$$

That is, the weights $w(s, a_1, t)$ and $w(s, a_2, t)$ are each normalized by the sum of their absolute values to avoid unbounded growth. Finally, action selection is implemented via the softmax function,

$$P(a_1, t) = \frac{\exp(w(s, a_1, t)/\beta)}{\exp(w(s, a_1, t)/\beta) + \exp(w(s, a_2, t)/\beta)},$$

where $a_1$ and $a_2$ are Actions 1 and 2 taken to select Stimuli 1 and 2, respectively; $P(a_1, t)$ is the probability of taking Action 1 at trial $t$ to select Stimulus 1; and the parameter $\beta$ is the softmax temperature and reflects the degree of stochasticity (noise, exploration) of the softmax function. The beta ($\beta$; or temperature) parameter of the softmax function affects the probability with which actions are selected among alternatives. The softmax temperature therefore reflects the degree of stochasticity or exploration. High beta values indicate that all


actions have similar likelihood of being selected, whereas low values reflect lower degrees of exploration and higher consistency in choosing a (previously rewarded) action. It then naturally follows that as the value of beta approaches 0, the probability of selecting an action previously associated with a reward approaches 1. In this sense, highly stochastic choice behavior even after multiple presentations of a stimulus associated with high likelihood of reward might reflect poor integration of reward contingencies and, as a result, failure to adopt an optimal choice strategy. In the actor–critic model described here, action selection is impacted by learning rates in the actor ($\alpha_A$) and critic ($\alpha_C$). The learning rate in the actor reflects how rapidly weights associated with a stimulus–response pair are updated for each trial. In the critic, the learning rate determines the updating of values attached to state contexts. Discrepancies between expected and experienced outcomes affect the values of states, which are updated after action selection in the critic. These differences between expected and actual outcomes are captured by the reward prediction error term $\delta(t)$. The $\delta(t)$ term, in turn, influences the updating of the weights associated with stimulus–response pairs in the actor. It is important to note that this model is not sensitive to the actual values of outcomes and uses solely prediction errors to modify the probability of action selection in the future.
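To make the sequence of updates concrete, the following is a minimal Python sketch of a single actor–critic trial as described in this section. It is not the fitting code used in the study. The class and method names, the zero initial values for V and w, and the coding of feedback as 1 (correct) or 0 (incorrect) are all illustrative assumptions.

```python
import math
from typing import Dict, List


def softmax_p1(x1: float, x2: float, beta: float) -> float:
    """Probability of Action 1 under a two-option softmax with temperature beta."""
    top = max(x1, x2)  # subtracting the maximum avoids overflow for small beta
    e1 = math.exp((x1 - top) / beta)
    e2 = math.exp((x2 - top) / beta)
    return e1 / (e1 + e2)


class ActorCritic:
    def __init__(self, alpha_c: float, alpha_a: float, beta: float) -> None:
        self.alpha_c = alpha_c  # critic learning rate
        self.alpha_a = alpha_a  # actor learning rate
        self.beta = beta        # softmax temperature
        self.V: Dict[str, float] = {}        # state -> expected value (assumed to start at 0)
        self.w: Dict[str, List[float]] = {}  # state -> [weight of action 1, weight of action 2]

    def p_first(self, state: str) -> float:
        """Probability of choosing the first stimulus of the pair."""
        w1, w2 = self.w.get(state, [0.0, 0.0])
        return softmax_p1(w1, w2, self.beta)

    def update(self, state: str, action: int, outcome: float) -> float:
        """Apply the critic and actor updates for one trial; return the prediction error."""
        v = self.V.get(state, 0.0)
        delta = outcome - v                       # prediction error
        self.V[state] = v + self.alpha_c * delta  # critic update
        w = self.w.setdefault(state, [0.0, 0.0])
        w[action] += self.alpha_a * delta         # actor update for the chosen action
        norm = abs(w[0]) + abs(w[1])
        if norm > 0:                              # normalize to prevent unbounded growth
            w[0] /= norm
            w[1] /= norm
        return delta
```

Note that the actor's weights change only through the prediction error, so the sketch, like the model it illustrates, never stores the value of an individual action. Lowering `beta` toward 0 makes `softmax_p1` approach a deterministic choice of the option with the larger weight.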

Q-learning (Orbitofrontal Cortex) Model

In contrast to the actor–critic model, the Q-learning model uses actual state action values to guide future choice. In this model, action values are learned directly as a function of the discrepancy (prediction error) between the expected and experienced outcome of taking an action at each trial. Action values are thereby calculated as follows:

$$Q(a, t+1) = Q(a, t) + \gamma_O \, (\mathrm{outcome}(t) - Q(a, t)),$$

where $\gamma_O$ denotes the learning rate for the orbitofrontal cortex and reflects the rapidity with which action values are updated. In this framework, action values are learned separately and are adjusted as a function of the discrepancy between expected and experienced outcome following action selection. It is important that only the value of the action selected at the current trial is updated. This mechanism allows for faster learning of separate action values following each trial. Actions are selected using the same softmax function described in the actor–critic model, and the temperature ($\beta$) again reflects the degree of stochasticity or exploration (high values indicate that all actions have similar likelihood of being selected; low values reflect lower degrees of exploration and higher consistency).

Hybrid Actor–Critic and Q-Learning Model

To capture the distinct learning mechanisms of both models, we also tested a hybrid actor–critic Q-learning model proposed in our prior work (Gold et al., 2012). In this framework, the actor–critic component represents prediction error driven learning implemented through dopaminergic signaling in the basal ganglia but is additionally influenced by top-down contributions from working memory, reflected by the Q-learning model. Action selection is implemented through a softmax function, which incorporates contributions from both model components using a mixing parameter, as follows:

$$M(s, a, t) = (1 - c)\, w(s, a, t) + c\, Q(a, t),$$

$$P(a_1, t) = \frac{\exp(M(s, a_1, t)/\beta)}{\exp(M(s, a_1, t)/\beta) + \exp(M(s, a_2, t)/\beta)},$$

where $0 \le c \le 1$ is a mixing parameter that controls the relative contributions of the actor–critic and Q-learning models to action selection. When $c = 0$, the model is reduced to a pure actor–critic, and working memory influences on basal ganglia-driven learning are not present. Alternatively, when $c = 1$, the model is reduced to a Q-learning model. Because the weights are normalized to prevent unbounded growth, and both Q-values and weights lie within the range [−1, 1], the basal ganglia and orbitofrontal cortex contribute equally to learning when $c = 0.5$.
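The Q-learning update and the hybrid mixing rule can be sketched in a few lines of Python. As with the other sketches, this is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the weights and Q-values are passed in directly instead of being learned within the example.

```python
import math


def q_update(q: float, gamma_o: float, outcome: float) -> float:
    """Delta-rule update of the value of the action that was just chosen."""
    return q + gamma_o * (outcome - q)


def hybrid_p1(w1: float, w2: float, q1: float, q2: float,
              c: float, beta: float) -> float:
    """Probability of Action 1 when actor weights and Q-values are mixed by c.

    c = 0 uses only the actor weights (pure actor-critic);
    c = 1 uses only the Q-values (pure Q-learning).
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("mixing parameter c must lie in [0, 1]")
    m1 = (1.0 - c) * w1 + c * q1
    m2 = (1.0 - c) * w2 + c * q2
    top = max(m1, m2)  # subtracting the maximum keeps exp() numerically stable
    e1 = math.exp((m1 - top) / beta)
    e2 = math.exp((m2 - top) / beta)
    return e1 / (e1 + e2)
```

Setting `c` to 0 or 1 recovers the two pure models, which is why all three accounts can be compared within one framework when fitting trial-by-trial choices.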


Results

Behavioral Findings

For the acquisition phase, a two-way ANCOVA found main effects of group, F(3, 95) = 8.4, p < .01, and reward contingency, F(2, 190) = 4.2, p < .05 (see Figure 1, panel A). There was no significant interaction effect, F(4, 192) = 1.7, p = .12. Post hoc tests for group indicated that the BD+ and BD− groups performed significantly worse than controls across the stimulus pairs but better than the SZ group.¹ Given that performance of the SZ group was near 50% in each condition, we evaluated whether low performance could be due to higher rates of switching behavior (i.e., selecting a different stimulus from the prior trial). Results were not consistent with this hypothesis, as SZ did not switch more than other groups and rate of switching was near chance (see supplemental materials). The postacquisition transfer phase was evaluated with likelihood ratios (Figure 1, panel B). No differences emerged between BD+ and BD−. SZ had a significantly greater proportion of low scores on the AB stimulus pair compared with BD+, Kruskal’s τ = .19, p < .05; BD−, Kruskal’s τ = .20, p < .05; and HC, Kruskal’s τ = .22, p < .05. SZ also had a greater proportion of low scores on the CD pair compared with HC, Kruskal’s τ = .20, p < .05. BD+ had a greater proportion of low scores on the EF condition compared with HC, Kruskal’s τ = .26, p < .05, whereas BD− did not. SZ had a trend toward a greater proportion of low scores on the EF pair compared with HC, Kruskal’s τ = .16, p < .07. Similar results emerged at acquisition and transfer when BD− and BD+ were combined to form a single group (see supplemental materials).

¹ Group differences in acquisition remained when age and full-scale IQ were entered as covariates, where SZ had lower performance than the other three groups (see supplemental materials).


Figure 1. Acquisition of probabilistic contingencies (top panel), postacquisition transference of contingencies (middle panel), and go versus no-go at postacquisition transfer (bottom panel). BD+ = bipolar disorder with psychotic features; BD− = BD without psychotic features; SZ = schizophrenia; HC = healthy control; A: Acquisition learning of stimulus pairs in all participants; B: Accuracy for stimulus pairs during test phase in all participants; C: go versus no-go performance at test phase for all participants; D: go versus no-go performance at test phase for participants meeting learning criterion.

Orthogonal contrasts set at SZ < BD+ < BD− < HC indicated that SZ performed significantly more poorly than the HC and BD groups for both the go condition, t(96) = 2.1, p < .05, and the no-go condition, t(96) = 2.4, p < .05 (see Figure 1, panel C). No differences emerged among the other groups. Go/no-go performance in the test phase was also evaluated for participants who successfully met learning criteria. Twenty BD+, 15 BD−, 15 SZ, and 21 HC met criteria. When only these participants were analyzed, go/no-go performance improved and the orthogonal contrasts were no longer significant (see Figure 1, panel D): go SZ versus others, t(67) = −1.55, p = .13; no-go SZ versus others, t(67) = −1.52, p = .14.
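As a rough illustration of how go and no-go scores are commonly derived in probabilistic selection tasks, the sketch below assumes the conventional scoring in which go learning is the proportion of novel test pairs containing A on which A is chosen, and no-go learning is the proportion of novel pairs containing B on which B is avoided. This section does not restate the scoring rule, so the rule, the function name `go_nogo_accuracy`, and the toy trials are all assumptions.

```python
from typing import Iterable, Tuple


def go_nogo_accuracy(test_trials: Iterable[Tuple[str, str, str]]) -> Tuple[float, float]:
    """Return (go_accuracy, nogo_accuracy) for test-phase trials.

    test_trials: (stimulus_1, stimulus_2, chosen) tuples.
    Assumption: "go" = choosing A in novel pairs that include A;
    "no-go" = avoiding B in novel pairs that include B.
    """
    go_hits = go_total = nogo_hits = nogo_total = 0
    for first, second, chosen in test_trials:
        pair = {first, second}
        if pair == {"A", "B"}:
            continue  # trained pair, not a novel combination
        if "A" in pair:
            go_total += 1
            go_hits += chosen == "A"
        if "B" in pair:
            nogo_total += 1
            nogo_hits += chosen != "B"
    go = go_hits / go_total if go_total else float("nan")
    nogo = nogo_hits / nogo_total if nogo_total else float("nan")
    return go, nogo


# Toy test-phase trials for illustration only (not study data).
toy_test = [("A", "C", "A"), ("A", "D", "D"), ("A", "F", "A"),
            ("B", "E", "E"), ("B", "F", "B"), ("B", "C", "C")]
toy_go, toy_nogo = go_nogo_accuracy(toy_test)
```

Scoring the two measures separately is what allows go and no-go learning to dissociate; in the toy data both come out at two of three trials correct.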

Computational Modeling Results

The goal of the modeling was to quantitatively fit behavioral acquisition and test-phase performance using each of the three models. Figure 2, Panels A–C show that the hybrid model provided the best fits to, and reproduced key features of, the behavioral data for HC, BD+, and BD− at acquisition. The actor-critic model was insufficient for these groups because it failed to capture robust effects of learning demonstrated in the postacquisition transfer phase. Independent samples t tests did not find significant differences between model and group performance on any of the pairs. A pure Q-learning model failed to discriminate between stimuli on the basis of frequency of positive or negative feedback and thus could not adequately replicate behavioral differences in choosing stimuli based on their likelihood of leading to a desirable outcome. The actor-critic model provided best fits to behavioral data in SZ, as shown in Figure 2, Panel D. It is important to note that the actor-critic model, but not the Q-learning or hybrid models, could best reproduce observed failures to sufficiently differentiate between stimuli on the basis of individual outcome values and poor learning demonstrated during the postacquisition phase. A one-way ANOVA on model parameters for the acquisition phase revealed a significantly higher value of the beta parameter in the actor-critic model compared with the hybrid model (p < .001) and significantly lower values for learning rate parameters in the actor-critic model than in the hybrid model (p < .001). Independent samples t tests indicated that behavioral and modeled data at acquisition did not significantly differ from observed data on the

Figure 2. Observed and model simulation results at acquisition for patient and control groups (mean performance). BD+ = bipolar disorder with psychotic features; BD− = BD without psychotic features; SZ = schizophrenia; HC = healthy control; A: HC behavioral and simulated model data; B: BD− behavioral and simulated model data; C: BD+ behavioral and simulated model data; D: SZ behavioral and simulated model data.

AB, CD, or EF pairs for controls, F < 0.79, p > .21; BD−, F < 0.61, p > .1; BD+, F < 0.35, p > .06; and SZ, F < 0.61, p > .19. These findings are consistent with the hypothesis that impaired learning of reinforcement contingencies in SZ was due to impairments in integrating action-outcome associations to efficiently guide future behavior on the basis of recent experience. Higher stochasticity (i.e., degree of exploration), as indicated by higher values of the beta parameter, in addition to poor top-down control in the absence of working memory (Q-learning) contributions to learning, could account for near-chance performance of SZ during the training phase.

For the transfer phase, a one-way ANOVA revealed significant differences in the beta parameter (p < .001), with higher values of beta in the actor-critic model than in the hybrid model, as well as significantly lower learning rates in the actor-critic model than in the hybrid model (p < .001). In addition, HC, BD+, and BD− behavior at this phase showed a further decrease in exploration, indicated by a slightly lower value of the beta parameter. This could account for the more robust choice of rewarding stimuli observed in these three groups following training. Decreased exploration during the transfer phase could reflect effects of learning of reinforcement contingencies during the acquisition phase and efficient use of this knowledge in guiding choice. At the test phase, independent samples t tests indicated that observed behavioral and modeled data did not significantly differ on the AB, CD, or EF pairs for controls, F < 3.29, p > .18; BD−, F < 2.9, p > .17; BD+, F < 4.4, p > .07; and SZ, F < 1.06, p > .17 (Figure 3, Panels A–D).

Importantly, model fits to individuals’ trial-by-trial performance and to group performance means yielded similar results: behavior in the HC and both BD groups was best captured by the hybrid orbitofrontal cortex-basal ganglia model, whereas the SZ group was best captured by a reduction of this model to a pure actor-critic with the mixing parameter set to c = 0 (see Table 2). Notably, the hybrid model provided best fits to participants’ data in both BD groups and in the HC group. We subsequently used the fitted parameters of the hybrid and actor-critic models, respectively, to show that the models could successfully reproduce critical features of the observed behavior. An ANOVA on the fitted parameters of the hybrid model revealed a main effect of group for the mixing parameter c, F(3, 95) = 5.23, p < .001, and the softmax temperature parameter beta, F(3, 95) = 5.45, p = .027. Post hoc contrasts revealed a significantly lower c value for the SZ group compared with the BD+, BD−, and HC groups, and a significantly higher beta value in the SZ group compared with all other groups (ps < .001). The BD+, BD−, and

Figure 3. Observed and model simulation results at the test phase for patient and control groups (mean performance). BD+ = bipolar disorder with psychotic features; BD− = BD without psychotic features; SZ = schizophrenia; HC = healthy control; A: HC behavioral and simulated model data; B: BD− behavioral and simulated model data; C: BD+ behavioral and simulated model data; D: SZ behavioral and simulated model data.

HC groups did not differ on beta or the mixing parameter. These results are consistent with the interpretation that impairments in top-down control over striatal prediction error signaling, in addition to increased exploratory behavior (as indicated by high values of the beta parameter), could result in poor integration of reward contingencies and unstable reward value representation in SZ. Behaviorally, this could lead to an inability to make optimal choices based on previous experience with stimuli associated with varying degrees of reward (see Table 2).
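To make the roles of the mixing parameter c and the beta parameter concrete, the following is a minimal sketch of a hybrid actor-critic/Q-learning learner and of the two fit measures reported in Table 2. It is an illustration under stated assumptions, not the authors' fitting code: the update rules follow the generic textbook forms, all values start at zero, beta is treated as a softmax temperature that divides the action preferences (so higher beta means more random choice, matching the description above), pseudo-r² is taken to be 1 minus the ratio of the model log-likelihood to the chance log-likelihood, and every function and parameter name (`alpha_critic`, `alpha_actor`, `alpha_q`) is hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Trial = Tuple[Hashable, int, float]  # (stimulus pair, chosen option 0/1, reward)


def softmax(prefs: Sequence[float], beta: float) -> List[float]:
    """Choice probabilities; beta acts as a temperature (assumption)."""
    scaled = [p / beta for p in prefs]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_log_likelihood(trials: Sequence[Trial], alpha_critic: float,
                          alpha_actor: float, alpha_q: float,
                          beta: float, c: float) -> float:
    """Log-likelihood of the observed choices under the hybrid sketch.

    With c = 0 the Q-learning values have no influence on choice, so the
    model reduces to a pure actor-critic.
    """
    critic: Dict[Hashable, float] = {}       # state value per pair
    actor: Dict[Hashable, List[float]] = {}  # action weights per pair
    q: Dict[Hashable, List[float]] = {}      # expected reward per action
    log_lik = 0.0
    for pair, choice, reward in trials:
        v = critic.setdefault(pair, 0.0)
        w = actor.setdefault(pair, [0.0, 0.0])
        qv = q.setdefault(pair, [0.0, 0.0])
        # Mix actor weights and Q values before action selection.
        prefs = [(1 - c) * w[a] + c * qv[a] for a in range(2)]
        p = softmax(prefs, beta)
        log_lik += math.log(max(p[choice], 1e-12))
        delta = reward - v                          # critic prediction error
        critic[pair] = v + alpha_critic * delta
        w[choice] += alpha_actor * delta            # actor learns from critic
        qv[choice] += alpha_q * (reward - qv[choice])  # Q-learning update
    return log_lik


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def pseudo_r2(log_lik: float, n_trials: int, n_options: int = 2) -> float:
    """1 - LL_model / LL_chance (an assumed, commonly used definition)."""
    chance_log_lik = n_trials * math.log(1.0 / n_options)
    return 1.0 - log_lik / chance_log_lik
```

In practice the free parameters would be estimated per participant by maximizing `hybrid_log_likelihood` over the trial sequence, after which `aic` and `pseudo_r2` allow the three models to be compared on a common scale, as in Table 2.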

Table 2
Measures of Fit for Each of the Three Models

Group   Measure      Actor-critic model   Q-learning model   Hybrid model
BP+     Pseudo-r²    0.320 (0.035)        0.301 (0.031)      0.343 (0.037)
        AIC          5,531                5,650              5,455
BP−     Pseudo-r²    0.319 (0.035)        0.309 (0.032)      0.324 (0.034)
        AIC          5,529                5,655              5,423
HC      Pseudo-r²    0.334 (0.036)        0.310 (0.033)      0.356 (0.038)
        AIC          5,534                5,668              5,467
SZ      Pseudo-r²    0.258 (0.035)        0.192 (0.029)      0.189 (0.025)
        AIC          5,858                6,268              6,225

Note. Values include mean pseudo-r² and Akaike information criterion (AIC). Values in parentheses indicate the standard error of pseudo-r². BP+ = bipolar disorder with psychotic features; BP− = BD without psychotic features; HC = healthy control; SZ = schizophrenia.

Symptom Correlations

Table 3 presents Pearson correlations between the clinical groups’ positive and negative symptoms and PST behavioral performance. Positive symptoms negatively correlated with performance on the AB condition at acquisition and transfer, acquisition performance on the CD condition, and go/no-go learning at test. Negative symptoms did not correlate with any variables. Figure 4 presents scatter plots for SAPS and SANS data for the combined patient groups.

Table 3
Correlations Between Behavioral Task Performance, Symptoms, and Haloperidol Equivalent Dosage Collapsed Across the Bipolar Disorder and Schizophrenia Groups

Variable              SAPS     SANS
Early acquisition
  AB                  −.31*    .03
  CD                  −.24*    −.03
  EF                  −.10     −.18
Postacquisition
  AB                  −.28*    −.22
  CD                  −.18     −.02
  EF                  .05      .01
Reinforcement
  Go                  −.34*    −.19
  No-go               −.25*    −.20

Note. AB, CD, and EF are the three stimulus pairs. SAPS = Scale for the Assessment of Positive Symptoms; SANS = Scale for the Assessment of Negative Symptoms; PST = probabilistic selection task. Haloperidol units available for 37 participants.
* p < .05.

Antipsychotic Medications and Behavioral Performance

Antipsychotic medication effects were analyzed by categorizing participants according to whether they were prescribed low- or high-potency D2-blocking antipsychotics. Low-potency antipsychotics included clozapine, quetiapine, and olanzapine; high-potency drugs included aripiprazole, haloperidol, risperidone, fluphenazine, and ziprasidone. Participants were only included if they were prescribed one antipsychotic. ANOVAs revealed no differences between the groups on any PST variables (see supplemental materials).

Discussion

Findings provide important insight into the nature of reinforcement learning in BD and SZ. If reinforcement learning is an important predictor of psychosis across diagnostic boundaries, one might expect two behavioral findings to emerge. First, BD+ and BD− groups would be expected to significantly differ in behavioral performance, with BD+ displaying greater impairments at both test and acquisition. The observed results were inconsistent with this notion because BD+ and BD− did not significantly differ in behavioral performance at acquisition or test phases. Second, one would expect BD+ and SZ to display similar patterns of behavioral performance. However, this was not observed: SZ generally performed more poorly than HC, BD−, and BD+ at acquisition and test phases. That BD+ and SZ did not perform similarly at acquisition or test phases appears to argue against the notion that reward dysfunction is associated with psychosis history across diagnostic categories.

However, we did observe significant correlations between current severity of psychosis and performance on the AB pair at both acquisition and test, the CD pair at acquisition, and go/no-go test variables. The significant correlation with psychosis in the face of nonsignificant group differences between SZ and BD+ may be informative. It has been suggested that current psychotic symptoms are associated with elevated tonic dopamine levels (Grace, 1991; Laruelle & Abi-Dargham, 1999), which may reduce fidelity of reward prediction errors, causing them to be scaled on the absolute difference between tonic and phasic dopamine levels. It is possible that BD patients with a history of psychosis fail to show the same pattern of reinforcement learning abnormalities as SZ because they do not have elevated tonic dopamine. Perhaps only currently psychotic patients, irrespective of diagnosis, manifest such abnormalities; to our knowledge, no studies have explored whether BD with a history of psychosis is associated with increased tonic dopamine.

As in prior studies, the most profound reinforcement learning impairments were found among individuals with SZ (Waltz et al., 2007; Waltz, Frank, Wiecki, & Gold, 2011; Waltz & Gold, 2007; Waltz et al., 2013, 2010). We observed reduced performance at acquisition for the AB, CD, and EF stimulus pairs. However, contrary to prior studies indicating a selective go learning impairment at the test phase in SZ (Gold et al., 2013; Strauss, Frank, et al., 2011; Waltz et al., 2011), we observed similarly poor performance in both go and no-go conditions. Contrary to expectations, we did not find the significant correlations with negative symptoms that were previously identified in SZ samples (Gold et al., 2012; Strauss et al., 2011; Waltz et al., 2007). Inconsistencies among findings may reflect differences in sampling because prior studies were enriched for negative symptoms and included only participants with SZ (Gold et al., 2013; Strauss, Frank, et al., 2011; Waltz et al., 2007), whereas the current sample had low levels of negative symptoms in the SZ group and also included individuals with BD who had a low severity of negative symptoms on average.

Computational modeling provided a means of interpreting patterns of behavioral deficits using a method that is grounded in converging evidence from theoretical, cognitive, and neuroscientific models of dopamine and reinforcement learning (Sutton & Barto, 1998). The hybrid model provided the best fits to data at acquisition and test for HC, BD+, and BD− and reproduced key features of the behavioral data, suggesting that learning in these groups was driven by prediction errors in the striatum that were influenced by top-down expected value representations from the orbitofrontal cortex.
In contrast, a pure actor-critic model provided best fits for acquisition and test data in SZ. The actor-critic model that best fit the SZ data differed from the hybrid model by having a higher beta parameter and lower learning rate. SZ therefore differed not only with regard to which model best fit behavioral data, but also the parameters within the model. These findings imply that individuals with SZ use different neural processes and cognitive strategies when approaching reinforcement learning than controls. To the extent that these computational models are consistent with current knowledge about contributions of the orbitofrontal cortex and striatal prediction error signaling during reinforcement learning, results can guide interpretation regarding factors contributing to reinforcement learning performance and its association with psychosis. Psychosis is known to be associated with high tonic dopamine, which may dampen the magnitude of the prediction error signal, serving to reduce sensitivity to positive and negative outcomes and increase exploratory behavior. Increases in stochasticity may result in a slower integration of reward statistics over time and thus an impaired rate of learning. Evidence for reduced prediction error signaling in unmedicated SZ patients provides some suggestion that such deficits may not be a function of antipsychotics (Schlagenhauf et al., 2014). Results are also consistent with the notion that orbitofrontal cortex dysfunction contributes to poor reinforcement learning, leading to difficulty predicting the expected value of stimuli and associating them with actions, perhaps

Figure 4. Scatter plots depicting correlations between go and no-go test phase performance and symptoms in patients. SAPS = Scale for the Assessment of Positive Symptoms total score; SANS = Scale for the Assessment of Negative Symptoms total score. See the online article for the color version of this figure.

causing stimuli of differing value to seem similar regardless of their reinforcing properties. Thus, similar to neuroimaging studies (Corlett, Murray, et al., 2007), our behavioral and computational modeling findings are consistent with the idea that psychosis is related to a combination of reward-related dysfunctions that stem from abnormalities in prediction error signaling and value representation.

Certain limitations should be considered when interpreting our findings. First, sample sizes were small for the individual BD+, BD−, and SZ groups, thereby limiting power to observe group behavioral differences. Second, SZ patients performed near chance during acquisition, thereby preventing a full test of transfer performance. Secondary analyses examining participants who learned the reinforcement contingencies failed to confirm some of the primary findings. These analyses should be interpreted with caution because they are affected by low power. Several factors may have contributed to poor acquisition in the SZ group. One factor is poorer general cognition. Although analyses covarying IQ indicated that group differences remained, general cognitive impairments likely had a greater impact on acquisition performance in the SZ than BD groups. Another factor relates to differences in switching rate between the groups. Contrary to prior studies indicating that SZ had poor performance because of greater rates of switching behavior (e.g., Waltz et al., 2013), our SZ patients did not exhibit higher rates of switching than HC or BD groups. Rather, it appears that SZ have a greater propensity toward random responding. This finding converges with the computational modeling results indicating that SZ were best fit by a high beta parameter indicating greater stochasticity of action selection. One explanation for this pattern of behavior is that SZ had more trouble encoding the hiragana characters than the other groups. Greater difficulty with learning hiragana characters relative to other stimuli (e.g., clip art) has been demonstrated in prior SZ studies using this task (Waltz et al., 2007). It is therefore possible that our SZ group performed near chance during acquisition because of the difficult nature of the stimuli and more severe cognitive deficits that made these stimuli exceptionally difficult to learn. Future studies should evaluate this possibility using different stimulus types and larger sample sizes to account for lower power during analyses conducted on participants who met AB stimulus criterion.

A third limitation of the study was that smoking status was not evaluated and we could therefore not evaluate the impact of smoking behavior on reinforcement learning among the groups. This will be an important future direction given prior evidence for an effect of smoking behavior on reinforcement learning (Barr, Pizzagalli, Culhane, Goff, & Evins, 2008). Finally, although analyses were conducted to determine potential group differences among patients prescribed antipsychotics of differing levels of D2 blockade, the role of antipsychotic medications could not be adequately evaluated. Pre-post design studies using initially unmedicated first-episode psychotic patients should be conducted in the future.

In summary, our data suggest that abnormalities in the reward system are more prominent in SZ than BD. Current psychotic symptoms, but not a history of psychosis, were associated with reinforcement learning abnormalities across diagnostic boundaries. In currently psychotic patients, abnormalities in the basic dopamine machinery of the basal ganglia may contribute to errors in prediction error signaling that maladaptively update cortical representations with irrelevant information. Aberrant prediction error signals may affect mechanisms responsible for gating information into the prefrontal cortex, causing it to attribute salience toward irrelevant environmental cues and fail to attend to cues that are motivationally relevant. Additional research that combines computational modeling and neuroimaging is needed to confirm these interpretations.

References

Allen, D. N., Randall, C., Bello, D., Armstrong, C., Frantom, L., Cross, C., & Kinney, J. (2010). Are working memory deficits in bipolar disorder markers for psychosis? Neuropsychology, 24, 244–254. http://dx.doi.org/10.1037/a0018159
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: Author.
Andreasen, N. C. (1984). Scale for the Assessment of Positive Symptoms (SAPS). Iowa City: University of Iowa.
Barr, R. S., Pizzagalli, D. A., Culhane, M. A., Goff, D. C., & Evins, A. E. (2008). A single dose of nicotine enhances reward responsiveness in nonsmokers: Implications for development of dependence. Biological Psychiatry, 63, 1061–1065. http://dx.doi.org/10.1016/j.biopsych.2007.09.015
Berrettini, W. (2003a). Bipolar disorder and schizophrenia: Not so distant relatives? World Psychiatry; Official Journal of the World Psychiatric Association (WPA), 2, 68–72.
Berrettini, W. (2003b). Evidence for shared susceptibility in bipolar disorder and schizophrenia. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics, 123C(1), 59–64. http://dx.doi.org/10.1002/ajmg.c.20014
Corlett, P. R., Frith, C. D., & Fletcher, P. C. (2009). From drugs to deprivation: A Bayesian framework for understanding models of psychosis. Psychopharmacology, 206, 515–530. http://dx.doi.org/10.1007/s00213-009-1561-0
Corlett, P. R., Honey, G. D., & Fletcher, P. C. (2007). From prediction error to psychosis: Ketamine as a pharmacological model of delusions. Journal of Psychopharmacology, 21, 238–252. http://dx.doi.org/10.1177/0269881107077716
Corlett, P. R., Murray, G. K., Honey, G. D., Aitken, M. R., Shanks, D. R., Robbins, T. W., . . . Fletcher, P. C. (2007). Disrupted prediction-error signal in psychosis: Evidence for an associative account of delusions. Brain, 130(Pt. 9), 2387–2400. http://dx.doi.org/10.1093/brain/awm173
Craddock, N., O’Donovan, M. C., & Owen, M. J. (2005). The genetics of schizophrenia and bipolar disorder: Dissecting psychosis. Journal of Medical Genetics, 42, 193–204.
Craddock, N., O’Donovan, M. C., & Owen, M. J. (2007). Phenotypic and genetic complexity of psychosis. Invited commentary on . . . Schizophrenia: A common disease caused by multiple rare alleles. The British Journal of Psychiatry, 190, 200–203.
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (2001). Structured Clinical Interview for DSM-IV-TR Axis I Disorders–Patient Edition (SCID-I/P 2/2001 Revision). New York, NY: Biometrics Research Department, New York State Psychiatric Institute.
Frank, M. J. (2008). Schizophrenia: A computational reinforcement learning perspective. Schizophrenia Bulletin, 34, 1008–1011. http://dx.doi.org/10.1093/schbul/sbn123

Frank, M. J., & Claus, E. D. (2006). Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychological Review, 113, 300–326. http://dx.doi.org/10.1037/0033-295X.113.2.300
Frank, M. J., Seeberger, L. C., & O’Reilly, R. C. (2004, December 10). By carrot or by stick: Cognitive reinforcement learning in Parkinsonism. Science, 306, 1940–1943. http://dx.doi.org/10.1126/science.1102941
Frye, M. A., Ketter, T. A., Altshuler, L. L., Denicoff, K., Dunn, R. T., Kimbrell, T. A., . . . Post, R. M. (1998). Clozapine in bipolar disorder: Treatment implications for other atypical antipsychotics. Journal of Affective Disorders, 48, 91–104.
Furuyashiki, T., & Gallagher, M. (2007). Neural encoding in the orbitofrontal cortex related to goal-directed behavior. Annals of the New York Academy of Sciences, 1121, 193–215. http://dx.doi.org/10.1196/annals.1401.037
Glahn, D. C., Bearden, C. E., Cakir, S., Barrett, J. A., Najt, P., Serap Monkul, E., . . . Soares, J. C. (2006). Differential working memory impairment in bipolar disorder and schizophrenia: Effects of lifetime history of psychosis. Bipolar Disorders, 8, 117–123. http://dx.doi.org/10.1111/j.1399-5618.2006.00296.x
Gold, J. M., Strauss, G. P., Waltz, J. A., Robinson, B. M., Brown, J. K., & Frank, M. J. (2013). Negative symptoms of schizophrenia are associated with abnormal effort-cost computations. Biological Psychiatry, 74, 130–136. http://dx.doi.org/10.1016/j.biopsych.2012.12.022
Gold, J. M., Waltz, J. A., Matveeva, T. M., Kasanova, Z., Strauss, G. P., Herbener, E. S., . . . Frank, M. J. (2012). Negative symptoms and the failure to represent the expected reward value of actions: Behavioral and computational modeling evidence. Archives of General Psychiatry, 69, 129–138. http://dx.doi.org/10.1001/archgenpsychiatry.2011.1269
Grace, A. A. (1991). Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia. Neuroscience, 41, 1–24.
Greene, T. (2007). The Kraepelinian dichotomy: The twin pillars crumbling? History of Psychiatry, 18, 361–379. http://dx.doi.org/10.1177/0957154X07078977
Heckers, S. (2008). Making progress in schizophrenia research. Schizophrenia Bulletin, 34, 591–594. http://dx.doi.org/10.1093/schbul/sbn046
Hill, S. K., Harris, M. S., Herbener, E. S., Pavuluri, M., & Sweeney, J. A. (2008). Neurocognitive allied phenotypes for schizophrenia and bipolar disorder. Schizophrenia Bulletin, 34, 743–759. http://dx.doi.org/10.1093/schbul/sbn027
Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15, 535–547. http://dx.doi.org/10.1016/S0893-6080(02)00047-3
Keshavan, M. S., Morris, D. W., Sweeney, J. A., Pearlson, G., Thaker, G., Seidman, L. J., . . . Tamminga, C. (2011). A dimensional approach to the psychosis spectrum between bipolar disorder and schizophrenia: The Schizo-Bipolar Scale. Schizophrenia Research, 133, 250–254. http://dx.doi.org/10.1016/j.schres.2011.09.005
Laruelle, M., & Abi-Dargham, A. (1999). Dopamine as the wind of the psychotic fire: New evidence from brain imaging studies. Journal of Psychopharmacology, 13, 358–371. http://dx.doi.org/10.1177/026988119901300405
Linke, J., Sönnekes, C., & Wessa, M. (2011). Sensitivity to positive and negative feedback in euthymic patients with bipolar I disorder: The last episode makes the difference. Bipolar Disorders, 13, 638–650. http://dx.doi.org/10.1111/j.1399-5618.2011.00956.x
Montague, P. R. (1999). Reinforcement learning: An introduction. Trends in Cognitive Sciences, 3, 360–361. http://dx.doi.org/10.1016/S1364-6613(99)01331-5
Nakamura, M., Nestor, P. G., Levitt, J. J., Cohen, A. S., Kawashima, T., Shenton, M. E., & McCarley, R. W. (2008). Orbitofrontal volume deficit in schizophrenia and thought disorder. Brain, 131, 180–195. http://dx.doi.org/10.1093/brain/awm265

Pizzagalli, D. A., Goetz, E., Ostacher, M., Iosifescu, D. V., & Perlis, R. H. (2008). Euthymic patients with bipolar disorder show decreased reward learning in a probabilistic reward task. Biological Psychiatry, 64, 162–168. http://dx.doi.org/10.1016/j.biopsych.2007.12.001
Plassmann, H., O’Doherty, J. P., & Rangel, A. (2010). Appetitive and aversive goal values are encoded in the medial orbitofrontal cortex at the time of decision making. Journal of Neuroscience, 30, 10799–10808. http://dx.doi.org/10.1523/JNEUROSCI.0788-10.2010
Post, R. M., Frye, M. A., Denicoff, K. D., Leverich, G. S., Kimbrell, T. A., & Dunn, R. T. (1998). Beyond lithium in the treatment of bipolar illness. Neuropsychopharmacology, 19, 206–219. http://dx.doi.org/10.1016/S0893-133X(98)00020-7
Potash, J. B., Chiu, Y. F., MacKinnon, D. F., Miller, E. B., Simpson, S. G., McMahon, F. J., . . . DePaulo, J. R. (2003). Familial aggregation of psychotic symptoms in a replication set of 69 bipolar disorder pedigrees. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 116, 90–97.
Ringe, W. K., Saine, K. C., Lacritz, L. H., Hynan, L. S., & Cullum, C. M. (2002). Dyadic short forms of the Wechsler Adult Intelligence Scale–III. Assessment, 9, 254–260. http://dx.doi.org/10.1177/1073191102009003004
Roesch, M. R., & Olson, C. R. (2007). Neuronal activity related to anticipated reward in frontal cortex: Does it represent value or reflect motivation? Annals of the New York Academy of Sciences, 1121, 431–446. http://dx.doi.org/10.1196/annals.1401.004
Roiser, J. P., Cannon, D. M., Gandhi, S. K., Taylor Tavares, J., Erickson, K., Wood, S., . . . Drevets, W. C. (2009). Hot and cold cognition in unmedicated depressed subjects with bipolar disorder. Bipolar Disorders, 11, 178–189. http://dx.doi.org/10.1111/j.1399-5618.2009.00669.x
Schlagenhauf, F., Huys, Q. J., Deserno, L., Rapp, M. A., Beck, A., Heinze, H. J., . . . Heinz, A. (2014). Striatal dysfunction during reversal learning in unmedicated schizophrenia patients. Neuroimage, 89, 171–180.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.
Seidman, L. J., Kremen, W. S., Koren, D., Faraone, S. V., Goldstein, J. M., & Tsuang, M. T. (2002). A comparative profile analysis of neuropsychological functioning in patients with schizophrenia and bipolar psychoses. Schizophrenia Research, 53, 31–44.
Singh, M. K., Chang, K. D., Kelley, R. G., Cui, X., Sherdell, L., Howe, M. E., . . . Reiss, A. L. (2013). Reward processing in adolescents with bipolar I disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 52, 68–83. http://dx.doi.org/10.1016/j.jaac.2012.10.004
Stanfield, A. C., Moorhead, T. W., Job, D. E., McKirdy, J., Sussmann, J. E., Hall, J., . . . McIntosh, A. M. (2009). Structural abnormalities of ventrolateral and orbitofrontal cortex in patients with familial bipolar disorder. Bipolar Disorders, 11, 135–144. http://dx.doi.org/10.1111/j.1399-5618.2009.00666.x
Strauss, G. P., Frank, M. J., Waltz, J. A., Kasanova, Z., Herbener, E. S., & Gold, J. M. (2011). Deficits in positive reinforcement learning and uncertainty-driven exploration are associated with distinct aspects of negative symptoms in schizophrenia. Biological Psychiatry, 69, 424–431. http://dx.doi.org/10.1016/j.biopsych.2010.10.015
Strauss, G. P., Robinson, B. M., Waltz, J. A., Frank, M. J., Kasanova, Z., Herbener, E. S., & Gold, J. M. (2011). Patients with schizophrenia demonstrate inconsistent preference judgments for affective and nonaffective stimuli. Schizophrenia Bulletin, 37, 1295–1304. http://dx.doi.org/10.1093/schbul/sbq047
Sutton, R., & Barto, A. G. (1998). Reinforcement Learning. Cambridge, MA: MIT Press.
Thaler, N. S., Allen, D. N., Sutton, G. P., Vertinski, M., & Ringdahl, E. N. (2013). Differential impairment of social cognition factors in bipolar disorder with and without psychotic features and schizophrenia. Journal of Psychiatric Research, 47, 2004–2010. http://dx.doi.org/10.1016/j.jpsychires.2013.09.010
Thaler, N. S., Strauss, G. P., Sutton, G. P., Vertinski, M., Ringdahl, E. N., Snyder, J. S., & Allen, D. N. (2013). Emotion perception abnormalities across sensory modalities in bipolar disorder with psychotic features and schizophrenia. Schizophrenia Research, 147, 287–292. http://dx.doi.org/10.1016/j.schres.2013.04.001
Waltz, J. A., Frank, M. J., Robinson, B. M., & Gold, J. M. (2007). Selective reinforcement learning deficits in schizophrenia support predictions from computational models of striatal-cortical dysfunction. Biological Psychiatry, 62, 756–764. http://dx.doi.org/10.1016/j.biopsych.2006.09.042
Waltz, J. A., Frank, M. J., Wiecki, T. V., & Gold, J. M. (2011). Altered probabilistic learning and response biases in schizophrenia: Behavioral evidence and neurocomputational modeling. Neuropsychology, 25, 86–97. http://dx.doi.org/10.1037/a0020882
Waltz, J. A., & Gold, J. M. (2007). Probabilistic reversal learning impairments in schizophrenia: Further evidence of orbitofrontal dysfunction. Schizophrenia Research, 93, 296–303. http://dx.doi.org/10.1016/j.schres.2007.03.010
Waltz, J. A., Kasanova, Z., Ross, T. J., Salmeron, B. J., McMahon, R. P., Gold, J. M., & Stein, E. A. (2013). The roles of reward, default, and executive control networks in set-shifting impairments in schizophrenia. PLoS ONE, 8(2), e57257. http://dx.doi.org/10.1371/journal.pone.0057257
Waltz, J. A., Schweitzer, J. B., Ross, T. J., Kurup, P. K., Salmeron, B. J., Rose, E. J., . . . Stein, E. A. (2010). Abnormal responses to monetary outcomes in cortex, but not in the basal ganglia, in schizophrenia. Neuropsychopharmacology, 35, 2427–2439. http://dx.doi.org/10.1038/npp.2010.126
Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.) administration and scoring manual. San Antonio, TX: Psychological Corporation.
Yu, K., Cheung, C., Leung, M., Li, Q., Chua, S., & McAlonan, G. (2010). Are bipolar disorder and schizophrenia neuroanatomically distinct? An anatomical likelihood meta-analysis. Frontiers in Human Neuroscience, 4, 189. http://dx.doi.org/10.3389/fnhum.2010.00189

Received March 29, 2014
Revision received December 3, 2014
Accepted December 4, 2014
