Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based, But Not Model-Free, Reinforcement Learning

Cited by: 169
Authors
McDannald, Michael A. [1]
Lucantonio, Federica [2]
Burke, Kathryn A. [2]
Niv, Yael [4,5]
Schoenbaum, Geoffrey [1,2,3]
Affiliations
[1] Univ Maryland, Dept Anat & Neurobiol, Sch Med, Baltimore, MD 21201 USA
[2] Univ Maryland, Program Neurosci, Sch Med, Baltimore, MD 21201 USA
[3] Univ Maryland, Program Neurosci, Dept Psychiat, Baltimore, MD 21201 USA
[4] Princeton Univ, Inst Neurosci, Princeton, NJ 08540 USA
[5] Princeton Univ, Dept Psychol, Princeton, NJ 08540 USA
Keywords
ORBITAL PREFRONTAL CORTEX; NUCLEUS-ACCUMBENS CORE; BASOLATERAL AMYGDALA; REWARD PREFERENCE; DORSAL STRIATUM; PREDICTION; DISSOCIATION; SYSTEMS; LESIONS; INFORMATION;
DOI
10.1523/JNEUROSCI.5499-10.2011
CLC number (Chinese Library Classification)
Q189 [Neuroscience];
Discipline classification code
071006;
Abstract
In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues, and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
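To make the value/identity distinction concrete, the following is a minimal sketch in Python (not the authors' computational model) of why a model-free TD-style error is blind to identity changes while a model-based error is not; the outcome representation and flavor labels are illustrative assumptions only.

    def td_value_error(expected_value, received_value):
        # Model-free TDRL teaching signal: depends only on scalar reward value.
        return received_value - expected_value

    def identity_prediction_error(expected_identity, received_identity):
        # Model-based signal: flags a mismatch between the specific outcome
        # predicted by the internal model and the outcome actually received,
        # even when its value is unchanged.
        return 0.0 if received_identity == expected_identity else 1.0

    # Hypothetical trial: a cue predicts three pellets of one flavor, but three
    # pellets of a different, equally valued flavor are delivered.
    expected = {"value": 3.0, "identity": "flavor_A"}
    received = {"value": 3.0, "identity": "flavor_B"}

    print(td_value_error(expected["value"], received["value"]))
    # -> 0.0: no value error, so model-free TDRL predicts no new learning (blocking)
    print(identity_prediction_error(expected["identity"], received["identity"]))
    # -> 1.0: an identity error, which could drive model-based unblocking

On this reading, intact learning after an identity switch at matched value is evidence for the model-based signal, which is the kind of learning the lesion experiments probe.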
Pages: 2700-2705
Number of pages: 6