Credit assignment in movement-dependent reinforcement learning

Cited by: 49
Authors
McDougle, Samuel D. [1, 2]
Boggess, Matthew J. [3]
Crossley, Matthew J. [3]
Parvin, Darius [3]
Ivry, Richard B. [3, 4]
Taylor, Jordan A. [1, 2]
Affiliations
[1] Princeton Univ, Dept Psychol, Princeton, NJ 08544 USA
[2] Princeton Univ, Princeton Neurosci Inst, Princeton, NJ 08544 USA
[3] Univ Calif Berkeley, Dept Psychol, 3210 Tolman Hall, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Helen Wills Neurosci Inst, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation
Keywords
decision-making; reinforcement learning; sensory prediction error; reward prediction error; cerebellum; SENSORY PREDICTION ERRORS; EXPLICIT STRATEGY; MULTIPLE ROLES; CEREBELLUM; DECISION; ADAPTATION; MODULATION;
DOI
10.1073/pnas.1523669113
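Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract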
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
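The gating idea described in the abstract can be illustrated with a small sketch. This is not the authors' fitted model: it only assumes that a value estimate is updated from a reward prediction error, and that the learning rate is lower on trials where the movement itself failed. The function names and the parameters `alpha_hit` and `alpha_miss` are hypothetical and chosen for illustration.

```python
def gated_update(value, reward, executed_ok, alpha_hit, alpha_miss):
    """Update one value estimate from a single trial outcome.

    The reward prediction error (reward - value) is scaled by a learning
    rate that depends on whether the movement was executed correctly.
    Setting alpha_miss below alpha_hit "gates" learning after motor errors
    (an illustrative assumption, not the published parameterization).
    """
    rpe = reward - value
    alpha = alpha_hit if executed_ok else alpha_miss
    return value + alpha * rpe


def run_sequence(outcomes, alpha_hit, alpha_miss, initial_value=0.0):
    """Apply gated_update over a list of (reward, executed_ok) trials."""
    value = initial_value
    for reward, executed_ok in outcomes:
        value = gated_update(value, reward, executed_ok, alpha_hit, alpha_miss)
    return value


# Hypothetical trial history for a risky target: hit, miss (motor error), hit.
trials = [(1.0, True), (0.0, False), (1.0, True)]

# Fully gated: the unrewarded miss is attributed to execution and ignored.
gated_value = run_sequence(trials, alpha_hit=0.5, alpha_miss=0.0)

# Ungated: the miss lowers the target's value like any other unrewarded trial.
ungated_value = run_sequence(trials, alpha_hit=0.5, alpha_miss=0.5)

print(gated_value, ungated_value)
```

Under these assumptions the gated learner keeps a higher value for the risky target than the ungated learner after the same trial history, which is the direction of effect needed to reduce risk aversion when unrewarded trials are attributed to motor execution.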
Pages: 6797-6802
Page count: 6