Retrospective model-based inference guides model-free credit assignment

Times cited: 26
Authors
Moran, Rani [1, 2]
Keramati, Mehdi [1, 2, 3]
Dayan, Peter [1, 4, 5]
Dolan, Raymond J. [1, 2]
Affiliations
[1] UCL, Max Planck UCL Ctr Computat Psychiat & Ageing Res, 10-12 Russell Sq, London WC1B 5EH, England
[2] UCL, Wellcome Ctr Human Neuroimaging, London WC1N 3BG, England
[3] City Univ London, Dept Psychol, London EC1R 0JD, England
[4] UCL, Gatsby Computat Neurosci Unit, London W1T 4JG, England
[5] Max Planck Inst Biol Cybernet, Max-Planck-Ring 8, D-72076 Tübingen, Germany
Funding
Wellcome Trust (UK)
Keywords
PREFRONTAL CORTEX; PREDICTION; DOPAMINE; REINFORCEMENT; HABITS; CHOICE; STRIATUM; ERRORS;
DOI
10.1038/s41467-019-08662-8
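Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09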
Abstract
An extensive reinforcement learning literature shows that organisms assign credit efficiently, even under conditions of state uncertainty. However, little is known about credit assignment when state uncertainty is subsequently resolved. Here, we address this problem within the framework of an interaction between model-free (MF) and model-based (MB) control systems. We present, and support experimentally, a theory of MB retrospective inference. Within this framework, an MB system resolves uncertainty that prevailed when actions were taken, thereby guiding MF credit assignment. Using a task in which there was initial uncertainty about the lotteries that were chosen, we found that when participants' momentary uncertainty about which lottery had generated an outcome was resolved by provision of subsequent information, participants preferentially assigned credit within an MF system to the lottery they retrospectively inferred was responsible for this outcome. These findings extend our knowledge about the range of MB functions and the scope of system interactions.
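The mechanism summarized in the abstract can be illustrated with a toy sketch. This is not the authors' computational model: the names `MFLearner` and `retrospective_posterior`, the learning rate `alpha`, and the rule of scaling each lottery's prediction error by its inferred responsibility are hypothetical assumptions chosen for illustration. An MB step applies Bayes' rule to later-revealed evidence to infer which lottery produced the outcome, and an MF learner then updates each lottery's cached value in proportion to that inferred responsibility.

```python
from dataclasses import dataclass, field


@dataclass
class MFLearner:
    """Hypothetical model-free learner caching one value per lottery."""
    alpha: float = 0.5  # assumed learning rate, for illustration only
    q: dict = field(default_factory=dict)

    def value(self, lottery: str) -> float:
        return self.q.get(lottery, 0.0)

    def assign_credit(self, outcome: float, responsibility: dict) -> None:
        # Assumed rule: each candidate lottery's prediction error is scaled
        # by the probability that it generated the outcome.
        for lottery, p in responsibility.items():
            delta = outcome - self.value(lottery)
            self.q[lottery] = self.value(lottery) + self.alpha * p * delta


def retrospective_posterior(prior: dict, likelihood: dict) -> dict:
    """Hypothetical MB step: Bayes' rule over which lottery generated the
    outcome, given the likelihood of later evidence under each lottery."""
    joint = {k: prior[k] * likelihood[k] for k in prior}
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}


prior = {"A": 0.5, "B": 0.5}  # uncertainty at the time of choice
posterior = retrospective_posterior(prior, {"A": 1.0, "B": 0.0})

resolved = MFLearner()
resolved.assign_credit(1.0, posterior)  # uncertainty resolved in favour of A
unresolved = MFLearner()
unresolved.assign_credit(1.0, prior)    # uncertainty never resolved

print(resolved.q)    # {'A': 0.5, 'B': 0.0}
print(unresolved.q)  # {'A': 0.25, 'B': 0.25}
```

When the later evidence resolves the uncertainty, all credit for the reward goes to lottery A; without resolution it is split evenly. That contrast mirrors the abstract's finding that credit is preferentially assigned to the lottery retrospectively inferred to be responsible.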
Pages: 14