Reinforcement learning and human behavior

Cited by: 65
Authors
Shteingart, Hanan [1 ]
Loewenstein, Yonatan [1 ,2 ,3 ,4 ]
Affiliations
[1] Hebrew Univ Jerusalem, Edmond & Lily Safra Ctr Brain Sci, IL-91904 Jerusalem, Israel
[2] Hebrew Univ Jerusalem, Alexander Silberman Inst Life Sci, Dept Neurobiol, IL-91904 Jerusalem, Israel
[3] Hebrew Univ Jerusalem, Dept Cognit Sci, IL-91904 Jerusalem, Israel
[4] Hebrew Univ Jerusalem, Ctr Study Rat, IL-91904 Jerusalem, Israel
Funding
Israel Science Foundation
Keywords
SYNAPTIC PLASTICITY; REWARD; PROBABILITY; PREDICTION; MODEL; COVARIANCE; STRIATUM; SURPRISE; SIGNALS; EVENTS;
DOI
10.1016/j.conb.2013.12.004
CLC Number
Q189 [Neuroscience]
Subject Classification Code
071006
Abstract
The dominant computational approach to modeling operant learning and its underlying neural activity is model-free reinforcement learning (RL). However, there is accumulating behavioral and neural evidence that human (and animal) operant learning is far more multifaceted. Theoretical advances in RL, such as hierarchical and model-based RL, extend the explanatory power of RL to account for some of these findings. Nevertheless, other aspects of human behavior remain inexplicable even in the simplest tasks. Here we review developments and remaining challenges in relating RL models to human operant learning. In particular, we emphasize that learning a model of the world is an essential step before, or in parallel to, learning the policy in RL, and we discuss alternative models that directly learn a policy without an explicit world model in terms of state-action pairs.
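To make the contrast concrete: "model-free RL" in the sense used above means learning action values directly from reward feedback, without ever representing the task's transition structure. Below is a minimal sketch of tabular Q-learning on a toy deterministic chain task; the task, the optimistic initialization, and all hyperparameters are illustrative assumptions of this sketch, not details taken from the paper.

```python
def q_learning_chain(n_states=4, episodes=200, alpha=0.5, gamma=0.9):
    """Learn action values Q(s, a) from reward feedback alone,
    without building an explicit model of the transitions."""
    # Actions: 0 = step left (or stay at state 0), 1 = step right.
    # Reaching the last state yields reward 1 and ends the episode.
    goal = n_states - 1
    # Optimistic initialization drives systematic exploration
    # even under a purely greedy action-selection rule.
    Q = [[1.0, 1.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            a = 0 if Q[s][0] >= Q[s][1] else 1      # greedy choice
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            future = 0.0 if s_next == goal else max(Q[s_next])
            # Temporal-difference update toward r + gamma * max_a' Q(s', a')
            Q[s][a] += alpha * (r + gamma * future - Q[s][a])
            if s_next == goal:
                break
            s = s_next
    return Q
```

After training, the greedy policy steps right everywhere, and the learned values approach the discounted distance to reward (e.g. Q(2, right) ≈ 1.0, Q(1, right) ≈ 0.9, Q(0, right) ≈ 0.81 with gamma = 0.9). A model-based learner would instead represent the transition function itself and plan over it; the review's point is that human behavior shows signatures of both strategies.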
Pages: 93-98
Number of pages: 6