Multiple model-based reinforcement learning explains dopamine neuronal activity

Cited by: 13
Authors
Bertin, Mathieu
Schweighofer, Nicolas
Doya, Kenji
Affiliations
[1] ATR Computat Neurosci Labs, Kyoto 6190288, Japan
[2] Univ Paris 06, Lab Informat Paris 6, F-75005 Paris, France
[3] Univ So Calif, Dept Biokinesiol & Phys Therapy, Los Angeles, CA 90089 USA
[4] Okinawa Inst Sci & Technol, Initial Res Project Lab, Neural Computat Unit, Okinawa 9042234, Japan
Funding
National Science Foundation (USA);
Keywords
dopamine; reinforcement learning; multiple model; timing prediction; classical conditioning;
DOI
10.1016/j.neunet.2007.04.028
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference learning. However, earlier models cannot account for recent results of conditioning experiments; specifically, the behavior of dopamine neurons in case of variation of the interval between a cue stimulus and a reward has not been satisfyingly accounted for. We address this problem by using a modular architecture, in which each module consists of a reward predictor and a value estimator. A "responsibility signal", computed from the accuracy of the predictions of the reward predictors, is used to weight the contributions and learning of the value estimators. This multiple-model architecture gives an accurate account of the behavior of dopamine neurons in two specific experiments: when the reward is delivered earlier than expected, and when the stimulus-reward interval varies uniformly over a fixed range. (c) 2007 Elsevier Ltd. All rights reserved.
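The responsibility-weighted scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax over negative squared prediction errors, the function names (`responsibilities`, `combined_value`, `td_update`), and the parameters `sigma`, `alpha`, and `gamma` are all assumptions chosen for illustration.

```python
import math

def responsibilities(pred_errors, sigma=1.0):
    """Assumed form: softmax over negative squared reward-prediction errors.

    Modules whose reward predictor was more accurate get a larger weight.
    """
    scores = [-(e ** 2) / (2.0 * sigma ** 2) for e in pred_errors]
    m = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def combined_value(values, resp):
    """Responsibility-weighted sum of the per-module value estimates."""
    return sum(r * v for r, v in zip(resp, values))

def td_update(values, resp, reward, next_value, alpha=0.1, gamma=0.9):
    """One temporal difference step shared across modules (hypothetical helper).

    The TD error is computed from the combined value, and each module
    learns in proportion to its responsibility.
    """
    delta = reward + gamma * next_value - combined_value(values, resp)
    new_values = [v + alpha * r * delta for v, r in zip(values, resp)]
    return new_values, delta
```

In this sketch a module with zero responsibility neither contributes to the combined value nor changes its estimate, which is one simple way to read "weight the contributions and learning of the value estimators".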
Pages: 668-675
Page count: 8