Multiple model-based reinforcement learning explains dopamine neuronal activity

Cited by: 13

Authors:

Bertin, Mathieu

Schweighofer, Nicolas

Doya, Kenji

Affiliations:

[1] ATR Computat Neurosci Labs, Kyoto 6190288, Japan

[2] Univ Paris 06, Lab Informat Paris 6, F-75005 Paris, France

[3] Univ So Calif, Dept Biokinesiol & Phys Therapy, Los Angeles, CA 90089 USA

[4] Okinawa Inst Sci & Technol, Initial Res Project Lab, Neural Computat Unit, Okinawa 9042234, Japan

Source:

NEURAL NETWORKS | 2007 / Vol. 20 / No. 6

Funding:

US National Science Foundation;

Keywords:

dopamine; reinforcement learning; multiple model; timing prediction; classical conditioning;

DOI:

10.1016/j.neunet.2007.04.028

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference learning. However, earlier models cannot account for recent results of conditioning experiments; specifically, the behavior of dopamine neurons when the interval between a cue stimulus and a reward is varied has not been satisfactorily accounted for. We address this problem by using a modular architecture, in which each module consists of a reward predictor and a value estimator. A "responsibility signal", computed from the accuracy of the predictions of the reward predictors, is used to weight the contributions and learning of the value estimators. This multiple-model architecture gives an accurate account of the behavior of dopamine neurons in two specific experiments: when the reward is delivered earlier than expected, and when the stimulus-reward interval varies uniformly over a fixed range. (c) 2007 Elsevier Ltd. All rights reserved.
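The abstract describes the architecture only at a high level. The Python sketch below illustrates the general idea under several assumptions that are not taken from the paper: time since cue onset is a tapped-delay-line (tabular) state; each module's reward predictor and value estimator are lookup tables; the responsibility signal is a soft-max of negative squared reward-prediction errors accumulated within the trial (a Gaussian-likelihood assumption); and the dopamine-like signal is the TD error delta = r + gamma * V(t+1) - V(t), computed from the responsibility-weighted value. The names `MultiModelTD` and `responsibilities`, and all parameter values, are hypothetical.

```python
import math
from typing import List


def responsibilities(sq_errors: List[float], sigma: float = 0.5) -> List[float]:
    """Soft-max over negative squared reward-prediction errors.

    Assumes a Gaussian likelihood for each module's reward predictor, so a
    module whose predictions have been more accurate gets a larger weight.
    """
    scores = [-e / (2.0 * sigma ** 2) for e in sq_errors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


class MultiModelTD:
    """Hypothetical multiple-model TD learner.

    State = time steps since cue onset (tapped delay line). Module i holds a
    tabular reward predictor r_hat[i][t] and a value estimator v[i][t].
    """

    def __init__(self, n_modules: int, n_steps: int, alpha: float = 0.2,
                 beta: float = 0.2, gamma: float = 0.95, sigma: float = 0.5):
        self.n_modules, self.n_steps = n_modules, n_steps
        self.alpha, self.beta = alpha, beta    # value / predictor learning rates (assumed)
        self.gamma, self.sigma = gamma, sigma  # discount, responsibility width (assumed)
        self.r_hat = [[0.0] * n_steps for _ in range(n_modules)]
        # One extra terminal entry per module, kept at 0 (end of trial).
        self.v = [[0.0] * (n_steps + 1) for _ in range(n_modules)]

    def run_trial(self, rewards: List[float]) -> List[float]:
        """Run one trial; return the TD error (dopamine-like signal) per step."""
        assert len(rewards) == self.n_steps
        # Squared prediction errors accumulated within this trial (an assumption;
        # the paper's exact temporal weighting may differ).
        sq_err = [0.0] * self.n_modules
        deltas = []
        for t, r in enumerate(rewards):
            errs = [r - self.r_hat[i][t] for i in range(self.n_modules)]
            for i, e in enumerate(errs):
                sq_err[i] += e * e
            lam = responsibilities(sq_err, self.sigma)
            # Responsibility-weighted value of the current and next state.
            v_now = sum(w * self.v[i][t] for i, w in enumerate(lam))
            v_next = sum(w * self.v[i][t + 1] for i, w in enumerate(lam))
            delta = r + self.gamma * v_next - v_now
            # Responsibility also gates how much each module learns.
            for i, w in enumerate(lam):
                self.v[i][t] += self.alpha * w * delta
                self.r_hat[i][t] += self.beta * w * errs[i]
            deltas.append(delta)
        return deltas


# Usage: reward always arrives 5 steps after the cue (hypothetical timing).
learner = MultiModelTD(n_modules=2, n_steps=8)
rewards = [0.0] * 8
rewards[5] = 1.0
for _ in range(200):
    deltas = learner.run_trial(rewards)
print([round(d, 3) for d in deltas])
```

In this demo both modules start identical, so their responsibilities stay uniform and the learner behaves like a single TD model whose error at the reward time shrinks toward zero. Getting modules to specialize for different cue-reward intervals would require some symmetry breaking (for example, different initializations), which this sketch does not attempt.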


Pages: 668–675

Page count: 8
