Learning from delayed feedback: neural responses in temporal credit assignment

Cited by: 47
Authors
Walsh, Matthew M. [1]
Anderson, John R. [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Psychol, Pittsburgh, PA 15213 USA
Keywords
Actor/critic; Credit assignment; Eligibility traces; Event-related potentials; Q-learning; SARSA; Temporal difference learning; ANTERIOR CINGULATE CORTEX; MEDIAL FRONTAL-CORTEX; STIMULUS-PRECEDING NEGATIVITY; ERROR-RELATED NEGATIVITY; DOPAMINE NEURONS ENCODE; TIME-ESTIMATION TASK; REWARD PREDICTION; BRAIN POTENTIALS; DECISION-MAKING; BAD OUTCOMES
DOI
10.3758/s13415-011-0027-0
Chinese Library Classification (CLC) number
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Discipline classification code
03; 0303; 030303; 04; 0402
Abstract
When feedback follows a sequence of decisions, relationships between actions and outcomes can be difficult to learn. We used event-related potentials (ERPs) to understand how people overcome this temporal credit assignment problem. Participants performed a sequential decision task that required two decisions on each trial. The first decision led to an intermediate state that was predictive of the trial outcome, and the second decision was followed by positive or negative trial feedback. The feedback-related negativity (fERN), a component thought to reflect reward prediction error, followed negative feedback and negative intermediate states. This suggests that participants evaluated intermediate states in terms of expected future reward, and that these evaluations supported learning of earlier actions within sequences. We examine the predictions of several temporal-difference models to determine whether the behavioral and ERP results reflected a reinforcement-learning process.
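As a rough illustration of the reward prediction error idea the abstract refers to, the sketch below computes temporal-difference errors for a single two-step trial. It is not one of the models examined in the paper: the state names (`start`, `good`, `bad`), the starting values, the learning rate `alpha`, and the discount `gamma` are all assumptions chosen for illustration.

```python
def td_errors(values, intermediate_state, reward, gamma=1.0):
    """Return (delta_intermediate, delta_feedback) for one two-step trial.

    `values` maps hypothetical state names to expected future reward.
    """
    v_start = values["start"]
    v_mid = values[intermediate_state]
    # Entering the intermediate state delivers no reward, so the error is
    # just the change in expected future reward.
    delta_intermediate = 0.0 + gamma * v_mid - v_start
    # Feedback ends the trial, so there is no future value to add.
    delta_feedback = reward - v_mid
    return delta_intermediate, delta_feedback


def td_update(values, intermediate_state, reward, alpha=0.5, gamma=1.0):
    """Apply a TD(0) update in place and return the two prediction errors."""
    d_mid, d_fb = td_errors(values, intermediate_state, reward, gamma)
    values["start"] += alpha * d_mid            # credit flows back to the first step
    values[intermediate_state] += alpha * d_fb  # second step learns from feedback
    return d_mid, d_fb


# Hypothetical values: the "bad" intermediate state predicts less reward
# than the start of the trial does.
values = {"start": 0.5, "good": 0.75, "bad": 0.25}
d_mid, d_fb = td_update(values, "bad", reward=0.0)
print(d_mid, d_fb)  # both errors are negative for this trial
```

Under these assumptions, reaching the lower-valued intermediate state already produces a negative prediction error before any feedback arrives, which is the kind of signal the abstract describes for negative intermediate states.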
Pages: 131-143
Number of pages: 13