Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task

Cited by: 72
Authors
Akam, Thomas [1, 2]
Costa, Rui [1]
Dayan, Peter [3]
Affiliations
[1] Champalimaud Ctr, Champalimaud Neurosci Program, Lisbon, Portugal
[2] Univ Oxford, Dept Expt Psychol, S Parks Rd, Oxford OX1 3UD, England
[3] UCL, Gatsby Computat Neurosci Unit, London, England
Funding
European Research Council; Wellcome Trust (UK)
Keywords
MEDIAL PREFRONTAL CORTEX; DORSOMEDIAL STRIATUM; CONTINGENCY; DOPAMINE; LESIONS; DISCRIMINATION; INACTIVATION; DECISIONS; SYSTEMS; DISRUPT;
DOI
10.1371/journal.pcbi.1004648
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
The recently developed 'two-step' behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically-friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, to allow strategies to be discriminated, and a need for determinism, so that it is worth subjects' investment of effort to exploit the contingencies optimally. We show through simulation that under certain conditions model-free strategies can masquerade as being model-based. We first show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of the trial and the subsequent trial events in such a way that analysis based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These generate behaviour that appears model-based under these, and also more sophisticated, analyses. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues.
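The successive-trial analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation code. It assumes a simplified task in which each first-step action leads to a "common" second-step state with probability 0.7 and second-step states deliver reward directly, plus a purely model-free epsilon-greedy agent. The function names, parameter values, and the 50-trial reversal schedule are all hypothetical choices made for illustration.

```python
import random

def simulate_two_step(n_trials=5000, alpha=0.3, epsilon=0.1, p_common=0.7, seed=0):
    """Simulate a simplified two-step task with a model-free agent.

    Returns a list of (choice, common_transition, reward) tuples.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]            # model-free values of the two first-step actions
    reward_prob = [0.8, 0.2]  # reward probability in second-step states 0 and 1
    trials = []
    for t in range(n_trials):
        # Hypothetical block structure: reward probabilities reverse every 50 trials.
        if t > 0 and t % 50 == 0:
            reward_prob.reverse()
        # Epsilon-greedy first-step choice (random when values are tied).
        if rng.random() < epsilon or q[0] == q[1]:
            choice = rng.randrange(2)
        else:
            choice = 0 if q[0] > q[1] else 1
        # Action i commonly leads to state i, rarely to the other state.
        common = rng.random() < p_common
        state = choice if common else 1 - choice
        reward = 1 if rng.random() < reward_prob[state] else 0
        # Model-free update: the outcome credits the chosen action directly,
        # ignoring which transition occurred.
        q[choice] += alpha * (reward - q[choice])
        trials.append((choice, common, reward))
    return trials

def stay_probabilities(trials):
    """P(repeat first-step choice) keyed by (rewarded, common) on the previous trial."""
    counts = {}
    for prev, nxt in zip(trials, trials[1:]):
        key = (bool(prev[2]), prev[1])
        stays, total = counts.get(key, (0, 0))
        counts[key] = (stays + (nxt[0] == prev[0]), total + 1)
    return {key: stays / total for key, (stays, total) in counts.items()}

if __name__ == "__main__":
    for key, p in sorted(stay_probabilities(simulate_two_step()).items()):
        print(f"rewarded={key[0]!s:5} common={key[1]!s:5} P(stay)={p:.3f}")
```

In this kind of analysis, a reward-by-transition interaction in the stay probabilities is conventionally read as a signature of model-based control. The abstract's point is that this reading can fail under some task structures and strategies, so a sketch like this is only a starting point for exploring those cases.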
Pages: 25