Credit assignment to state-independent task representations and its relationship with model-based decision making

Cited by: 34
Authors
Shahar, Nitzan [1, 2]
Moran, Rani [1, 2]
Hauser, Tobias U. [1, 2]
Kievit, Rogier A. [2, 3]
McNamee, Daniel [1, 2]
Moutoussis, Michael [1, 2]
Dolan, Raymond J. [1, 2]
Affiliations
[1] UCL, Wellcome Ctr Human Neuroimaging, London WC1N 3BG, England
[2] Max Planck Univ Coll London, Ctr Computat Psychiat & Res, Dept Imaging Neurosci, London WC1B 5EH, England
[3] Univ Cambridge, MRC, Cognit & Brain Sci Unit, Cambridge CB2 7EF, England
Funding
Wellcome Trust (UK);
Keywords
reinforcement learning; decision making; motor learning; FRONTAL-CORTEX; STIMULUS-VALUE; CHOICES; HUMANS;
DOI
10.1073/pnas.1821647116
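Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract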
Model-free learning enables an agent to make better decisions based on prior experience while representing only minimal knowledge about an environment's structure. It is generally assumed that model-free state representations are based on outcome-relevant features of the environment. Here, we challenge this assumption by providing evidence that a putative model-free system assigns credit to task representations that are irrelevant to an outcome. We examined data from 769 individuals performing a well-described 2-step reward decision task where stimulus identity but not spatial-motor aspects of the task predicted reward. We show that participants assigned value to spatial-motor representations despite it being outcome irrelevant. Strikingly, spatial-motor value associations affected behavior across all outcome-relevant features and stages of the task, consistent with credit assignment to low-level state-independent task representations. Individual difference analyses suggested that the impact of spatial-motor value formation was attenuated for individuals who showed greater deployment of goal-directed (model-based) strategies. Our findings highlight a need for a reconsideration of how model-free representations are formed and regulated according to the structure of the environment.
Pages: 15871-15876
Number of pages: 6