Testing models of context-dependent outcome encoding in reinforcement learning

Cited by: 9
Authors
Hayes, William M. [1, 2]
Wedell, Douglas H. [1]
Affiliations
[1] Univ South Carolina, Dept Psychol, Columbia, SC 29208 USA
[2] Univ South Carolina, Dept Psychol, 1512 Pendleton St, Columbia, SC 29208 USA
Keywords
Relative encoding; Decisions from experience; Range-frequency theory; Reference point dependence; Decision by sampling; DECISION; ADAPTATION; REPRESENTATIONS; NORMALIZATION; PERCEPTIONS; EXPERIENCE; JUDGMENT; RECALL; MEMORY; PRICE
DOI
10.1016/j.cognition.2022.105280
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline codes
04; 0402
Abstract
Previous studies of reinforcement learning (RL) have established that choice outcomes are encoded in a context-dependent fashion. Several computational models have been proposed to explain context-dependent encoding, including reference point centering and range adaptation models. The former assumes that outcomes are centered around a running estimate of the average reward in each choice context, while the latter assumes that outcomes are compared to the minimum reward and then scaled by an estimate of the range of outcomes in each choice context. However, there are other computational mechanisms that can explain context dependence in RL. In the present study, a frequency encoding model is introduced that assumes outcomes are evaluated based on their proportional rank within a sample of recently experienced outcomes from the local context. A range-frequency model is also considered that combines the range adaptation and frequency encoding mechanisms. We conducted two fully incentivized behavioral experiments using choice tasks for which the candidate models make divergent predictions. The results were most consistent with models that incorporate frequency or rank-based encoding. The findings from these experiments deepen our understanding of the underlying computational processes mediating context-dependent outcome encoding in human RL.
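The encoding mechanisms summarized in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' model code. The following are all assumptions:

- the parameter names (`alpha_v`, `memory_size`, `w`, `alpha`);
- the hypothetical helper names (`Context`, `q_update`);
- the half-credit rule for ties in the rank computation;
- the 0.5 defaults used before any spread or memory exists;
- the use of observed extremes in place of learned minimum and maximum estimates;
- the weighted-average form of the range-frequency blend, borrowed from Parducci's range-frequency theory.

Each function maps a raw outcome to a context-relative value, which would then replace the raw reward in a standard delta-rule value update.

```python
from collections import deque

# Minimal sketch of context-dependent outcome encodings for RL.
# All parameter names, defaults, and helpers are illustrative assumptions,
# not values or code from the paper.


def reference_point(r, v_context):
    """Reference point centering: outcome minus the context's running average."""
    return r - v_context


def range_adapted(r, r_min, r_max):
    """Range adaptation: compare r to the context minimum, then scale by the range."""
    if r_max <= r_min:  # no spread observed yet; midpoint default is an assumption
        return 0.5
    return (r - r_min) / (r_max - r_min)


def frequency(r, sample):
    """Frequency encoding: proportional rank of r within a sample of recent outcomes.

    Assumed rank rule: share of sampled outcomes below r, with ties counted as half.
    """
    if not sample:  # empty memory; midpoint default is an assumption
        return 0.5
    below = sum(1 for s in sample if s < r)
    ties = sum(1 for s in sample if s == r)
    return (below + 0.5 * ties) / len(sample)


def range_frequency(r, r_min, r_max, sample, w=0.5):
    """Range-frequency blend, assuming Parducci's weighted-average form with weight w."""
    return w * range_adapted(r, r_min, r_max) + (1 - w) * frequency(r, sample)


class Context:
    """Running statistics for one choice context (hypothetical helper)."""

    def __init__(self, alpha_v=0.1, memory_size=10):
        self.alpha_v = alpha_v                   # learning rate for the average reward
        self.v = 0.0                             # running average reward
        self.r_min = float("inf")                # smallest outcome seen so far
        self.r_max = float("-inf")               # largest outcome seen so far
        self.sample = deque(maxlen=memory_size)  # keeps only the most recent outcomes

    def update(self, r):
        # Delta-rule update of the average, then the extremes and the memory sample.
        self.v += self.alpha_v * (r - self.v)
        self.r_min = min(self.r_min, r)
        self.r_max = max(self.r_max, r)
        self.sample.append(r)


def q_update(q, option, encoded, alpha=0.3):
    """Delta-rule value update driven by the encoded (context-relative) outcome."""
    q[option] += alpha * (encoded - q[option])


if __name__ == "__main__":
    ctx = Context(alpha_v=0.5, memory_size=5)
    for r in [10, 12, 14, 16, 90]:  # a positively skewed context
        ctx.update(r)
    probe = 16
    print("reference point:", reference_point(probe, ctx.v))
    print("range:", range_adapted(probe, ctx.r_min, ctx.r_max))
    print("frequency:", frequency(probe, ctx.sample))
    print("range-frequency:", range_frequency(probe, ctx.r_min, ctx.r_max, ctx.sample))
```

Consider the positively skewed context in the demo, where outcomes 10, 12, 14, 16 and 90 have been observed. A probe outcome of 16 gets a range value of 0.075 but a frequency value of 0.7. This shows how range-based and rank-based encodings can diverge for the same outcome.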
Pages: 24
References: 73 in total