Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences

Authors
Sophie Bavard
Maël Lebreton
Mehdi Khamassi
Giorgio Coricelli
Stefano Palminteri
Affiliations
[1] Institut National de la Santé et Recherche Médicale, Laboratoire de Neurosciences Cognitives Computationnelles
[2] Ecole Normale Supérieure, Département d’Etudes Cognitives
[3] Université de Paris Sciences et Lettres, Institut d’Etudes de la Cognition
[4] University of Amsterdam, CREED lab, Amsterdam School of Economics, Faculty of Business and Economics
[5] University of Amsterdam, Amsterdam Brain and Cognition
[6] University of Geneva, Swiss Centre for Affective Sciences
[7] Centre National de la Recherche Scientifique, Institut des Systèmes Intelligents et Robotiques
[8] Sorbonne Universités, Institut des Sciences de l’Information et de leurs Interactions
[9] University of Southern California, Department of Economics
[10] Università di Trento, Centro Mente e Cervello
Source
Nature Communications, Volume 9
Abstract
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparatively little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state values. Model comparison indicates that subjects’ behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation emerges progressively, is favored by increasing outcome information, and correlates with explicit understanding of the task structure. Finally, our data clearly show that, while locally adaptive (for instance, in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
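To make the two features named in the abstract concrete, the sketch below shows one way reference-point centering and range adaptation could be folded into a standard Q-learning update: outcomes are first centered on a learned, context-specific reference value and then rescaled by a learned estimate of that context's outcome spread, so option values become relative to their decision context. This is a minimal, hypothetical illustration in Python; the class name, learning rates, and the particular centering and normalization rules are assumptions for exposition, not the authors' published model equations.

```python
import numpy as np

class RelativeValueLearner:
    """Hypothetical Q-learner with reference-point centering and range adaptation."""

    def __init__(self, n_contexts, n_options, alpha_q=0.3, alpha_ctx=0.3):
        self.q = np.zeros((n_contexts, n_options))  # option values (relative scale)
        self.v = np.zeros(n_contexts)               # learned reference point per context
        self.r = np.ones(n_contexts)                # learned outcome range per context
        self.alpha_q = alpha_q                      # learning rate for option values
        self.alpha_ctx = alpha_ctx                  # learning rate for context statistics

    def update(self, context, option, outcome):
        # Track the context-level reference point (running average outcome).
        self.v[context] += self.alpha_ctx * (outcome - self.v[context])
        # Track the context-level range (average absolute deviation from the reference).
        self.r[context] += self.alpha_ctx * (abs(outcome - self.v[context]) - self.r[context])
        # Center and rescale the outcome before the usual delta-rule update.
        relative_outcome = (outcome - self.v[context]) / max(self.r[context], 1e-6)
        self.q[context, option] += self.alpha_q * (relative_outcome - self.q[context, option])

    def choose(self, context, beta=5.0, rng=None):
        # Softmax choice over the relative option values within this context.
        if rng is None:
            rng = np.random.default_rng()
        p = np.exp(beta * self.q[context])
        p /= p.sum()
        return rng.choice(len(p), p=p)
```

Because values are learned on a centered, range-normalized scale, options from a small-magnitude or negative-valence context can end up with the same internal value as options from a large-magnitude gain context, which is exactly the kind of state-dependent valuation that can produce irrational preferences when options are later compared outside their original contexts.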