A simple computational algorithm of model-based choice preference

Cited by: 13
Authors
Toyama, Asako [1 ,2 ,3 ]
Katahira, Kentaro [1 ,2 ]
Ohira, Hideki [1 ,2 ]
Affiliations
[1] Nagoya Univ, Grad Sch Environm Studies, Dept Psychol, Nagoya, Aichi, Japan
[2] Nagoya Univ, Grad Sch Informat, Dept Psychol, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648601, Japan
[3] Japan Soc Promot Sci, Tokyo, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Computational model; Model-free; Model-based; Eligibility trace; Reinforcement learning; PREFRONTAL CORTEX; DECISION-MAKING; REINFORCEMENT; HABITS; EXPLORATION; MODULATION; PROTECTS; BEHAVIOR; STIMULI; SYSTEMS;
DOI
10.3758/s13415-017-0511-2
Chinese Library Classification
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Discipline codes
03; 0303; 030303; 04; 0402
Abstract
A widely used computational framework posits that two learning systems operate in parallel during the learning of choice preferences: the model-free and the model-based reinforcement-learning systems. In this study, we examined an alternative possibility, in which model-free learning is the basic system and model-based information acts as its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, with our proposed models, which assume a sequential learning process. Choice data from 23 participants were better fit by the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the learned environmental model weights the eligibility trace, explains choices better under both model-free and model-based control and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variant, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data; its parameters capture individual tendencies in both model use during learning and exploration behavior. This computational model provides novel insight into learning with interacting model-free and model-based components.
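The core mechanism the abstract describes, a model-free temporal-difference learner whose eligibility trace carries the second-stage prediction error back to the first-stage choice, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: in their eligibility adjustment model the trace weight `lam` would itself be set by the learned transition (environment) model, whereas here it is a fixed free parameter, and all names and values are illustrative.

```python
import numpy as np

def update_values(q_stage1, q_stage2, a1, s2, a2, reward,
                  alpha=0.3, lam=0.6):
    """One trial of a minimal TD learner for a two-stage task.

    q_stage1 : array (2,)    values of the two first-stage actions
    q_stage2 : array (2, 2)  values of [second-stage state, action]
    a1, s2, a2 : int         first choice, reached state, second choice
    alpha : learning rate; lam : eligibility-trace weight (in the
    eligibility adjustment model this weight would be modulated by
    the learned environment model -- illustrative here).
    """
    q_stage1 = q_stage1.copy()
    q_stage2 = q_stage2.copy()
    # Stage-1 prediction error: value of the state reached vs. expectation
    delta1 = q_stage2[s2, a2] - q_stage1[a1]
    q_stage1[a1] += alpha * delta1
    # Stage-2 prediction error: reward vs. expectation
    delta2 = reward - q_stage2[s2, a2]
    q_stage2[s2, a2] += alpha * delta2
    # Eligibility trace passes the stage-2 error back to the stage-1 choice
    q_stage1[a1] += alpha * lam * delta2
    return q_stage1, q_stage2
```

With `lam = 0` the learner is purely one-step model-free; larger `lam` lets reward at stage 2 directly update the first-stage choice, which is how a single sequential system can mimic model-based credit assignment.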
Pages: 764-783 (20 pages)
References
42 total
[1]   A new look at the statistical model identification [J].
Akaike, H. .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC-19 (06) :716-723
[2]  
[Anonymous], 2010, Thinking, Fast and Slow
[3]   Rostrolateral Prefrontal Cortex and Individual Differences in Uncertainty-Driven Exploration [J].
Badre, David ;
Doll, Bradley B. ;
Long, Nicole M. ;
Frank, Michael J. .
NEURON, 2012, 73 (03) :595-607
[4]   Prefrontal cortex and decision making in a mixed-strategy game [J].
Barraclough, DJ ;
Conroy, ML ;
Lee, D .
NATURE NEUROSCIENCE, 2004, 7 (04) :404-410
[5]   Multiple model-based reinforcement learning explains dopamine neuronal activity [J].
Bertin, Mathieu ;
Schweighofer, Nicolas ;
Doya, Kenji .
NEURAL NETWORKS, 2007, 20 (06) :668-675
[6]   Short-term memory traces for action bias in human reinforcement learning [J].
Bogacz, Rafal ;
McClure, Samuel M. ;
Li, Jian ;
Cohen, Jonathan D. ;
Montague, P. Read .
BRAIN RESEARCH, 2007, 1153 :111-121
[7]   Experience-weighted attraction learning in normal form games [J].
Camerer, C ;
Ho, TH .
ECONOMETRICA, 1999, 67 (04) :827-874
[8]   Beyond working memory: the role of persistent activity in decision making [J].
Curtis, Clayton E. ;
Lee, Daeyeol .
TRENDS IN COGNITIVE SCIENCES, 2010, 14 (05) :216-222
[9]   Representation and timing in theories of the dopamine system [J].
Daw, Nathaniel D. ;
Courville, Aaron C. ;
Touretzky, David S. .
NEURAL COMPUTATION, 2006, 18 (07) :1637-1677
[10]   Model-Based Influences on Humans' Choices and Striatal Prediction Errors [J].
Daw, Nathaniel D. ;
Gershman, Samuel J. ;
Seymour, Ben ;
Dayan, Peter ;
Dolan, Raymond J. .
NEURON, 2011, 69 (06) :1204-1215