Model-based and model-free Pavlovian reward learning: Revaluation, revision, and revelation

Cited by: 245
Authors
Dayan, Peter [1]
Berridge, Kent C. [2]
Affiliations
[1] UCL, Gatsby Computat Neurosci Unit, London, England
[2] Univ Michigan, Dept Psychol, Ann Arbor, MI USA
Keywords
Pavlovian; classical conditioning; Decision making; Basal ganglia; Dopamine; Reward; Motivation; VENTRAL TEGMENTAL AREA; INCENTIVE-SENSITIZATION THEORY; MEDIAL PREFRONTAL CORTEX; DOPAMINE NEURONS ENCODE; GOAL-DIRECTED BEHAVIOR; OUTCOME-SPECIFIC FORMS; SPATIAL DECISION TASK; INSTRUMENTAL TRANSFER; NUCLEUS-ACCUMBENS; UNCONDITIONED STIMULUS
DOI
10.3758/s13415-014-0277-8
Chinese Library Classification (CLC) number
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Discipline classification code
03; 0303; 030303; 04; 0402
Abstract
Evidence supports at least two methods for learning about reward and punishment and making predictions for guiding actions. One method, called model-free, progressively acquires cached estimates of the long-run values of circumstances and actions from retrospective experience. The other method, called model-based, uses representations of the environment, expectations, and prospective calculations to make cognitive predictions of future value. Extensive attention has been paid to both methods in computational analyses of instrumental learning. By contrast, although a full computational analysis has been lacking, Pavlovian learning and prediction has typically been presumed to be solely model-free. Here, we revise that presumption and review compelling evidence from Pavlovian revaluation experiments showing that Pavlovian predictions can involve their own form of model-based evaluation. In model-based Pavlovian evaluation, prevailing states of the body and brain influence value computations, and thereby produce powerful incentive motivations that can sometimes be quite new. We consider the consequences of this revised Pavlovian view for the computational landscape of prediction, response, and choice. We also revisit differences between Pavlovian and instrumental learning in the control of incentive motivation.
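The contrast the abstract draws between cached and prospective evaluation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: the single cue, the salt outcome, the two body states, the utility numbers, and all function names are assumptions made up for the example.

```python
# Hypothetical one-cue, one-outcome world. All names and numbers here are
# illustrative assumptions, not values taken from the paper.

def outcome_utility(outcome, body_state):
    """Utility of an outcome under the current body state (assumed values)."""
    table = {
        ("salt", "sated"): -1.0,     # assumed: strong salt taste is aversive when sated
        ("salt", "depleted"): 1.0,   # assumed: same taste is attractive when depleted
    }
    return table[(outcome, body_state)]


def train_model_free(n_trials, body_state, alpha=0.2):
    """Cache a cue value from experienced outcomes using a delta rule."""
    v = 0.0
    for _ in range(n_trials):
        r = outcome_utility("salt", body_state)  # outcome as experienced in training
        v += alpha * (r - v)                     # prediction-error update
    return v


def model_based_value(cue_to_outcome, cue, body_state):
    """Evaluate a cue prospectively: predict its outcome, then value it now."""
    return outcome_utility(cue_to_outcome[cue], body_state)


# Training happens while sated, so the cached value ends up negative.
cached = train_model_free(n_trials=50, body_state="sated")

# The model-based evaluator stores only the cue -> outcome link.
model = {"cue": "salt"}
mb_sated = model_based_value(model, "cue", "sated")
mb_depleted = model_based_value(model, "cue", "depleted")

print(f"cached (model-free) value after training: {cached:.3f}")
print(f"model-based value, sated: {mb_sated:.1f}")
print(f"model-based value, depleted: {mb_depleted:.1f}")
```

In this sketch the cached value stays negative after the body state changes, because it can only move through further experience, whereas the model-based value flips sign as soon as the state does. That immediate sensitivity is the kind of revaluation effect the abstract attributes to model-based Pavlovian evaluation.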
Pages: 473-492
Page count: 20