Computational evidence for hierarchically structured reinforcement learning in humans

Cited by: 49
Authors
Eckstein, Maria K. [1]
Collins, Anne G. E. [1]
Affiliations
[1] University of California, Berkeley, Department of Psychology, Berkeley, CA 94704, USA
Keywords
computational modeling; reinforcement learning; hierarchy; structure learning; task-sets; cognitive control; prediction error; basal ganglia; representation; organization; foundations; statistics; mechanisms; attention; behavior
DOI
10.1073/pnas.1912330117
Chinese Library Classification
O (Mathematical Sciences and Chemistry); P (Astronomy and Earth Sciences); Q (Biological Sciences); N (General Natural Sciences)
Subject Classification Codes
07; 0710; 09
Abstract
Humans have the fascinating ability to achieve goals in a complex and constantly changing world, and they still surpass modern machine-learning algorithms in terms of flexibility and learning speed. It is generally accepted that a crucial factor in this ability is the use of abstract, hierarchical representations, which exploit structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning, and we observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and a preference for higher-valued over lower-valued contexts. We replicated these results across three independent samples. We simulated three models (a classic flat RL model, a hierarchical RL model, and a hierarchical Bayesian model) and compared their behavior to the human results. While the flat RL model captured some aspects of participants' sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments, and it opens avenues for future research in this field.
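To make the contrast between flat and hierarchical RL concrete, the following is a minimal sketch of the hierarchical RL idea described in the abstract, not the authors' published model: it assumes a two-level scheme in which a high-level Q-table selects a task-set from the context and a low-level Q-table selects actions within that task-set, with the same reward prediction error training both levels. All names (q_high, q_low, run_trial) and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CONTEXTS, N_TASK_SETS, N_STIMULI, N_ACTIONS = 2, 2, 3, 3
ALPHA, BETA = 0.3, 5.0  # learning rate, softmax inverse temperature (illustrative)

# High level: Q-values for choosing a task-set given the current context.
q_high = np.full((N_CONTEXTS, N_TASK_SETS), 1.0 / N_TASK_SETS)
# Low level: each task-set is a Q-table mapping stimuli to actions.
q_low = np.full((N_TASK_SETS, N_STIMULI, N_ACTIONS), 1.0 / N_ACTIONS)

def softmax_sample(q_values):
    """Sample an index with probability proportional to exp(BETA * Q)."""
    prefs = np.exp(BETA * (q_values - q_values.max()))
    return rng.choice(q_values.size, p=prefs / prefs.sum())

def run_trial(context, stimulus, reward_fn):
    task_set = softmax_sample(q_high[context])           # pick a task-set from the context
    action = softmax_sample(q_low[task_set, stimulus])   # pick an action under that task-set
    reward = reward_fn(context, stimulus, action)
    # The same reward prediction error updates both levels of the hierarchy.
    q_high[context, task_set] += ALPHA * (reward - q_high[context, task_set])
    q_low[task_set, stimulus, action] += ALPHA * (reward - q_low[task_set, stimulus, action])
    return action, reward

# Hypothetical environment: each context rewards a different stimulus-action mapping.
reward_fn = lambda c, s, a: float(a == (s + c) % N_ACTIONS)
for t in range(500):
    run_trial(context=t % N_CONTEXTS, stimulus=rng.integers(N_STIMULI), reward_fn=reward_fn)
```

Because the high-level values attach to whole task-sets rather than individual stimulus-action pairs, a sketch like this reproduces the qualitative signatures the abstract reports, such as cheaper switches within a retained task-set than across task-sets; a flat RL model, which learns one Q-value per context-stimulus-action triple, has no such structure to reuse.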
Pages: 29381-29389
Page count: 9