When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

Cited by: 26

Authors:

Janssen, Christian P. ^{[1,2,3]}

Gray, Wayne D. ^{[2]}

Affiliations:

[1] UCL, UCL Interact Ctr, London WC1E 6BT, England

[2] Rensselaer Polytech Inst, Dept Cognit Sci, Troy, NY 12181 USA

[3] Univ Groningen, Dept Artificial Intelligence, NL-9700 AB Groningen, Netherlands

Source:

COGNITIVE SCIENCE | 2012 / Vol. 36 / Issue 02

Keywords:

Reinforcement learning; Choice; Strategy selection; Adaptive behavior; Expected utility; Expected value; Cognitive architecture; Skill acquisition and learning; MANIPULATING INFORMATION ACCESS; INTERACTIVE BEHAVIOR; SOFT CONSTRAINTS; RECURRENT CHOICE; TASK; STRATEGIES; ENVIRONMENT; ADAPTATION; MEMORY; ALLOCATION;

DOI:

10.1111/j.1551-6709.2011.01222.x

Chinese Library Classification (CLC) number:

B84 [Psychology]

Subject classification code:

04; 0402

Abstract:

Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the objective function: e.g., performance time or performance accuracy), and how much (the magnitude: with binary, categorical, or continuous values). In this article, we explore the problem space of these three parameters in the context of a task whose completion entails some combination of 36 state-action pairs, where all intermediate states (i.e., after the initial state and prior to the end state) represent progressive but partial completion of the task. Different choices produce profoundly different learning paths and outcomes, with the strongest effect for moment. Unfortunately, there is little discussion in the literature of the effect of such choices. This absence is disappointing, as the choice of when, what, and how much needs to be made by a modeler for every learning model.


Pages: 333-358

Page count: 26

