Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework

Cited: 259
Authors
Gershman, Samuel J. [1 ,2 ]
Daw, Nathaniel D. [3 ,4 ]
Affiliations
[1] Harvard Univ, Dept Psychol, Cambridge, MA 02138 USA
[2] Harvard Univ, Ctr Brain Sci, Cambridge, MA 02138 USA
[3] Princeton Univ, Princeton Neurosci Inst, Princeton, NJ 08544 USA
[4] Princeton Univ, Dept Psychol, Princeton, NJ 08544 USA
Source
ANNUAL REVIEW OF PSYCHOLOGY, VOL. 68, 2017
Keywords
reinforcement learning; memory; decision making; prediction errors; working memory; orbitofrontal cortex; dopamine neurons; striatal systems; model; reward; decision; hippocampus; choice
DOI
10.1146/annurev-psych-122414-033625
Chinese Library Classification
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
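The abstract argues that storing individual episodes lets an RL system approximate values over continuous state spaces and learn from very few experiences. A minimal sketch of that idea is similarity-weighted averaging over stored traces (a simple kernel regression); all names and parameters below are illustrative assumptions, not the authors' implementation.

```python
import math

class EpisodicValueEstimator:
    """Toy sketch of episodic value estimation: state values are
    estimated from stored (state, return) traces via a Gaussian
    similarity kernel. Names and parameters are illustrative only."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth
        self.traces = []  # list of (state, observed_return) pairs

    def store(self, state, ret):
        """Record a single episode's state and its observed return."""
        self.traces.append((state, ret))

    def value(self, query):
        """Similarity-weighted average of returns from stored episodes.
        Nearby episodes contribute more, so even one or two experiences
        generalize smoothly over a continuous state space."""
        weighted, total = 0.0, 0.0
        for state, ret in self.traces:
            dist2 = sum((q - s) ** 2 for q, s in zip(query, state))
            w = math.exp(-dist2 / (2 * self.bandwidth ** 2))
            weighted += w * ret
            total += w
        return weighted / total if total > 0 else 0.0

# Usage: two episodes in different regions of a 2-D state space.
est = EpisodicValueEstimator(bandwidth=0.5)
est.store((0.0, 0.0), 10.0)   # rewarding region
est.store((5.0, 5.0), -10.0)  # punishing region
print(est.value((0.1, 0.1)))  # close to 10.0
print(est.value((4.9, 5.0)))  # close to -10.0
```

With only two stored episodes the estimator already assigns sensible values to novel nearby states, illustrating point (b) of the abstract: episodic traces support learning with very little data.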
Pages: 101-128 (28 pages)