Reinforcement Learning, Fast and Slow

Cited by: 378
Authors
Botvinick, Matthew [1 ,2 ]
Ritter, Sam [1 ,3 ]
Wang, Jane X. [1 ]
Kurth-Nelson, Zeb [1 ,2 ]
Blundell, Charles [1 ]
Hassabis, Demis [1 ,2 ]
Affiliations
[1] DeepMind, London, England
[2] UCL, London, England
[3] Princeton Univ, Princeton, NJ 08544 USA
Keywords
PREFRONTAL CORTEX; NEURAL-NETWORKS; MODEL; PREDICTION; SYSTEMS; REWARD; DOPAMINE; MEMORY; BRAIN; GO;
DOI
10.1016/j.tics.2019.02.006
Chinese Library Classification (CLC)
B84 [Psychology]; C [Social sciences, general]; Q98 [Anthropology];
Discipline codes
03 ; 0303 ; 030303 ; 04 ; 0402 ;
Abstract
Deep reinforcement learning (RL) methods have driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. This progress has drawn the attention of cognitive scientists interested in understanding human learning. However, the concern has been raised that deep RL may be too sample-inefficient (that is, it may simply be too slow) to provide a plausible model of how humans learn. In the present review, we counter this critique by describing recently developed techniques that allow deep RL to operate more nimbly, solving problems much more quickly than previous methods. Although these techniques were developed in an AI context, we propose that they may have rich implications for psychology and neuroscience. A key insight, arising from these AI methods, concerns the fundamental connection between fast RL and slower, more incremental forms of learning.
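The fast/slow contrast the abstract refers to can be made concrete with a minimal sketch (not from the paper; function names and the toy transition are illustrative). The first update is the standard incremental tabular Q-learning rule, which moves value estimates by a small step size on every transition; the second stores an experienced return directly for one-shot reuse, in the spirit of the episodic-control methods the review discusses.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Slow, incremental RL: nudge Q(s, a) toward the bootstrapped target."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def episodic_update(memory, s, a, ret):
    """Fast, episodic RL: keep the best return ever observed for (s, a)."""
    memory[(s, a)] = max(memory.get((s, a), float("-inf")), ret)

# One rewarding transition, processed by each system.
Q = defaultdict(dict)
Q["s0"] = {"a0": 0.0}
Q["s1"] = {}
q_update(Q, "s0", "a0", r=1.0, s_next="s1")   # Q moves only a fraction: 0.1

memory = {}
episodic_update(memory, "s0", "a0", ret=1.0)  # memory stores the full return: 1.0
```

After a single experience the incremental learner has absorbed only alpha's worth of the reward, while the episodic store can act on the full return immediately; the review's point is that these two speeds of learning are complementary rather than competing.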
Pages: 408-422
Page count: 15