Optimizing agent behavior over long time scales by transporting value

Cited by: 43
Authors
Hung, Chia-Chun [1]
Lillicrap, Timothy [1]
Abramson, Josh [1]
Wu, Yan [1]
Mirza, Mehdi [1]
Carnevale, Federico [1]
Ahuja, Arun [1]
Wayne, Greg [1]
Affiliations
[1] DeepMind, 5 New St Sq, London EC4A 3TW, England
Keywords
REINFORCEMENT; FUTURE;
DOI
10.1038/s41467-019-13073-w
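Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];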
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret. More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address the problem of long-term credit assignment: the question of how to evaluate the utility of actions within a long-duration behavioral sequence. Existing approaches to credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a paradigm where agents use recall of specific memories to credit past actions, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire models in neuroscience, psychology, and behavioral economics.
Pages: 12