Mastering Atari, Go, chess and shogi by planning with a learned model

Cited by: 902
Authors
Schrittwieser, Julian [1]
Antonoglou, Ioannis [1,2]
Hubert, Thomas [1]
Simonyan, Karen [1]
Sifre, Laurent [1]
Schmitt, Simon [1]
Guez, Arthur [1]
Lockhart, Edward [1]
Hassabis, Demis [1]
Graepel, Thore [1,2]
Lillicrap, Timothy [1]
Silver, David [1,2]
Affiliations
[1] DeepMind, London, England
[2] UCL, London, England
Keywords
Deep neural networks; Level; Environment; Game
DOI
10.1038/s41586-020-03051-4
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains such as chess(1) and Go(2), where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy and the value function. When evaluated on 57 different Atari games(3), the canonical video game environment for testing artificial-intelligence techniques, in which model-based planning approaches have historically struggled(4), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi, canonical environments for high-performance planning, the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm(5), which was supplied with the rules of the game.
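The model structure described in the abstract, a learned model that is applied iteratively and predicts the reward, the action-selection policy and the value, can be sketched as below. Note this is a toy illustration only: the tiny linear "networks", the dimensions and the function names are assumptions made for readability, not the paper's actual deep-network architecture or training procedure.

```python
import numpy as np

# Toy sketch of MuZero's three learned components (illustrative only):
#   representation h: observation -> hidden state
#   dynamics g:       (hidden state, action) -> (reward, next hidden state)
#   prediction f:     hidden state -> (policy, value)
rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS = 4, 3  # assumed toy sizes

# Random weights stand in for trained network parameters.
W_h = rng.normal(size=(STATE_DIM, STATE_DIM))
W_g = rng.normal(size=(STATE_DIM + NUM_ACTIONS, STATE_DIM))
w_r = rng.normal(size=STATE_DIM + NUM_ACTIONS)
W_p = rng.normal(size=(STATE_DIM, NUM_ACTIONS))
w_v = rng.normal(size=STATE_DIM)

def representation(obs):
    # Encode a raw observation into an abstract hidden state.
    return np.tanh(W_h @ obs)

def dynamics(state, action):
    # Predict the immediate reward and the next hidden state.
    x = np.concatenate([state, np.eye(NUM_ACTIONS)[action]])
    return float(w_r @ x), np.tanh(x @ W_g)

def prediction(state):
    # Predict an action-selection policy (softmax) and a scalar value.
    logits = state @ W_p
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    return policy, float(w_v @ state)

def rollout(obs, actions):
    """Apply the model iteratively along one hypothetical action
    sequence, as a tree search would along a single branch."""
    s = representation(obs)
    total_reward = 0.0
    for a in actions:
        r, s = dynamics(s, a)
        total_reward += r
    policy, value = prediction(s)
    return total_reward, policy, value

obs = rng.normal(size=STATE_DIM)
total_r, pi, v = rollout(obs, [0, 2, 1])
print(total_r, pi, v)
```

The key point the abstract makes is that planning happens entirely in this learned hidden-state space: the environment simulator is never consulted during search, only the dynamics and prediction functions.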
Pages: 604+
Page count: 9
References
52 in total; first 10 shown below
[1] [Anonymous], 2018, Advances in Neural Information Processing Systems.
[2] [Anonymous], 2011, ICML.
[3] [Anonymous], 2015, Advances in Neural Information Processing Systems.
[4] Azizzadenesheli K, 2018, preprint.
[5] Bellemare MG, Naddaf Y, Veness J, Bowling M. The Arcade Learning Environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013, 47: 253-279.
[6] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374): 418+.
[7] Buesing L, 2018, preprint.
[8] Campbell M, Hoane AJ, Hsu FH. Deep Blue. Artificial Intelligence, 2002, 134(1-2): 57-83.
[9] Cloud TPU, 2019, Google Cloud.
[10] Coulom R, 2007, Lecture Notes in Computer Science, Vol. 4630, p. 72.