Mastering Atari, Go, chess and shogi by planning with a learned model

Cited by: 902
Authors
Schrittwieser, Julian [1]
Antonoglou, Ioannis [1,2]
Hubert, Thomas [1]
Simonyan, Karen [1]
Sifre, Laurent [1]
Schmitt, Simon [1]
Guez, Arthur [1]
Lockhart, Edward [1]
Hassabis, Demis [1]
Graepel, Thore [1,2]
Lillicrap, Timothy [1]
Silver, David [1,2]
Affiliations
[1] DeepMind, London, England
[2] UCL, London, England
Keywords
Deep neural networks; Level; Environment; Game
DOI
10.1038/s41586-020-03051-4
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains such as chess(1) and Go(2), where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy and the value function. When evaluated on 57 different Atari games(3), the canonical video game environment for testing artificial-intelligence techniques, in which model-based planning approaches have historically struggled(4), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi, canonical environments for high-performance planning, the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm(5), which was supplied with the rules of the game.
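The model structure described in the abstract, a learned model that is applied iteratively and predicts the reward, the action-selection policy and the value, can be sketched as below. Note this is a toy illustration only: the tiny linear "networks", the dimensions and the function names are assumptions made for readability, not the paper's actual deep-network architecture or training procedure.

```python
import numpy as np

# Toy sketch of MuZero's three learned components (illustrative only):
#   representation h: observation -> hidden state
#   dynamics g:       (hidden state, action) -> (reward, next hidden state)
#   prediction f:     hidden state -> (policy, value)
rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS = 4, 3  # assumed toy sizes

# Random weights stand in for trained network parameters.
W_h = rng.normal(size=(STATE_DIM, STATE_DIM))
W_g = rng.normal(size=(STATE_DIM + NUM_ACTIONS, STATE_DIM))
w_r = rng.normal(size=STATE_DIM + NUM_ACTIONS)
W_p = rng.normal(size=(STATE_DIM, NUM_ACTIONS))
w_v = rng.normal(size=STATE_DIM)

def representation(obs):
    # Encode a raw observation into an abstract hidden state.
    return np.tanh(W_h @ obs)

def dynamics(state, action):
    # Predict the immediate reward and the next hidden state.
    x = np.concatenate([state, np.eye(NUM_ACTIONS)[action]])
    return float(w_r @ x), np.tanh(x @ W_g)

def prediction(state):
    # Predict an action-selection policy (softmax) and a scalar value.
    logits = state @ W_p
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    return policy, float(w_v @ state)

def rollout(obs, actions):
    """Apply the model iteratively along one hypothetical action
    sequence, as a tree search would along a single branch."""
    s = representation(obs)
    total_reward = 0.0
    for a in actions:
        r, s = dynamics(s, a)
        total_reward += r
    policy, value = prediction(s)
    return total_reward, policy, value

obs = rng.normal(size=STATE_DIM)
total_r, pi, v = rollout(obs, [0, 2, 1])
print(total_r, pi, v)
```

The key point the abstract makes is that planning happens entirely in this learned hidden-state space: the environment simulator is never consulted during search, only the dynamics and prediction functions.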
Pages: 604+
Page count: 9
References
52 in total; first 10 shown below
[1] [Anonymous], 2018, Advances in Neural Information Processing Systems.
[2] [Anonymous], 2011, ICML.
[3] [Anonymous], 2015, Advances in Neural Information Processing Systems.
[4] Azizzadenesheli K, 2018, preprint.
[5] Bellemare MG, Naddaf Y, Veness J, Bowling M. The Arcade Learning Environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013, 47: 253-279.
[6] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374): 418+.
[7] Buesing L, 2018, preprint.
[8] Campbell M, Hoane AJ, Hsu FH. Deep Blue. Artificial Intelligence, 2002, 134(1-2): 57-83.
[9] Cloud TPU, 2019, Google Cloud.
[10] Coulom R, 2007, Lecture Notes in Computer Science, Vol. 4630, p. 72.