Grandmaster level in StarCraft II using multi-agent reinforcement learning

Cited by: 2172
Authors
Vinyals, Oriol [1]
Babuschkin, Igor [1]
Czarnecki, Wojciech M. [1]
Mathieu, Michael [1]
Dudzik, Andrew [1]
Chung, Junyoung [1]
Choi, David H. [1]
Powell, Richard [1]
Ewalds, Timo [1]
Georgiev, Petko [1]
Oh, Junhyuk [1]
Horgan, Dan [1]
Kroiss, Manuel [1]
Danihelka, Ivo [1]
Huang, Aja [1]
Sifre, Laurent [1]
Cai, Trevor [1]
Agapiou, John P. [1]
Jaderberg, Max [1]
Vezhnevets, Alexander S. [1]
Leblond, Remi [1]
Pohlen, Tobias [1]
Dalibard, Valentin [1]
Budden, David [1]
Sulsky, Yury [1]
Molloy, James [1]
Paine, Tom L. [1]
Gulcehre, Caglar [1]
Wang, Ziyu [1]
Pfaff, Tobias [1]
Wu, Yuhuai [1]
Ring, Roman [1]
Yogatama, Dani [1]
Wunsch, Dario [2]
McKinney, Katrina [1]
Smith, Oliver [1]
Schaul, Tom [1]
Lillicrap, Timothy [1]
Kavukcuoglu, Koray [1]
Hassabis, Demis [1]
Apps, Chris [1]
Silver, David [1]
Affiliations
[1] DeepMind, London, England
[2] Team Liquid, Utrecht, Netherlands
Keywords
GO
DOI
10.1038/s41586-019-1724-z
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions(1-3), the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems(4). Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks(5,6). We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
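The abstract describes training within "a diverse league of continually adapting strategies and counter-strategies". A core ingredient of that league is prioritized opponent sampling, where a learning agent is matched more often against league members it currently loses to. The following is a minimal toy sketch of that matchmaking loop, not the paper's implementation: the `LeaguePlayer` class, the scalar `strength` skill model, and the fixed `+0.1` "learning" update are all illustrative assumptions standing in for real policies and RL updates.

```python
import math
import random


class LeaguePlayer:
    """A frozen policy snapshot in the league (toy stand-in for a neural network)."""

    def __init__(self, name, strength):
        self.name = name
        self.strength = strength  # scalar skill in this toy model


def win_prob(a, b):
    # Toy match model: probability that `a` beats `b`, logistic in skill gap.
    return 1.0 / (1.0 + math.exp(b.strength - a.strength))


def pfsp_weights(learner, league):
    # Prioritized fictitious self-play (one weighting scheme): opponents the
    # learner struggles against get quadratically higher sampling weight.
    return [(1.0 - win_prob(learner, p)) ** 2 for p in league]


def train_league(iterations=50, snapshot_every=10, seed=0):
    rng = random.Random(seed)
    learner = LeaguePlayer("learner", strength=0.0)
    league = [LeaguePlayer("initial", strength=0.0)]
    for step in range(1, iterations + 1):
        # Sample an opponent from the league, weighted toward hard matchups.
        weights = pfsp_weights(learner, league)
        opponent = rng.choices(league, weights=weights, k=1)[0]
        # Stand-in for an RL update from the resulting game.
        learner.strength += 0.1
        # Periodically freeze a copy of the learner into the league so future
        # learners must beat past strategies, not just the current one.
        if step % snapshot_every == 0:
            league.append(LeaguePlayer(f"snapshot_{step}", learner.strength))
    return learner, league
```

With the default settings, the learner accumulates skill over 50 matches and leaves behind 5 snapshots, so the league ends with 6 members including the initial player.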
Pages: 350+
Page count: 20
References
56 items in total
  • [1] [Anonymous], ART INT INT DIG ENT
  • [2] [Anonymous], 2003, P INT JOINT C ART IN
  • [3] [Anonymous], EPISODIC EXPLORATION
  • [4] [Anonymous], ART INT INT DIG ENT
  • [5] [Anonymous], 2016, PROC INT C MACH LEAR
  • [6] [Anonymous], 2015, Nature, DOI 10.1038/nature14539
  • [7] [Anonymous], 2018, FILM VISUAL REASONIN
  • [8] [Anonymous], ICML 00 P 17 INT C M
  • [9] [Anonymous], RATING CHESSPLAYERS
  • [10] [Anonymous], 2006, Pattern Recognition and Machine Learning