The Application of AlphaZero to Wargaming

Cited by: 5

Authors:

Moy, Glenn ^{[1]}

Shekh, Slava ^{[1]}

Affiliations:

[1] Def Sci & Technol Grp, Edinburgh, Australia

Source:

AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019 / Vol. 11919

Keywords:

Wargaming; Deep reinforcement learning; AlphaZero; GAME;

DOI:

10.1007/978-3-030-35288-2_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we explore the process of automatically learning to play wargames using AlphaZero deep reinforcement learning. We consider a simple wargame, Coral Sea, which is a turn-based game played on a hexagonal grid between two players. We explore the differences between Coral Sea and traditional board games, where the successful use of AlphaZero has been demonstrated. Key differences include: problem representation, wargame asymmetry, limited strategic depth, and the requirement for significant hardware resources. We demonstrate how bootstrapping AlphaZero with supervised learning can overcome these challenges. In the context of Coral Sea, this enables AlphaZero to learn optimal play and outperform the supervised examples on which it was trained.

Pages: 3-14

Page count: 12
