The Application of AlphaZero to Wargaming

Cited by: 5
Authors
Moy, Glenn [1]
Shekh, Slava [1]
Affiliation
[1] Def Sci & Technol Grp, Edinburgh, Australia
Source
AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019 / Vol. 11919
Keywords
Wargaming; Deep reinforcement learning; AlphaZero; GAME;
DOI
10.1007/978-3-030-35288-2_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we explore the process of automatically learning to play wargames using AlphaZero deep reinforcement learning. We consider a simple wargame, Coral Sea, which is a turn-based game played on a hexagonal grid between two players. We explore the differences between Coral Sea and traditional board games, where the successful use of AlphaZero has been demonstrated. Key differences include: problem representation, wargame asymmetry, limited strategic depth, and the requirement for significant hardware resources. We demonstrate how bootstrapping AlphaZero with supervised learning can overcome these challenges. In the context of Coral Sea, this enables AlphaZero to learn optimal play and outperform the supervised examples on which it was trained.
Pages: 3-14
Page count: 12