Proximal Policy Optimization with Elo-based Opponent Selection and Combination with Enhanced Rolling Horizon Evolution Algorithm

Cited by: 2
Authors
Liang, Rongqin [1 ]
Zhu, Yuanheng [2 ]
Tang, Zhentao [2 ]
Yang, Mu [3 ]
Zhu, Xiaolong [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Univ Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Parametrix Ai, Dept Game AI Res, Shenzhen, Peoples R China
Source
2021 IEEE CONFERENCE ON GAMES (COG) | 2021
Keywords
game AI; PPO; deep reinforcement learning; FightingICE; opponent selection
DOI
10.1109/COG52621.2021.9619146
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Two-player zero-sum video games are a fundamental problem in game artificial intelligence. In 2020, the enhanced rolling horizon evolution algorithm with policy gradient (ERHEAPI) beat heuristic, Monte-Carlo tree search, and other methods to win the championship of the Fighting Game Artificial Intelligence Competition (FTGAIC). However, ERHEAPI performed poorly in the first round. In this paper, we present an effective method, denoted ERHEAPPO, that combines proximal policy optimization (PPO) and the enhanced rolling horizon evolution algorithm (ERHEA) with opponent model learning to further improve performance. We train the PPO agent and find that Elo-based opponent selection improves sample efficiency. We compare the performance of the proposed ERHEAPPO with ERHEAPI; the experimental results demonstrate the effectiveness of ERHEAPPO.
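The record itself contains no code, so the following is only a rough sketch of how Elo-based opponent selection might be wired into PPO self-play training. The expected_score and update_elo formulas are the standard Elo rating system; select_opponent, its even-match weighting heuristic, and all names here are illustrative assumptions, not taken from the paper.

import random

def expected_score(r_a, r_b):
    # Standard Elo expectation: probability that player A beats player B.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update_elo(r_a, r_b, score_a, k=32.0):
    # score_a: 1.0 win, 0.5 draw, 0.0 loss for player A; zero-sum update.
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

def select_opponent(learner_rating, pool):
    # pool: {opponent_id: elo_rating}. Favor evenly matched opponents
    # (expected score near 0.5), whose episodes carry more learning
    # signal; the small floor keeps every opponent selectable.
    ids = list(pool.keys())
    weights = [max(1e-3, 1.0 - 2.0 * abs(expected_score(learner_rating, pool[i]) - 0.5))
               for i in ids]
    return random.choices(ids, weights=weights, k=1)[0]

# Select an opponent, play one episode, then refresh both ratings
# (here assuming the learner won the episode).
learner, pool = 1500.0, {"v1": 1400.0, "v2": 1500.0, "v3": 1520.0}
opponent = select_opponent(learner, pool)
learner, pool[opponent] = update_elo(learner, pool[opponent], score_a=1.0)

Drawing fresh opponents this way concentrates training games on near-even matchups, which is one plausible reading of why the paper observes better sample efficiency than uniform opponent selection.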
Pages: 1024-1027 (4 pages)