Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play

Cited: 102
Authors
Sun, Zhixiao [1 ]
Piao, Haiyin [1 ]
Yang, Zhen [1 ]
Zhao, Yiyang [1 ]
Zhan, Guang [1 ]
Zhou, Deyun [1 ]
Meng, Guanglei [2 ]
Chen, Hechang [3 ]
Chen, Xing [3 ]
Qu, Bohao [3 ]
Lu, Yuanjie [4 ]
Affiliations
[1] Northwestern Polytech Univ, Xian 710072, Shaanxi, Peoples R China
[2] Shenyang Aerosp Univ, Shenyang 110036, Liaoning, Peoples R China
[3] Jilin Univ, Changchun 130000, Jilin, Peoples R China
[4] Chinese Aeronaut Estab, Beijing 100012, Peoples R China
Keywords
Air combat; Artificial intelligence; Multi-agent reinforcement learning
DOI
10.1016/j.engappai.2020.104112
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Air-to-air confrontation has attracted wide attention from artificial intelligence researchers. However, in the complex air combat process, operational strategy selection depends heavily on aviation expert knowledge, which is usually expensive and difficult to obtain. Moreover, existing methods struggle to select optimal action sequences efficiently and accurately, because action selection becomes highly complex when hybrid actions, i.e., mixed discrete and continuous actions, are involved. In view of this, we propose a novel Multi-Agent Hierarchical Policy Gradient algorithm (MAHPG), which is capable of learning diverse strategies and transcending expert cognition through adversarial self-play learning. In addition, a hierarchical decision network is adopted to handle the complicated hybrid actions. It provides a hierarchical decision-making ability similar to that of humans and thus reduces action ambiguity efficiently. Extensive experimental results demonstrate that MAHPG outperforms state-of-the-art air combat methods in terms of both defensive and offensive ability. Notably, MAHPG exhibits Air Combat Tactics Interplay Adaptation, and new operational strategies emerge that surpass the level of human experts.
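The abstract describes a hierarchical decision network that couples a discrete high-level choice (e.g., a tactic) with continuous low-level maneuver parameters, trained by policy gradient through adversarial self-play. Below is a minimal, illustrative sketch of such a hybrid-action policy head in PyTorch; the class name, layer sizes, conditioning scheme, and REINFORCE-style update are assumptions for illustration only, not the authors' MAHPG implementation.

# Minimal sketch of a hierarchical policy for hybrid discrete/continuous actions.
# All names and dimensions are hypothetical; this is not the paper's MAHPG code.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim, n_tactics, maneuver_dim, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # High-level head: discrete tactic selection (e.g., pursue, evade).
        self.tactic_head = nn.Linear(hidden, n_tactics)
        # Low-level head: continuous maneuver parameters, conditioned on the
        # chosen tactic through a learned embedding.
        self.tactic_embed = nn.Embedding(n_tactics, hidden)
        self.mu_head = nn.Linear(hidden, maneuver_dim)
        self.log_std = nn.Parameter(torch.zeros(maneuver_dim))

    def forward(self, obs):
        h = self.trunk(obs)
        tactic_dist = Categorical(logits=self.tactic_head(h))
        tactic = tactic_dist.sample()
        h_low = h + self.tactic_embed(tactic)
        maneuver_dist = Normal(self.mu_head(h_low), self.log_std.exp())
        maneuver = maneuver_dist.sample()
        # Joint log-probability of the hybrid action for the policy gradient.
        logp = tactic_dist.log_prob(tactic) + maneuver_dist.log_prob(maneuver).sum(-1)
        return tactic, maneuver, logp

# Usage: weight the joint log-probability by an advantage estimate and take a
# policy-gradient step; in self-play, both sides would train such a policy.
policy = HierarchicalPolicy(obs_dim=12, n_tactics=4, maneuver_dim=3)
obs = torch.randn(8, 12)             # batch of observations
tactic, maneuver, logp = policy(obs)
advantage = torch.randn(8)           # placeholder advantage estimates
loss = -(logp * advantage).mean()    # REINFORCE-style surrogate loss
loss.backward()

In the paper's setting the advantage would come from the air-combat reward signal and the two opposing agents would be updated against each other; here the advantage is a placeholder to keep the sketch self-contained.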
Pages: 14