Learning Continuous 3-DoF Air-to-Air Close-in Combat Strategy using Proximal Policy Optimization

Cited: 11
Authors
Li, Luntong [1 ]
Zhou, Zhiming [2 ]
Chai, Jiajun [1 ]
Liu, Zhen [2 ]
Zhu, Yuanheng [1 ]
Yi, Jianqiang [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Integrated Informat Syst Res Ctr, Inst Automat, Beijing 100190, Peoples R China
Source
2022 IEEE CONFERENCE ON GAMES, COG | 2022
Funding
National Natural Science Foundation of China;
Keywords
air-combat; reinforcement learning; proximal policy optimization; flight simulation;
DOI
10.1109/CoG51982.2022.9893690
Chinese Library Classification
TP39 [Applications of Computers];
Discipline Classification Code
081203 ; 0835 ;
Abstract
Air-to-air close-in combat builds on many basic fighter maneuvers and can largely be modeled as an algorithmic function of inputs. This paper studies autonomous close-in combat, aiming to learn a new strategy that can adapt to different circumstances when fighting against an opponent. Current methods for learning close-in combat strategies are largely limited to discrete action sets, whether in the form of rules, actions, or sub-policies. In contrast, we consider a one-on-one air combat game with a continuous action space and present a deep reinforcement learning method based on proximal policy optimization (PPO) that learns a close-in combat strategy from observations in an end-to-end manner. The state space is designed to promote the learning efficiency of PPO. We also design a minimax strategy for the game. Simulation results show that the learned PPO agent defeats the minimax opponent with a win rate of about 97%.
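The clipped surrogate objective at the heart of PPO (reference [10]) can be sketched as follows. This is a minimal NumPy illustration of the standard PPO loss, not the authors' implementation; the probability ratios, advantages, and the clipping range of 0.2 are made-up example values.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss of PPO.

    ratio     -- pi_new(a|s) / pi_old(a|s) for sampled actions
    advantage -- estimated advantage of those actions
    eps       -- clipping range (0.2 is the common default)
    """
    unclipped = ratio * advantage
    # clipping the ratio removes the incentive to move the policy
    # far outside [1 - eps, 1 + eps] in a single update
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; the loss is its negation
    return -np.mean(np.minimum(unclipped, clipped))

# toy batch: one action became more likely, one less likely
ratio = np.array([1.5, 0.6])
advantage = np.array([1.0, -1.0])
print(ppo_clip_loss(ratio, advantage))  # -0.2
```

Because both terms in the minimum are clipped or unclipped per sample, large policy updates are penalized only when they would increase the objective, which is what makes PPO stable enough for end-to-end training on continuous control tasks such as this one.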
Pages: 616 / 619
Page count: 4
Related Papers
Total: 14
[1]  
Byrnes M.W., 2014, AIR SPACE POWER J, V28, P48
[2]  
Ding Z., 2021, IEEE TRANS NEURAL NETW LEARN SYST
[3]  
Ernest, 2016, J DEFEN MANAGE, DOI 10.4172/2167-0374.1000144
[4]  
Hao S., 2021, P 11 INT C INT CONTR
[5]  
Kang YM, 2019, CHIN AUTOM CONGR, P5231, DOI 10.1109/CAC48633.2019.8996232
[6]   Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target [J].
Li, Weifan ;
Zhu, Yuanheng ;
Zhao, Dongbin .
COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (02) :1205-1216
[7]   Air-Combat Strategy Using Approximate Dynamic Programming [J].
McGrew, James S. ;
How, Jonathan P. ;
Williams, Brian ;
Roy, Nicholas .
JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2010, 33 (05) :1641-1654
[8]  
Pope A.P., 2021, arXiv
[9]  
Ramirez M, 2018, PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), P1318
[10]  
Schulman J, 2017, arXiv preprint arXiv:1707.06347