Learning Continuous 3-DoF Air-to-Air Close-in Combat Strategy using Proximal Policy Optimization

Cited: 11
Authors
Li, Luntong [1 ]
Zhou, Zhiming [2 ]
Chai, Jiajun [1 ]
Liu, Zhen [2 ]
Zhu, Yuanheng [1 ]
Yi, Jianqiang [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Integrated Informat Syst Res Ctr, Inst Automat, Beijing 100190, Peoples R China
Source
2022 IEEE CONFERENCE ON GAMES, COG | 2022
Funding
National Natural Science Foundation of China;
Keywords
air-combat; reinforcement learning; proximal policy optimization; flight simulation;
DOI
10.1109/CoG51982.2022.9893690
Chinese Library Classification
TP39 [Applications of Computers];
Discipline Classification Code
081203 ; 0835 ;
Abstract
Air-to-air close-in combat builds on many basic fighter maneuvers and can largely be modeled as an algorithmic function of inputs. This paper studies autonomous close-in combat, aiming to learn a new strategy that can adapt to different circumstances when fighting against an opponent. Current methods for learning close-in combat strategies are largely limited to discrete action sets, whether in the form of rules, actions, or sub-policies. In contrast, we consider a one-on-one air combat game with a continuous action space and present a deep reinforcement learning method based on proximal policy optimization (PPO) that learns a close-in combat strategy from observations in an end-to-end manner. The state space is designed to promote the learning efficiency of PPO. We also design a minimax strategy for the game. Simulation results show that the learned PPO agent defeats the minimax opponent with a win rate of about 97%.
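The clipped surrogate objective at the heart of PPO (reference [10]) can be sketched as follows. This is a minimal NumPy illustration of the standard PPO loss, not the authors' implementation; the probability ratios, advantages, and the clipping range of 0.2 are made-up example values.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss of PPO.

    ratio     -- pi_new(a|s) / pi_old(a|s) for sampled actions
    advantage -- estimated advantage of those actions
    eps       -- clipping range (0.2 is the common default)
    """
    unclipped = ratio * advantage
    # clipping the ratio removes the incentive to move the policy
    # far outside [1 - eps, 1 + eps] in a single update
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; the loss is its negation
    return -np.mean(np.minimum(unclipped, clipped))

# toy batch: one action became more likely, one less likely
ratio = np.array([1.5, 0.6])
advantage = np.array([1.0, -1.0])
print(ppo_clip_loss(ratio, advantage))  # -0.2
```

Because both terms in the minimum are clipped or unclipped per sample, large policy updates are penalized only when they would increase the objective, which is what makes PPO stable enough for end-to-end training on continuous control tasks such as this one.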
Pages: 616 / 619
Page count: 4
Related Papers
Total: 14
[1]  
Byrnes M.W., 2014, AIR SPACE POWER J, V28, P48
[2]  
Ding Z., 2021, IEEE TRANS NEURAL NETW LEARN SYST
[3]  
Ernest, 2016, J DEFEN MANAGE, DOI 10.4172/2167-0374.1000144
[4]  
Hao S., 2021, P 11 INT C INT CONTR
[5]  
Kang YM, 2019, CHIN AUTOM CONGR, P5231, DOI 10.1109/CAC48633.2019.8996232
[6]   Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target [J].
Li, Weifan ;
Zhu, Yuanheng ;
Zhao, Dongbin .
COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (02) :1205-1216
[7]   Air-Combat Strategy Using Approximate Dynamic Programming [J].
McGrew, James S. ;
How, Jonathan P. ;
Williams, Brian ;
Roy, Nicholas .
JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2010, 33 (05) :1641-1654
[8]  
Pope A.P., 2021, arXiv
[9]  
Ramirez M, 2018, PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), P1318
[10]  
Schulman J, 2017, arXiv preprint arXiv:1707.06347