A Multi-UCAV Cooperative Decision-Making Method Based on an MAPPO Algorithm for Beyond-Visual-Range Air Combat

Times Cited: 24
Authors
Liu, Xiaoxiong [1 ]
Yin, Yi [1 ]
Su, Yuzhan [1 ]
Ming, Ruichen [1 ]
Affiliations
[1] Northwestern Polytechnical University, School of Automation, Xi'an 710129, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
multiple unmanned combat aerial vehicles; multi-agent proximal policy optimization; missile attack area model; comprehensive reward; centralized training and distributed execution; STRATEGY
DOI
10.3390/aerospace9100563
CLC Number
V [Aviation, Aerospace]
Discipline Classification Code
08; 0825
Abstract
To address autonomous decision making and cooperative operation of multiple unmanned combat aerial vehicles (UCAVs) in beyond-visual-range air combat, this paper proposes an air combat decision-making method based on a multi-agent proximal policy optimization (MAPPO) algorithm. First, a model of the unmanned combat aircraft is established on the simulation platform, and a corresponding maneuver library is designed. To simulate realistic beyond-visual-range air combat, a missile attack area model is established, and damage probabilities are specified for both the friendly and enemy sides. Second, to overcome the sparse-reward problem of traditional reinforcement learning, a comprehensive reward function is designed from the aircraft's angle, speed, altitude, and distance, together with the damage term from the missile attack area model. Finally, the centralized-training, distributed-execution paradigm is adopted to improve both the decision-making ability of the unmanned combat aircraft and the training efficiency of the algorithm. Simulation results show that the algorithm can conduct multi-aircraft air combat confrontation drills, form new tactical decisions during the drills, and provide new ideas for multi-UCAV air combat.
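The record does not include source code. As an illustration of the comprehensive reward described in the abstract, the following is a minimal Python sketch that combines angle, speed, altitude, and distance terms with a damage term from the missile attack area model; every weight, reference value, and band limit in it is an assumed placeholder, not a value from the paper.

```python
import numpy as np

def comprehensive_reward(angle_off, speed, altitude, distance,
                         kill_prob_own, kill_prob_enemy,
                         weights=(0.3, 0.2, 0.2, 0.3)):
    """Dense shaped reward built from angle, speed, altitude, and distance
    terms, plus a damage term from the missile attack area model.
    All constants below are illustrative assumptions."""
    w_a, w_v, w_h, w_d = weights

    # Angle term: largest when the target sits on the nose (angle_off = 0 rad).
    r_angle = 1.0 - angle_off / np.pi

    # Speed term: peaks near an assumed advantageous airspeed (m/s).
    r_speed = np.exp(-((speed - 250.0) / 100.0) ** 2)

    # Altitude term: scales up to 1 across an assumed useful band (m).
    r_alt = np.clip((altitude - 1000.0) / 9000.0, 0.0, 1.0)

    # Distance term: peaks near an assumed missile-envelope range (m).
    r_dist = np.exp(-((distance - 20000.0) / 15000.0) ** 2)

    dense = w_a * r_angle + w_v * r_speed + w_h * r_alt + w_d * r_dist

    # Damage term: advantage of our kill probability over the enemy's,
    # both taken from the missile attack area model.
    return dense + (kill_prob_own - kill_prob_enemy)
```

A dense shaped term of this kind mitigates the sparse-reward problem because the agent receives a learning signal at every step rather than only at engagement outcomes.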
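Similarly, the centralized-training, distributed-execution structure of MAPPO can be sketched as one decentralized actor per UCAV plus a critic that sees the joint state during training only. The class names, layer sizes, and dimensions below are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: each UCAV selects a maneuver from the
    maneuver library using only its own local observation."""
    def __init__(self, obs_dim: int, n_maneuvers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, n_maneuvers),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralCritic(nn.Module):
    """Centralized value function: conditioned on the joint observation
    of all UCAVs. Used only during training; execution needs actors alone."""
    def __init__(self, joint_obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        return self.net(joint_obs).squeeze(-1)
```

During training, the centralized critic's value estimates feed the PPO clipped surrogate objective for each actor; at execution time each UCAV runs its actor on local observations alone, which is what makes the deployment distributed.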
Pages: 19