Maneuver Strategy Generation of UCAV for within Visual Range Air Combat Based on Multi-Agent Reinforcement Learning and Target Position Prediction

Cited by: 34

Authors:

Kong, Weiren [1]

Zhou, Deyun [1]

Yang, Zhen [1]

Zhang, Kai [1]

Zeng, Lina [1]

Affiliation:

[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 15

Funding:

National Natural Science Foundation of China;

Keywords:

air combat; multi-agent deep reinforcement learning; maneuver strategy; network training; unmanned combat aerial vehicle;

DOI:

10.3390/app10155198

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

With the development of unmanned combat air vehicles (UCAVs) and artificial intelligence (AI), within visual range (WVR) air combat between intelligent UCAVs is expected to be common in future air warfare. Because controlling highly dynamic and uncertain WVR air combat from the UCAV's ground station is not feasible, an algorithm is needed that can generate highly intelligent air combat strategies and so enable the UCAV to complete air combat missions independently. In this paper, a 1-vs.-1 WVR air combat strategy generation algorithm is proposed using the multi-agent deep deterministic policy gradient (MADDPG). The 1-vs.-1 WVR air combat is modeled as a two-player zero-sum Markov game (ZSMG). A target position prediction method is introduced into the model so that the UCAV can predict the target's actions and position. Moreover, to ensure that the UCAV is not limited by the constraints of a basic fighter maneuver (BFM) library, the action space is treated as continuous. In addition, a potential-based reward shaping method is proposed to improve the efficiency of the air combat strategy generation algorithm. Finally, the efficiency of the algorithm and the intelligence level of the resulting strategy are verified through simulation experiments. The results show that an air combat strategy that uses target position prediction is superior to one that does not.
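The abstract names potential-based reward shaping but this record does not give the paper's potential function. As a minimal sketch of the general technique only, the standard form adds the term gamma * Phi(s') - Phi(s) to the environment reward. The function `potential` below, its inputs (`distance`, `angle_off`), and the `max_distance` scale are hypothetical placeholders chosen for illustration and are not taken from the paper.

```python
import math


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Standard potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


def potential(distance, angle_off, max_distance=1000.0):
    """Hypothetical potential (an assumption, not the paper's definition).

    It is larger when the UCAV is closer to the target and pointing more
    directly at it. `angle_off` is in radians, in [-pi, pi].
    """
    closeness = 1.0 - min(distance, max_distance) / max_distance
    alignment = 1.0 - abs(angle_off) / math.pi
    return closeness * alignment


# Example: moving from 800 m to 600 m while reducing the angle off the target.
phi_before = potential(800.0, 1.0)
phi_after = potential(600.0, 0.5)
bonus = shaped_reward(0.0, phi_before, phi_after)
```

With gamma = 1 the shaping terms telescope along a trajectory, so their sum depends only on the potentials of the first and last states. This is the reason shaping of this form is generally considered safe to add without changing which policies are optimal.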


Pages: 23
