Air Combat Maneuver Decision Method Based on A3C Deep Reinforcement Learning

Cited by: 27
Authors
Fan, Zihao [1 ]
Xu, Yang [2 ,3 ]
Kang, Yuhang [4 ]
Luo, Delin [1 ]
Affiliations
[1] Xiamen Univ, Sch Aerosp Engn, Xiamen 361102, Peoples R China
[2] Northwestern Polytech Univ, Sch Civil Aviat, Xian 710072, Peoples R China
[3] NPU, Yangtze River Delta Res Inst, Suzhou 215400, Peoples R China
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep reinforcement learning; UCAV; maneuver decision; A3C; asynchronous mechanism;
DOI
10.3390/machines10111033
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
To solve the maneuvering decision problem in air combat for unmanned combat aerial vehicles (UCAVs), this paper proposes an autonomous maneuver decision method based on deep reinforcement learning. First, the UCAV flight maneuver model and maneuver library of both opposing sides are established. Then, because the state-transition effects of the various actions differ with the UCAV's pitch angle, ten state variables, including the pitch angle, are taken as the state space. Combined with an air combat situation threat assessment index model, a two-layer reward mechanism combining an internal reward and a sparse reward is designed as the evaluation basis for reinforcement learning. A fully connected neural network model is then built according to the Asynchronous Advantage Actor-Critic (A3C) algorithm. Through multi-threading, the UCAV interacts with the environment to train the model, gradually learns the optimal air combat maneuver countermeasure strategy, and uses that strategy to guide action selection; the asynchronous multi-threaded learning also reduces the correlation between samples. Finally, the effectiveness and feasibility of the method are verified in three different air combat scenarios.
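To make the architecture described in the abstract concrete, the following is a minimal illustrative sketch of the kind of fully connected actor-critic network an A3C agent might use. The 10-dimensional state vector matches the abstract, but the hidden-layer width, the 7-action maneuver library size, and all variable names here are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

STATE_DIM = 10     # ten state variables, including the pitch angle (from the abstract)
NUM_ACTIONS = 7    # hypothetical maneuver library size, assumed for illustration
HIDDEN = 64        # assumed hidden-layer width

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W_pi = rng.normal(0.0, 0.1, (HIDDEN, NUM_ACTIONS))  # actor head: policy over maneuvers
b_pi = np.zeros(NUM_ACTIONS)
W_v = rng.normal(0.0, 0.1, (HIDDEN, 1))             # critic head: state value V(s)
b_v = np.zeros(1)

def forward(state):
    """Return (action probabilities, state value) for one state vector."""
    h = np.tanh(state @ W1 + b1)          # shared fully connected trunk
    logits = h @ W_pi + b_pi
    probs = np.exp(logits - logits.max()) # numerically stable softmax policy
    probs /= probs.sum()
    value = float(h @ W_v + b_v)          # scalar value estimate used by the critic
    return probs, value

state = rng.normal(size=STATE_DIM)        # stand-in for an observed air combat state
probs, value = forward(state)
```

In a full A3C setup, several worker threads would each run this forward pass in their own copy of the environment and asynchronously push gradients of the policy and value losses to a shared parameter set, which is what decorrelates the training samples.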
Pages: 18