Decision-making method for air combat maneuver based on explainable reinforcement learning

Cited by: 0

Authors:

Yang, Shuheng [1,2]

Zhang, Dong [1,2]

Xiong, Wei [1,2]

Ren, Zhi [1,2]

Tang, Shuo [1,2]

Affiliations:

[1] School of Astronautics, Northwestern Polytechnical University, Xi’an

[2] Shaanxi Key Laboratory of Aerospace Flight Vehicle Design, Northwestern Polytechnical University, Xi’an

Source:

Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica | 2024 / Vol. 45 / No. 18

Keywords:

explainability; identification of air combat intention; intelligent air combat; maneuver decision-making; reinforcement learning

DOI:

10.7527/S1000-6893.2023.29922

Abstract:

Intelligent air combat is the future trend of air combat, and deep reinforcement learning is an important technical approach to intelligent decision-making in air combat. However, because it is a "black box" model, deep reinforcement learning produces strategies that are difficult to explain, intentions that are difficult to understand, and decisions that are difficult to trust, which hinders its application in intelligent air combat. To address these problems, an intelligent air combat maneuver decision-making method based on explainable reinforcement learning is proposed. First, an interpretability model and a maneuvering intention recognition model are constructed based on a strategy-level explanation method and a dynamic Bayesian network. Second, by calculating the importance of the decision and the probability of maneuvering intention, the maneuver decision-making process of the Unmanned Aerial Vehicle (UAV) is explained at the intention level. Finally, based on the intention interpretation results, the reward function and training strategy of the deep reinforcement learning algorithm are modified, and the effectiveness of the proposed method is verified by simulation and comparative analysis. The proposed method can obtain air combat maneuver strategies with excellent effectiveness, strong reliability, and high credibility. © 2024 Chinese Society of Astronautics. All rights reserved.
