Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning

Cited by: 0
Authors
Sun, Hao [1]
Li, Haiqing [1, 2]
Liang, Yan [1]
Ma, Chaoxiong [1]
Wu, Han [1]
Affiliations
[1] School of Automation, Northwestern Polytechnical University, Shaanxi, Xi'an
[2] Xi'an Modern Control Technology Research Institute, Shaanxi, Xi'an
Source
Binggong Xuebao/Acta Armamentarii | 2024 / Vol. 45 / No. 9
Keywords
control decision; dynamic environment penetration; knowledge-assisted deep reinforcement learning; loitering munition group; soft actor-critic algorithm
DOI
10.12382/bgxb.2023.0827
Abstract
Loitering munition group penetration control decision (LMGPCD) is key to improving the autonomy and intelligence of loitering munition group combat. Because penetration maneuver commands for a loitering munition group are difficult to generate online in a dynamic environment containing interceptors and air defenses, a knowledge-assisted reinforcement learning-based LMGPCD algorithm is proposed. The state space and reward function are improved using domain knowledge and rule knowledge to enhance the algorithm's generalization ability and training convergence speed. An LMGPCD decision framework based on the soft actor-critic (SAC) algorithm is constructed to increase the algorithm's exploration efficiency. As the number of missiles and threats increases, the solution space narrows and the algorithm lacks efficient training experience in the initial stage; an expert experience application and imitation learning method is used to address this. Experimental results show that, compared with other algorithms, the proposed algorithm generates more effective penetration maneuver commands in real time in a dynamic environment, which verifies its effectiveness. © 2024 China Ordnance Industry Corporation. All rights reserved.
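The abstract names soft actor-critic (SAC) as the decision backbone and expert experience plus imitation learning as the remedy for scarce early training data. The sketch below illustrates, in plain Python and for single scalar transitions, the standard SAC quantities (clipped double-Q soft Bellman target, actor loss, entropy-temperature loss) together with one common way to inject expert data: a replay buffer that reserves a fixed share of every batch for demonstrations, plus a behaviour-cloning penalty. It is not the authors' implementation. `MixedReplayBuffer`, `expert_ratio`, and `bc_loss` are hypothetical names chosen for illustration, and the paper's knowledge-based state space, reward shaping, and network architecture are not modelled.

```python
"""Minimal SAC update targets plus expert-data seeding (illustrative sketch).

Assumptions: scalar, single-transition inputs; the Q-networks and the
policy network are not modelled, so their outputs are passed in directly.
"""
import random
from dataclasses import dataclass


def soft_q_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Clipped double-Q soft Bellman target:
    y = r + gamma * (1 - d) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (0.0 if done else 1.0) * soft_value


def policy_loss(q1, q2, logp, alpha=0.2):
    """SAC actor loss to minimise: alpha * log pi(a|s) - min(Q1(s,a), Q2(s,a))."""
    return alpha * logp - min(q1, q2)


def temperature_loss(log_alpha, logp, target_entropy):
    """Automatic entropy tuning: -log(alpha) * (log pi(a|s) + target entropy)."""
    return -log_alpha * (logp + target_entropy)


def bc_loss(policy_action, expert_action):
    """Hypothetical behaviour-cloning penalty: mean squared action error."""
    diffs = [(p - e) ** 2 for p, e in zip(policy_action, expert_action)]
    return sum(diffs) / len(diffs)


@dataclass(frozen=True)
class Transition:
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    done: bool


class MixedReplayBuffer:
    """Hypothetical buffer that keeps expert demonstrations permanently and
    reserves a fixed share (`expert_ratio`) of every batch for them."""

    def __init__(self, expert, expert_ratio=0.25, seed=0):
        self.expert = list(expert)
        self.agent = []
        self.expert_ratio = expert_ratio
        self.rng = random.Random(seed)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # The batch may be smaller than batch_size while either pool is still small.
        n_expert = min(len(self.expert), round(batch_size * self.expert_ratio))
        n_agent = min(len(self.agent), batch_size - n_expert)
        batch = self.rng.sample(self.expert, n_expert) + self.rng.sample(self.agent, n_agent)
        self.rng.shuffle(batch)
        return batch


if __name__ == "__main__":
    y = soft_q_target(1.0, False, q1_next=10.0, q2_next=12.0, logp_next=-1.5)
    print(f"soft target: {y:.3f}")  # soft target: 11.197

    expert = [Transition((i,), (0.0,), 1.0, (i + 1,), False) for i in range(4)]
    buf = MixedReplayBuffer(expert, expert_ratio=0.25)
    for i in range(10):
        buf.add(Transition((i,), (0.5,), 0.0, (i + 1,), False))
    batch = buf.sample(8)
    # prints: 2 expert transitions in batch of 8
    print(sum(t.reward == 1.0 for t in batch), "expert transitions in batch of", len(batch))
```

Taking the minimum of two critics and subtracting the entropy term is what gives SAC its conservative, exploration-friendly targets; keeping expert transitions in a separate pool guarantees that demonstrations are not crowded out as agent experience accumulates, which is one plausible way to counter the narrow solution space described above.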
Pages: 3161-3176
Number of pages: 15