A Reinforcement Learning Method Based on an Improved Sampling Mechanism for Unmanned Aerial Vehicle Penetration

Cited by: 5
Authors
Wang, Yue [1]
Li, Kexv [1]
Zhuang, Xing [1]
Liu, Xinyu [1]
Li, Hanyu [1]
Affiliations
[1] Beijing Inst Technol, Sch Mechatron Engn, Beijing 100081, Peoples R China
Keywords
UAV penetration; reinforcement learning; sample utilization; task completion division; ATTITUDE-CONTROL; OPTIMIZATION; UAV; ALGORITHM
DOI
10.3390/aerospace10070642
Chinese Library Classification (CLC) number
V [Aeronautics, Astronautics]
Discipline classification code
08; 0825
Abstract
The penetration of unmanned aerial vehicles (UAVs) is an important aspect of UAV games. In recent years, UAV penetration has generally been addressed with artificial intelligence methods such as reinforcement learning. However, the high sample demand of reinforcement learning poses a significant challenge in the context of UAV games. To improve sample utilization in UAV penetration, this paper proposes an improved sampling mechanism called task completion division (TCD) and combines it with the soft actor-critic (SAC) algorithm to form the TCD-SAC algorithm. To compare the performance of TCD-SAC with related baseline algorithms, this study builds a dynamic UAV game environment and conducts training and testing experiments in it. The results show that, among all the algorithms compared, TCD-SAC has the highest sample utilization rate and the best penetration results, and that it shows good adaptability and robustness in dynamic environments.
Pages: 21
Related papers
36 items in total
[1]   UAV assistance paradigm: State-of-the-art in applications and challenges [J].
Alzahrani, Bander ;
Oubbati, Omar Sami ;
Barnawi, Ahmed ;
Atiquzzaman, Mohammed ;
Alghazzawi, Daniyal .
JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2020, 166
[2]   A MARKOVIAN DECISION PROCESS [J].
BELLMAN, R .
JOURNAL OF MATHEMATICS AND MECHANICS, 1957, 6 (05) :679-684
[3]   Six-DOF Spacecraft Optimal Trajectory Planning and Real-Time Attitude Control: A Deep Neural Network-Based Approach [J].
Chai, Runqi ;
Tsourdos, Antonios ;
Savvaris, Al ;
Chai, Senchun ;
Xia, Yuanqing ;
Chen, C. L. Philip .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) :5005-5013
[4]   Solving Trajectory Optimization Problems in the Presence of Probabilistic Constraints [J].
Chai, Runqi ;
Savvaris, Al ;
Tsourdos, Antonios ;
Chai, Senchun ;
Xia, Yuanqing ;
Wang, Shuo .
IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (10) :4332-4345
[5]   Improved Gradient-Based Algorithm for Solving Aeroassisted Vehicle Trajectory Optimization Problems [J].
Chai, Runqi ;
Savvaris, Al ;
Tsourdos, Antonios ;
Chai, Senchun ;
Xia, Yuanqing .
JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2017, 40 (08) :2093-2101
[6]   Reward Learning From Very Few Demonstrations [J].
Eteke, Cem ;
Kebude, Dogancan ;
Akgun, Baris .
IEEE TRANSACTIONS ON ROBOTICS, 2021, 37 (03) :893-904
[7]   On Trajectory Homotopy to Explore and Penetrate Dynamically of Multi-UAV [J].
Fu, Jinyu ;
Sun, Guanghui ;
Yao, Weiran ;
Wu, Ligang .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (12) :24008-24019
[8]   Improved Tentacle-Based Guidance for Reentry Gliding Hypersonic Vehicle With No-Fly Zone Constraint [J].
Gao, Yang ;
Cai, Guangbin ;
Yang, Xiaogang ;
Hou, Mingzhe .
IEEE ACCESS, 2019, 7 :119246-119258
[9]   Han SC, 2009, INT J CONTROL AUTOM, V7, P553, DOI 10.1007/s12555-009-0407-1
[10]   Novel trajectory prediction algorithms for hypersonic gliding vehicles based on maneuver mode on-line identification and intent inference [J].
Hu, Yudong ;
Gao, Changsheng ;
Li, Junlong ;
Jing, Wuxing ;
Li, Zhen .
MEASUREMENT SCIENCE AND TECHNOLOGY, 2021, 32 (11)