UAV Maneuvering Decision-Making Algorithm Based on Deep Reinforcement Learning Under the Guidance of Expert Experience

Cited: 0
Authors
Zhan, Guang [1 ]
Zhang, Kun [1 ,2 ]
Li, Ke [1 ]
Piao, Haiyin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Luoyang 471009, Peoples R China
Keywords
unmanned aerial vehicle (UAV); maneuvering decision-making; autonomous air-delivery; deep reinforcement learning; reward shaping; expert experience
DOI
10.23919/JSEE.2024.000022
CLC Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders on the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks of the air-delivery process based on traditional air-to-surface fire control methods, and we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). We then present a reward shaping method for both tasks using potential-based functions and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy's output during the later stage of training. The effectiveness of the trained maneuvering decision-making policy is illustrated by training-parameter curves and extensive experimental test results.
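The abstract's reward shaping rests on potential-based functions, where a shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward without changing the optimal policy. A minimal sketch of the idea follows; the distance-based potential, the 2-D state, and the discount factor γ = 0.99 are illustrative assumptions, not the authors' exact design:

```python
import math

GAMMA = 0.99  # discount factor (assumed value)


def potential(state, target):
    """Potential Phi(s): negative planar distance from the UAV to the
    delivery target, so states closer to the target have higher potential."""
    dx = state[0] - target[0]
    dy = state[1] - target[1]
    return -math.hypot(dx, dy)


def shaped_reward(env_reward, state, next_state, target):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Because the shaping term telescopes along any trajectory, it preserves
    the optimal policy while densifying the reward signal."""
    shaping = GAMMA * potential(next_state, target) - potential(state, target)
    return env_reward + shaping


# A step that moves the UAV from 10 units away to 9 units away
# receives a positive shaping bonus even if the raw reward is zero.
bonus = shaped_reward(0.0, (10.0, 0.0), (9.0, 0.0), (0.0, 0.0))
```

Expert-guided advice (as the abstract describes) can then be layered on top, e.g. by adding a further bonus when the chosen action matches a fire-control heuristic, which the shaping term alone does not capture.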
Pages: 644-665
Page count: 22