UAV Maneuvering Decision-Making Algorithm Based on Deep Reinforcement Learning Under the Guidance of Expert Experience

Cited: 0
Authors
Zhan, Guang [1 ]
Zhang, Kun [1 ,2 ]
Li, Ke [1 ]
Piao, Haiyin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Luoyang 471009, Peoples R China
Keywords
unmanned aerial vehicle (UAV); maneuvering decision-making; autonomous air-delivery; deep reinforcement learning; reward shaping; expert experience
DOI
10.23919/JSEE.2024.000022
CLC Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders on the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks of the air-delivery process based on traditional air-to-surface fire control methods, and we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). We then present a reward shaping method for both tasks using potential-based functions and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy's output during the later stage of training. The effectiveness of the trained maneuvering decision-making policy is illustrated by training-parameter curves and extensive experimental test results.
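The abstract's reward shaping rests on potential-based functions, where a shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward without changing the optimal policy. A minimal sketch of the idea follows; the distance-based potential, the 2-D state, and the discount factor γ = 0.99 are illustrative assumptions, not the authors' exact design:

```python
import math

GAMMA = 0.99  # discount factor (assumed value)


def potential(state, target):
    """Potential Phi(s): negative planar distance from the UAV to the
    delivery target, so states closer to the target have higher potential."""
    dx = state[0] - target[0]
    dy = state[1] - target[1]
    return -math.hypot(dx, dy)


def shaped_reward(env_reward, state, next_state, target):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Because the shaping term telescopes along any trajectory, it preserves
    the optimal policy while densifying the reward signal."""
    shaping = GAMMA * potential(next_state, target) - potential(state, target)
    return env_reward + shaping


# A step that moves the UAV from 10 units away to 9 units away
# receives a positive shaping bonus even if the raw reward is zero.
bonus = shaped_reward(0.0, (10.0, 0.0), (9.0, 0.0), (0.0, 0.0))
```

Expert-guided advice (as the abstract describes) can then be layered on top, e.g. by adding a further bonus when the chosen action matches a fire-control heuristic, which the shaping term alone does not capture.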
Pages: 644-665
Page count: 22