Deep Reinforcement Learning of UAV Tracking Control Under Wind Disturbances Environments

Cited by: 64
Authors
Ma, Bodi [1]
Liu, Zhenbao [1,2]
Dang, Qingqing [1]
Zhao, Wen [1]
Wang, Jingyan [3]
Cheng, Yao [3]
Yuan, Zhirong [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Civil Aviat, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst Shenzhen, Shenzhen 518071, Peoples R China
[3] Beijing Inst Spacecraft Syst Engn, Beijing 100094, Peoples R China
[4] Northwestern Polytech Univ, Inst 365, Xian 710072, Peoples R China
Keywords
Heuristic algorithms; Aerospace control; Autonomous aerial vehicles; Vehicle dynamics; Process control; Adaptation models; Robustness; Dynamic environment; reinforcement learning; tracking control; unmanned aerial vehicles (UAVs); wind disturbances;
DOI
10.1109/TIM.2023.3265741
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
To address the strong nonlinearity, strong coupling, and unknown disturbances encountered in unmanned aerial vehicle (UAV) flight control in complex dynamic environments, as well as the limited generalization of reinforcement-learning-based controllers, this study presents an incremental reinforcement-learning-based algorithm for UAV tracking control in dynamic environments. The main goal is to enable a UAV to adjust its control policy as the environment changes. The tracking control task is formulated as a Markov decision process (MDP) and solved with an incremental reinforcement-learning method. First, a policy relief (PR) method allows the UAV to perform appropriate exploration in a new environment, so that the controller can reconcile new observations with its current knowledge and adapt more readily. In addition, a significance weighting (SW) method is developed to improve the utilization of episodes that carry more useful information: such episodes are assigned higher importance weights during learning. Numerical simulations, hardware-in-the-loop (HITL) experiments, and real-world flight experiments were conducted to evaluate the proposed method. The results demonstrate the high accuracy, effectiveness, and robustness of the proposed control algorithm in dynamic flight environments.
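The significance-weighting idea described in the abstract can be illustrated with a minimal sketch: episodes that carry more useful information receive larger importance weights and are sampled more often for policy updates. The class name, the use of accumulated TD error as the significance signal, and the weighting rule below are all illustrative assumptions, not the authors' actual implementation.

```python
import random

class WeightedEpisodeBuffer:
    """Hypothetical episode buffer with significance-weighted sampling."""

    def __init__(self):
        self.episodes = []  # list of (episode, weight) pairs

    def add(self, episode, td_error_sum):
        # Assumed significance score: episodes with larger accumulated
        # TD error (more "surprise") are treated as more informative.
        weight = 1.0 + abs(td_error_sum)
        self.episodes.append((episode, weight))

    def sample(self, k):
        # Sample episodes proportionally to their significance weights,
        # so informative episodes are replayed more often.
        eps, weights = zip(*self.episodes)
        return random.choices(eps, weights=weights, k=k)
```

In this sketch, an episode with a large accumulated TD error is replayed far more often than a routine one, which is one plausible way to realize "higher importance weights for episodes with richer information."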
Pages: 13