UAV navigation in high dynamic environments: A deep reinforcement learning approach

Cited by: 104

Authors:

Guo, Tong [1,2]

Jiang, Nan [1]

Li, Biyue [1,2]

Zhu, Xi [3]

Wang, Ya [4,5]

Du, Wenbo [1,2]

Affiliations:

[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100083, Peoples R China

[2] Beihang Univ, Minist Ind & Informat Technol China, Key Lab Adv Technol Near Space Informat Syst, Beijing 100083, Peoples R China

[3] Beihang Univ, Res Inst Frontier Sci, Beijing 100083, Peoples R China

[4] Beihang Univ, Coll Software, Beijing 100083, Peoples R China

[5] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100083, Peoples R China

Source:

CHINESE JOURNAL OF AERONAUTICS | 2021 / Vol. 34 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Autonomous vehicles; Deep learning; Motion planning; Navigation; Reinforcement learning; Unmanned Aerial Vehicle (UAV);

DOI:

10.1016/j.cja.2020.05.011

Chinese Library Classification:

V [Aeronautics and Astronautics];

Discipline classification code:

08 ; 0825 ;

Abstract:

Unmanned Aerial Vehicle (UAV) navigation is aimed at guiding a UAV to the desired destinations along a collision-free and efficient path without human intervention, and it plays a crucial role in autonomous missions in harsh environments. The recently emerging Deep Reinforcement Learning (DRL) methods have shown promise for addressing the UAV navigation problem, but most of these methods cannot converge due to the massive amounts of interactive data when a UAV is navigating in high dynamic environments, where there are numerous obstacles moving fast. In this work, we propose an improved DRL-based method to tackle these fundamental limitations. To be specific, we develop a distributed DRL framework to decompose the UAV navigation task into two simpler sub-tasks, each of which is solved through the designed Long Short-Term Memory (LSTM) based DRL network by using only part of the interactive data. Furthermore, a clipped DRL loss function is proposed to closely stack the two sub-solutions into one integral for the UAV navigation problem. Extensive simulation results are provided to corroborate the superiority of the proposed method in terms of the convergence and effectiveness compared with those of the state-of-the-art DRL methods. © 2020 Production and hosting by Elsevier Ltd. on behalf of Chinese Society of Aeronautics and Astronautics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
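
The abstract mentions a clipped DRL loss function but does not give its form. As a rough illustration of what loss clipping generally means in value-based reinforcement learning, the sketch below clips a one-step temporal-difference (TD) error before squaring it, so that a single large error cannot dominate an update. This is a generic technique, not the loss proposed in the paper; the function names, the clip threshold, and the sample numbers are all assumptions made for illustration.

```python
def td_error(reward, gamma, next_q_values, q_taken, done):
    """One-step TD error: target minus the Q-value of the action taken.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return reward + bootstrap - q_taken


def clipped_td_loss(error, clip=1.0):
    """Half squared loss on a TD error clipped to [-clip, clip].

    The clip threshold of 1.0 is an assumed default.
    """
    clipped = max(-clip, min(clip, error))
    return 0.5 * clipped ** 2


# A small error passes through unchanged; a large one is capped at the threshold.
small = td_error(reward=0.1, gamma=0.9, next_q_values=[0.5, 1.0], q_taken=0.8, done=False)
large = td_error(reward=10.0, gamma=0.9, next_q_values=[0.5, 1.0], q_taken=0.8, done=False)
small_loss = clipped_td_loss(small)
large_loss = clipped_td_loss(large)
```

With these sample values the large error is capped at the threshold, so its loss is bounded regardless of how large the raw reward was, while the small error contributes its ordinary squared value.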


Pages: 479-489

Page count: 11

