Multi-UAV Adaptive Cooperative Formation Trajectory Planning Based on an Improved MATD3 Algorithm of Deep Reinforcement Learning

Cited by: 22
Authors
Xing, Xiaojun [1 ,2 ]
Zhou, Zhiwei [1 ]
Li, Yan [1 ]
Xiao, Bing [1 ]
Xun, Yilin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710129, Peoples R China
[2] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Shenzhen 518063, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Autonomous aerial vehicles; Trajectory planning; Trajectory; Deep reinforcement learning; Planning; Reinforcement learning; Long short-term memory; Multi-unmanned aerial vehicle (multi-UAV) cooperative formation trajectory planning; deep reinforcement learning; potential field-based dense reward; adaptive formation strategy; hierarchical training mechanism; SPACECRAFT; NAVIGATION; AVOIDANCE; DESIGN;
DOI
10.1109/TVT.2024.3389555
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging problem in the UAV research field because of its NP-hard nature, collision-avoidance constraints, close-formation requirements, consensus convergence, and high-dimensional action space. The difficulty grows considerably when complex obstacles and narrow passages are present in unknown environments. Accordingly, this article proposes a novel multi-UAV adaptive cooperative formation trajectory planning approach for unknown obstacle environments based on an improved deep reinforcement learning algorithm. The approach introduces a long short-term memory (LSTM) recurrent neural network (RNN) at the environment-perception end of the multi-agent twin delayed deep deterministic policy gradient (MATD3) network and develops an improved potential field-based dense reward function to strengthen policy learning efficiency and accelerate convergence. Moreover, a hierarchical deep reinforcement learning training mechanism, consisting of an adaptive formation layer, a trajectory planning layer, and an action execution layer, is implemented to explore an optimal trajectory planning policy. Additionally, an adaptive formation maintenance and transformation strategy is presented so that the UAV swarm can adapt to environments with narrow passages. Simulation results show that the proposed approach outperforms multi-agent deep deterministic policy gradient (MADDPG) and MATD3 in policy learning efficiency, optimality of the trajectory planning policy, and adaptability to narrow passages.
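To make the perception-end modification described in the abstract concrete, the following is a minimal sketch, assuming a PyTorch implementation, of an MATD3-style deterministic actor whose observation encoder is an LSTM. The class name, observation/action dimensions, hidden size, and head layout are illustrative assumptions for this record, not the authors' published architecture.

```python
import torch
import torch.nn as nn


class LSTMActor(nn.Module):
    """Actor with an LSTM at the perception end (illustrative sketch only).

    A short sliding window of observations is encoded by the LSTM so the
    deterministic policy can exploit temporal context in unknown obstacle
    environments; all dimensions below are assumptions.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),
            nn.Tanh(),  # bounded actions in [-1, 1]
        )

    def forward(self, obs_seq: torch.Tensor, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim) -- recent observation history
        out, hidden = self.lstm(obs_seq, hidden)
        action = self.head(out[:, -1])  # act on the latest encoded state
        return action, hidden


if __name__ == "__main__":
    actor = LSTMActor(obs_dim=24, act_dim=3)
    dummy_obs = torch.randn(4, 8, 24)  # e.g., 4 UAVs, 8-step observation window
    action, _ = actor(dummy_obs)
    print(action.shape)  # torch.Size([4, 3])
```

In a MATD3-style setup, each UAV would hold such an actor (plus twin centralized critics), with the LSTM hidden state carried across steps of an episode; the potential field-based dense reward and the hierarchical formation/planning/execution layers mentioned in the abstract would sit outside this network.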
Pages: 12484-12499
Page count: 16