Multi-UAV Adaptive Cooperative Formation Trajectory Planning Based on an Improved MATD3 Algorithm of Deep Reinforcement Learning

Cited by: 8
Authors
Xing, Xiaojun [1 ,2 ]
Zhou, Zhiwei [1 ]
Li, Yan [1 ]
Xiao, Bing [1 ]
Xun, Yilin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710129, Peoples R China
[2] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Shenzhen 518063, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Autonomous aerial vehicles; Trajectory planning; Trajectory; Deep reinforcement learning; Planning; Reinforcement learning; Long short term memory; Multi-unmanned aerial vehicle (multi-UAV) cooperative formation trajectory planning; deep reinforcement learning; potential field-based dense reward; adaptive formation strategy; hierarchical training mechanism; SPACECRAFT; NAVIGATION; AVOIDANCE; DESIGN;
DOI
10.1109/TVT.2024.3389555
Chinese Library Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging problem in UAV research due to its NP-hard nature, collision-avoidance constraints, close-formation requirements, consensus convergence, and high-dimensional action space. The difficulty increases markedly when complex obstacles and narrow passages are present in unknown environments. Accordingly, this article proposes a novel multi-UAV adaptive cooperative formation trajectory planning approach for unknown obstacle environments, based on an improved deep reinforcement learning algorithm. The approach innovatively introduces a long short-term memory (LSTM) recurrent neural network (RNN) into the environment perception end of the multi-agent twin delayed deep deterministic policy gradient (MATD3) network, and develops an improved potential field-based dense reward function to strengthen policy learning efficiency and accelerate convergence. Moreover, a hierarchical deep reinforcement learning training mechanism, comprising an adaptive formation layer, a trajectory planning layer, and an action execution layer, is implemented to learn an optimal trajectory planning policy. Additionally, an adaptive formation maintenance and transformation strategy is presented so that the UAV swarm can cope with environments containing narrow passages. Simulation results show that the proposed approach outperforms multi-agent deep deterministic policy gradient (MADDPG) and MATD3 in policy learning efficiency, optimality of the trajectory planning policy, and adaptability to narrow passages.
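To illustrate the idea of a potential field-based dense reward mentioned in the abstract, the sketch below shows a generic artificial-potential-field shaping term: an attractive component that grows as the agent approaches the goal, plus a repulsive penalty inside each obstacle's influence radius. This is a minimal, hypothetical formulation for illustration only; the function name, gain parameters (`k_att`, `k_rep`, `d0`), and exact form are assumptions, not the paper's actual reward design.

```python
import math

def potential_field_reward(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=2.0):
    """Generic dense reward shaped by an artificial potential field:
    an attractive term pulling the UAV toward the goal plus a repulsive
    penalty that activates within distance d0 of each obstacle."""
    # Attractive term: less negative (i.e., higher reward) as the UAV
    # gets closer to the goal position.
    r_att = -k_att * math.dist(pos, goal)
    # Repulsive term: quadratic penalty that grows sharply as the UAV
    # enters an obstacle's influence radius d0.
    r_rep = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if d < d0:
            r_rep -= k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return r_att + r_rep
```

Compared with a sparse goal-reached reward, this kind of dense shaping gives the policy a nonzero gradient signal at every step, which is what allows it to speed up convergence in cluttered, unknown environments.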
Pages: 12484-12499
Page count: 16
Related Papers
50 records in total
  • [31] Reinforcement-Learning-Assisted Multi-UAV Task Allocation and Path Planning for IIoT
    Zhao, Guodong
    Wang, Ye
    Mu, Tong
    Meng, Zhijun
    Wang, Zichen
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11(16): 26766-26777
  • [32] Adaptive Multi-UAV Trajectory Planning Leveraging Digital Twin Technology for Urban IIoT Applications
    Zhao, Liang
    Li, Shuo
    Guan, Yunchong
    Wan, Shaohua
    Hawbani, Ammar
    Bi, Yuanguo
    Guizani, Mohsen
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11(6): 5349-5363
  • [33] Multi-Agent Deep Reinforcement Learning for Trajectory Design and Power Allocation in Multi-UAV Networks
    Zhao, Nan
    Liu, Zehua
    Cheng, Yiqiang
    IEEE ACCESS, 2020, 8: 139670-139679
  • [34] Multi-UAV simultaneous target assignment and path planning based on deep reinforcement learning in dynamic multiple obstacles environments
    Kong, Xiaoran
    Zhou, Yatong
    Li, Zhe
    Wang, Shaohai
    FRONTIERS IN NEUROROBOTICS, 2024, 17
  • [35] Onboard Distributed Trajectory Planning through Intelligent Search for Multi-UAV Cooperative Flight
    Lu, Kunfeng
    Hu, Ruiguang
    Yao, Zheng
    Wang, Huixia
    DRONES, 2023, 7(1)
  • [36] Deep reinforcement learning based trajectory design and resource allocation for task-aware multi-UAV enabled MEC networks
    Li, Zewu
    Xu, Chen
    Zhang, Zhanpeng
    Wu, Runze
    COMPUTER COMMUNICATIONS, 2024, 213: 88-98
  • [37] Deep Reinforcement Learning for Multi-UAV Exploration Under Energy Constraints
    Zhou, Yating
    Shi, Dianxi
    Yang, Huanhuan
    Hu, Haomeng
    Yang, Shaowu
    Zhang, Yongjun
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II, 2022, 461: 363-379
  • [38] Three-Dimension Trajectory Design for Multi-UAV Wireless Network With Deep Reinforcement Learning
    Zhang, Wenqi
    Wang, Qiang
    Liu, Xiao
    Liu, Yuanwei
    Chen, Yue
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2021, 70(1): 600-612
  • [39] Multi-UAV Trajectory Planning for Energy-Efficient Content Coverage: A Decentralized Learning-Based Approach
    Zhao, Chenxi
    Liu, Junyu
    Sheng, Min
    Teng, Wei
    Zheng, Yang
    Li, Jiandong
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2021, 39(10): 3193-3207
  • [40] A deep reinforcement learning based distributed multi-UAV dynamic area coverage algorithm for complex environment
    Xiao, Jian
    Yuan, Guohui
    Xue, Yuxi
    He, Jinhui
    Wang, Yaoting
    Zou, Yuanjiang
    Wang, Zhuoran
    NEUROCOMPUTING, 2024, 595