Multi-UAV Adaptive Cooperative Formation Trajectory Planning Based on an Improved MATD3 Algorithm of Deep Reinforcement Learning

Cited by: 8
Authors
Xing, Xiaojun [1 ,2 ]
Zhou, Zhiwei [1 ]
Li, Yan [1 ]
Xiao, Bing [1 ]
Xun, Yilin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710129, Peoples R China
[2] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Shenzhen 518063, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Autonomous aerial vehicles; Trajectory planning; Trajectory; Deep reinforcement learning; Planning; Reinforcement learning; Long short term memory; Multi-unmanned aerial vehicle (multi-UAV) cooperative formation trajectory planning; deep reinforcement learning; potential field-based dense reward; adaptive formation strategy; hierarchical training mechanism; SPACECRAFT; NAVIGATION; AVOIDANCE; DESIGN;
DOI
10.1109/TVT.2024.3389555
Chinese Library Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging problem in UAV research due to its NP-hard nature, collision-avoidance constraints, close-formation requirements, consensus convergence, and high-dimensional action space. The difficulty increases markedly when complex obstacles and narrow passages are present in unknown environments. Accordingly, this article proposes a novel multi-UAV adaptive cooperative formation trajectory planning approach for unknown obstacle environments, based on an improved deep reinforcement learning algorithm. The approach innovatively introduces a long short-term memory (LSTM) recurrent neural network (RNN) into the environment perception end of the multi-agent twin delayed deep deterministic policy gradient (MATD3) network, and develops an improved potential field-based dense reward function to strengthen policy learning efficiency and accelerate convergence. Moreover, a hierarchical deep reinforcement learning training mechanism, comprising an adaptive formation layer, a trajectory planning layer, and an action execution layer, is implemented to learn an optimal trajectory planning policy. Additionally, an adaptive formation maintenance and transformation strategy is presented so that the UAV swarm can cope with environments containing narrow passages. Simulation results show that the proposed approach outperforms multi-agent deep deterministic policy gradient (MADDPG) and MATD3 in policy learning efficiency, optimality of the trajectory planning policy, and adaptability to narrow passages.
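To illustrate the idea of a potential field-based dense reward mentioned in the abstract, the sketch below shows a generic artificial-potential-field shaping term: an attractive component that grows as the agent approaches the goal, plus a repulsive penalty inside each obstacle's influence radius. This is a minimal, hypothetical formulation for illustration only; the function name, gain parameters (`k_att`, `k_rep`, `d0`), and exact form are assumptions, not the paper's actual reward design.

```python
import math

def potential_field_reward(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=2.0):
    """Generic dense reward shaped by an artificial potential field:
    an attractive term pulling the UAV toward the goal plus a repulsive
    penalty that activates within distance d0 of each obstacle."""
    # Attractive term: less negative (i.e., higher reward) as the UAV
    # gets closer to the goal position.
    r_att = -k_att * math.dist(pos, goal)
    # Repulsive term: quadratic penalty that grows sharply as the UAV
    # enters an obstacle's influence radius d0.
    r_rep = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if d < d0:
            r_rep -= k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return r_att + r_rep
```

Compared with a sparse goal-reached reward, this kind of dense shaping gives the policy a nonzero gradient signal at every step, which is what allows it to speed up convergence in cluttered, unknown environments.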
Pages: 12484-12499
Page count: 16
Related Papers
50 records in total
  • [31] Reinforcement-Learning-Assisted Multi-UAV Task Allocation and Path Planning for IIoT
    Zhao, Guodong
    Wang, Ye
    Mu, Tong
    Meng, Zhijun
    Wang, Zichen
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11(16): 26766-26777
  • [32] Adaptive Multi-UAV Trajectory Planning Leveraging Digital Twin Technology for Urban IIoT Applications
    Zhao, Liang
    Li, Shuo
    Guan, Yunchong
    Wan, Shaohua
    Hawbani, Ammar
    Bi, Yuanguo
    Guizani, Mohsen
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11(6): 5349-5363
  • [33] Multi-Agent Deep Reinforcement Learning for Trajectory Design and Power Allocation in Multi-UAV Networks
    Zhao, Nan
    Liu, Zehua
    Cheng, Yiqiang
    IEEE ACCESS, 2020, 8: 139670-139679
  • [34] Multi-UAV simultaneous target assignment and path planning based on deep reinforcement learning in dynamic multiple obstacles environments
    Kong, Xiaoran
    Zhou, Yatong
    Li, Zhe
    Wang, Shaohai
    FRONTIERS IN NEUROROBOTICS, 2024, 17
  • [35] Onboard Distributed Trajectory Planning through Intelligent Search for Multi-UAV Cooperative Flight
    Lu, Kunfeng
    Hu, Ruiguang
    Yao, Zheng
    Wang, Huixia
    DRONES, 2023, 7(1)
  • [36] Deep reinforcement learning based trajectory design and resource allocation for task-aware multi-UAV enabled MEC networks
    Li, Zewu
    Xu, Chen
    Zhang, Zhanpeng
    Wu, Runze
    COMPUTER COMMUNICATIONS, 2024, 213: 88-98
  • [37] Deep Reinforcement Learning for Multi-UAV Exploration Under Energy Constraints
    Zhou, Yating
    Shi, Dianxi
    Yang, Huanhuan
    Hu, Haomeng
    Yang, Shaowu
    Zhang, Yongjun
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II, 2022, 461: 363-379
  • [38] Three-Dimension Trajectory Design for Multi-UAV Wireless Network With Deep Reinforcement Learning
    Zhang, Wenqi
    Wang, Qiang
    Liu, Xiao
    Liu, Yuanwei
    Chen, Yue
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2021, 70(1): 600-612
  • [39] Multi-UAV Trajectory Planning for Energy-Efficient Content Coverage: A Decentralized Learning-Based Approach
    Zhao, Chenxi
    Liu, Junyu
    Sheng, Min
    Teng, Wei
    Zheng, Yang
    Li, Jiandong
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2021, 39(10): 3193-3207
  • [40] A deep reinforcement learning based distributed multi-UAV dynamic area coverage algorithm for complex environment
    Xiao, Jian
    Yuan, Guohui
    Xue, Yuxi
    He, Jinhui
    Wang, Yaoting
    Zou, Yuanjiang
    Wang, Zhuoran
    NEUROCOMPUTING, 2024, 595