Collision-Avoiding Flocking With Multiple Fixed-Wing UAVs in Obstacle-Cluttered Environments: A Task-Specific Curriculum-Based MADRL Approach

Times cited: 45

Authors:

Yan, Chao [1,2]

Wang, Chang [1]

Xiang, Xiaojia [1]

Low, Kin Huat [2]

Wang, Xiangke [1]

Xu, Xin [1]

Shen, Lincheng [1]

Affiliations:

[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China

[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, Singapore 639798, Singapore

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Collision avoidance; Sensors; Reinforcement learning; Numerical models; Vehicle dynamics; Training; Curriculum learning; fixed-wing unmanned aerial vehicles (UAVs); flocking; multiagent deep reinforcement learning (MADRL); obstacle avoidance;

DOI:

10.1109/TNNLS.2023.3245124

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Multiple unmanned aerial vehicles (UAVs) can efficiently accomplish a variety of tasks in complex scenarios. However, developing a collision-avoiding flocking policy for multiple fixed-wing UAVs remains challenging, especially in obstacle-cluttered environments. In this article, we propose a novel curriculum-based multiagent deep reinforcement learning (MADRL) approach, called task-specific curriculum-based MADRL (TSCAL), to learn a decentralized policy for flocking with obstacle avoidance for multiple fixed-wing UAVs. The core idea is to decompose the collision-avoiding flocking task into multiple subtasks and to progressively increase the number of subtasks to be solved in a staged manner. Meanwhile, TSCAL iteratively alternates between online learning and offline transfer. For online learning, we propose a hierarchical recurrent attention multiagent actor-critic (HRAMA) algorithm to learn the policies for the subtask(s) of each learning stage. For offline transfer, we develop two transfer mechanisms, model reload and buffer reuse, to transfer knowledge between neighboring stages. A series of numerical simulations demonstrates the significant advantages of TSCAL in policy optimality, sample efficiency, and learning stability. Finally, a high-fidelity hardware-in-the-loop (HITL) simulation is conducted to verify the adaptability of TSCAL. A video of the numerical and HITL simulations is available at https://youtu.be/R9yLJNYRIqY.
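The staged structure described in the abstract (a growing set of subtasks per stage, with online learning inside each stage and offline transfer between neighboring stages) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the subtask labels `task_1`..`task_3`, the helper names `build_curriculum`, `train_curriculum`, and `toy_online_learn`, and the interpretation of "model reload" as carrying parameters forward and "buffer reuse" as carrying the replay buffer forward are all hypothetical. The placeholder learner merely stands in for HRAMA, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch only (not taken from the paper).
Params = Dict[str, int]
Transition = Tuple[str, float]  # (subtask label, reward)
Learner = Callable[[List[str], Params, List[Transition]], Tuple[Params, List[Transition]]]


def build_curriculum(subtasks: List[str]) -> List[List[str]]:
    """Stage k trains on the first k subtasks, so each stage adds one subtask."""
    return [subtasks[: k + 1] for k in range(len(subtasks))]


def train_curriculum(subtasks: List[str], online_learn: Learner, init_params: Params):
    params: Params = dict(init_params)
    buffer: List[Transition] = []
    history = []
    for stage in build_curriculum(subtasks):
        # Offline transfer between neighboring stages (as assumed in this sketch):
        # "model reload" = start from the previous stage's parameters,
        # "buffer reuse" = keep the previous stages' experience available.
        params, new_transitions = online_learn(stage, params, buffer)
        buffer = buffer + new_transitions
        history.append((tuple(stage), len(buffer)))
    return params, buffer, history


def toy_online_learn(stage: List[str], params: Params, buffer: List[Transition]):
    """Hypothetical placeholder for an online MADRL learner such as HRAMA."""
    transitions = [(task, 0.0) for task in stage]  # one dummy transition per active subtask
    new_params = dict(params)  # reloaded parameters from the previous stage
    # Count the samples this stage uses: reused buffer entries plus fresh ones.
    new_params["samples_seen"] = params.get("samples_seen", 0) + len(buffer) + len(transitions)
    return new_params, transitions


if __name__ == "__main__":
    params, buffer, history = train_curriculum(["task_1", "task_2", "task_3"], toy_online_learn, {})
    for stage, size in history:
        print(stage, "-> buffer size", size)
    print("samples seen:", params["samples_seen"])
```

Running the script shows the buffer growing to 1, 3, and 6 entries across the three stages, because experience from earlier stages is retained rather than discarded. Passing a real learner in place of `toy_online_learn` keeps the curriculum loop unchanged, which is the point of separating the stage schedule from the per-stage learner.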


Pages: 10894-10908

Number of pages: 15

