Graph attention mechanism based reinforcement learning for multi-agent flocking control in communication-restricted environment

Times Cited: 30
Authors
Xiao, Jian [1 ,2 ]
Yuan, Guohui [1 ,2 ]
He, Jinhui [1 ,2 ]
Fang, Kai [3 ]
Wang, Zhuoran [1 ,2 ,3 ,4 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Quzhou 324000, Zhejiang, Peoples R China
[3] Quzhou Univ, Coll Elect & Informat Engn, Quzhou, Zhejiang, Peoples R China
[4] Zhe Jiang Qi Chao Cable CO LTD, Quzhou, Zhejiang, Peoples R China
Keywords
Reinforcement learning (RL); Graph attention (GAT) mechanism; Flocking cooperative control; Communication-restricted environment; AVOIDANCE; DYNAMICS; POLICIES; SYSTEMS; UAVS;
DOI
10.1016/j.ins.2022.11.059
Chinese Library Classification (CLC)
TP [Automation & Computer Technology];
Discipline code
0812;
Abstract
To address the poor performance of reinforcement learning (RL) for multi-agent flocking cooperative control in communication-restricted environments, we propose a multi-agent cooperative RL (MACRL) method based on the equivalent characteristics of the agents in the flocking task. A distance graph attention (GAT) mechanism is introduced into the policy network of the proposed MACRL to adjust each agent's attention weights over its neighbors and to reduce the influence of remote neighbors, whose communication quality is poor, on the agent's behavioral decision-making. Building on this mechanism, a distance GAT-based MACRL (DGAT-MACRL) algorithm is proposed for multi-agent flocking control in communication-restricted environments. Simulation results show that the proposed flocking algorithm adapts well to environments with communication delay and communication-distance constraints, and that its flocking control performance is significantly better than that of other RL-based flocking algorithms and traditional flocking algorithms. In addition, the experimental results confirm that the proposed DGAT-MACRL effectively improves the adaptability of traditional RL to flocking control systems of dynamic scale. Our algorithm thus offers a novel and practical method for cooperative tasks carried out by multiple agents in non-ideal environments. (c) 2022 Elsevier Inc. All rights reserved.
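The abstract's core idea, down-weighting remote, poorly connected neighbors via a distance-aware graph attention mechanism, can be illustrated with a minimal sketch. This is not the authors' implementation; the function `distance_gat_weights`, the tanh scoring, and the distance-penalty coefficient `beta` are illustrative assumptions in the spirit of a standard GAT layer with an added Euclidean-distance penalty before the softmax.

```python
import numpy as np

def distance_gat_weights(h, pos, i, neighbors, a, W, beta=1.0):
    """Attention weights of agent i over its neighbors.

    h         : (n_agents, d_in) feature matrix
    pos       : (n_agents, 2) agent positions
    a, W      : GAT-style attention vector and projection matrix (assumed shapes)
    beta      : hypothetical coefficient penalizing remote neighbors

    Returns softmax(score_ij - beta * dist_ij) over the neighbor set, so
    distant neighbors receive smaller weights in the aggregation.
    """
    hi = W @ h[i]
    scores = []
    for j in neighbors:
        hj = W @ h[j]
        e = np.tanh(a @ np.concatenate([hi, hj]))   # GAT-style pairwise score
        d = np.linalg.norm(pos[i] - pos[j])         # distance penalty term
        scores.append(e - beta * d)
    scores = np.array(scores)
    scores -= scores.max()                          # numerical stability
    w = np.exp(scores)
    return w / w.sum()
```

With identical neighbor features, the weight assigned to a neighbor at distance 1 exceeds that of a neighbor at distance 5, which is exactly the behavior the abstract attributes to the distance GAT mechanism.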
Pages: 142-157 (16 pages)