Attention Enhanced Reinforcement Learning for Multi agent Cooperation

Cited by: 33
Authors
Pu, Zhiqiang [1 ]
Wang, Huimu [1 ,2 ]
Liu, Zhen [1 ]
Yi, Jianqiang [1 ]
Wu, Shiguang [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Reinforcement learning; Games; Scalability; Task analysis; Standards; Optimization; Attention mechanism; deep reinforcement learning (DRL); graph convolutional networks; multi-agent systems; LEVEL; GAME; GO;
DOI
10.1109/TNNLS.2022.3146858
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, a novel method, called attention enhanced reinforcement learning (AERL), is proposed to address issues including complex interaction, limited communication range, and time-varying communication topology in multi-agent cooperation. AERL comprises a communication enhanced network (CEN), a graph spatiotemporal long short-term memory network (GST-LSTM), and parameter-sharing multi-pseudo-critic proximal policy optimization (PS-MPC-PPO). Specifically, CEN, based on a graph attention mechanism, is designed to enlarge the agents' communication range and to handle complex interaction among the agents. GST-LSTM, which replaces the standard fully connected (FC) operator in LSTM with a graph attention operator, is designed to capture temporal dependence while preserving the spatial structure learned by CEN. PS-MPC-PPO, which extends proximal policy optimization (PPO) to multi-agent systems with parameter sharing so that training scales to environments with a large number of agents, is designed with multiple pseudo critics to mitigate the bias problem in training and accelerate convergence. Simulation results for three groups of representative scenarios, including formation control, group containment, and predator-prey games, demonstrate the effectiveness and robustness of AERL.
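The graph attention operator that the abstract describes (the building block CEN uses, and that GST-LSTM substitutes for the FC operator inside LSTM) can be illustrated with a minimal numpy sketch of standard single-head graph attention in the style of Veličković et al.'s GAT. All names, dimensions, and the chain topology below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def graph_attention(h, adj, W, a):
    """One single-head graph-attention pass (illustrative sketch).

    h   : (N, F)   node (agent) features
    adj : (N, N)   0/1 mask; adj[i, j] = 1 if agent j is within agent i's
                   communication range (self-loops included)
    W   : (F, F2)  shared linear projection
    a   : (2*F2,)  attention vector scoring concatenated pairs [z_i || z_j]
    """
    z = h @ W                                    # (N, F2) projected features
    F2 = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) decomposes into a source term
    # for i plus a destination term for j.
    src = z @ a[:F2]                             # (N,)
    dst = z @ a[F2:]                             # (N,)
    e = leaky_relu(src[:, None] + dst[None, :])  # (N, N) raw logits
    e = np.where(adj > 0, e, -1e9)               # mask out non-neighbours
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e) * (adj > 0)                # zero weight off-graph
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                      # aggregated features, weights

# Demo: 4 agents on a chain communication topology with self-loops.
rng = np.random.default_rng(0)
N, F, F2 = 4, 3, 5
h = rng.normal(size=(N, F))
adj = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
W = rng.normal(size=(F, F2))
a = rng.normal(size=(2 * F2,))
out, alpha = graph_attention(h, adj, W, a)
```

Stacking such layers is what lets a CEN-style network enlarge the effective communication range: each pass aggregates one hop further, and the learned weights `alpha` adapt to a time-varying adjacency mask without retraining.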
Pages: 8235-8249
Page count: 15