Attention Enhanced Reinforcement Learning for Multi agent Cooperation

Cited by: 33
Authors
Pu, Zhiqiang [1 ]
Wang, Huimu [1 ,2 ]
Liu, Zhen [1 ]
Yi, Jianqiang [1 ]
Wu, Shiguang [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Reinforcement learning; Games; Scalability; Task analysis; Standards; Optimization; Attention mechanism; deep reinforcement learning (DRL); graph convolutional networks; multi-agent systems; LEVEL; GAME; GO;
DOI
10.1109/TNNLS.2022.3146858
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, a novel method, called attention enhanced reinforcement learning (AERL), is proposed to address issues including complex interaction, limited communication range, and time-varying communication topology in multi-agent cooperation. AERL comprises a communication enhanced network (CEN), a graph spatiotemporal long short-term memory network (GST-LSTM), and parameter-sharing multi-pseudo-critic proximal policy optimization (PS-MPC-PPO). Specifically, CEN, based on a graph attention mechanism, is designed to enlarge the agents' communication range and to handle complex interaction among the agents. GST-LSTM, which replaces the standard fully connected (FC) operator in LSTM with a graph attention operator, is designed to capture temporal dependence while preserving the spatial structure learned by CEN. PS-MPC-PPO, which extends proximal policy optimization (PPO) to multi-agent systems with parameter sharing so that training scales to environments with a large number of agents, is designed with multiple pseudo critics to mitigate the bias problem in training and accelerate convergence. Simulation results for three groups of representative scenarios, including formation control, group containment, and predator-prey games, demonstrate the effectiveness and robustness of AERL.
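The graph attention operator that the abstract describes (the building block CEN uses, and that GST-LSTM substitutes for the FC operator inside LSTM) can be illustrated with a minimal numpy sketch of standard single-head graph attention in the style of Veličković et al.'s GAT. All names, dimensions, and the chain topology below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def graph_attention(h, adj, W, a):
    """One single-head graph-attention pass (illustrative sketch).

    h   : (N, F)   node (agent) features
    adj : (N, N)   0/1 mask; adj[i, j] = 1 if agent j is within agent i's
                   communication range (self-loops included)
    W   : (F, F2)  shared linear projection
    a   : (2*F2,)  attention vector scoring concatenated pairs [z_i || z_j]
    """
    z = h @ W                                    # (N, F2) projected features
    F2 = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) decomposes into a source term
    # for i plus a destination term for j.
    src = z @ a[:F2]                             # (N,)
    dst = z @ a[F2:]                             # (N,)
    e = leaky_relu(src[:, None] + dst[None, :])  # (N, N) raw logits
    e = np.where(adj > 0, e, -1e9)               # mask out non-neighbours
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e) * (adj > 0)                # zero weight off-graph
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                      # aggregated features, weights

# Demo: 4 agents on a chain communication topology with self-loops.
rng = np.random.default_rng(0)
N, F, F2 = 4, 3, 5
h = rng.normal(size=(N, F))
adj = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
W = rng.normal(size=(F, F2))
a = rng.normal(size=(2 * F2,))
out, alpha = graph_attention(h, adj, W, a)
```

Stacking such layers is what lets a CEN-style network enlarge the effective communication range: each pass aggregates one hop further, and the learned weights `alpha` adapt to a time-varying adjacency mask without retraining.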
Pages: 8235-8249
Page count: 15