Multi-Agent Recurrent Deterministic Policy Gradient with Inter-Agent Communication

Cited by: 0
Authors
Cho, Joohyun [1]
Liu, Mingxi [1]
Zhou, Yi [1]
Chen, Rong-Rong [1]
Affiliations
[1] Univ Utah, Dept ECE, Salt Lake City, UT 84112 USA
Source
FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEECONF | 2023
Keywords
Multi-Agent Reinforcement Learning; Policy Gradient; Partially Observable; Actor-Critic
DOI
10.1109/IEEECONF59524.2023.10477063
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
In this paper, we introduce a novel approach to multi-agent coordination under partial state and observation, called Multi-Agent Recurrent Deterministic Policy Gradient with Differentiable Inter-Agent Communication (MARDPG-IAC). In such environments, it is difficult for agents to obtain information about the actions and observations of other agents, which can significantly degrade their learning performance. To address this challenge, we propose a recurrent structure that accumulates partial observations to infer the hidden information, together with a communication mechanism that enables agents to exchange information and thereby learn more effectively. We employ an asynchronous update scheme to combine the MARDPG algorithm with the differentiable inter-agent communication algorithm, without requiring a replay buffer. Through a case study of building energy control in a power distribution network, we demonstrate that our proposed approach outperforms the conventional Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which relies on the partial state only.
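The mechanism the abstract describes — a recurrent hidden state that accumulates partial observations, plus a learned message channel between agents — could be sketched as follows. This is a minimal forward-pass illustration only: the class name, weight matrices, layer sizes, and update rule are all assumptions for exposition, not the authors' architecture, and no training (policy gradient or backpropagation through messages) is shown.

```python
import numpy as np

class RecurrentCommAgent:
    """Hypothetical sketch of one MARDPG-IAC-style agent.

    A recurrent hidden state folds in each new partial observation and
    the message received from the other agent; the agent then outputs a
    deterministic action and an outgoing message. All dimensions and
    weight names are illustrative assumptions.
    """

    def __init__(self, obs_dim, hid_dim, msg_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small random weights for a stable tanh forward pass
        self.W_h = s * rng.standard_normal((hid_dim, hid_dim))    # hidden -> hidden
        self.W_o = s * rng.standard_normal((hid_dim, obs_dim))    # observation input
        self.W_m = s * rng.standard_normal((hid_dim, msg_dim))    # incoming message input
        self.W_msg = s * rng.standard_normal((msg_dim, hid_dim))  # outgoing message head
        self.W_a = s * rng.standard_normal((act_dim, hid_dim))    # deterministic policy head
        self.h = np.zeros(hid_dim)

    def step(self, obs, incoming_msg):
        # Recurrent update: accumulate the partial observation and the
        # peer's message into the hidden state.
        self.h = np.tanh(self.W_h @ self.h + self.W_o @ obs + self.W_m @ incoming_msg)
        action = np.tanh(self.W_a @ self.h)          # deterministic action
        outgoing_msg = np.tanh(self.W_msg @ self.h)  # message passed to the peer
        return action, outgoing_msg


# Two agents exchanging messages over a few steps.
agents = [RecurrentCommAgent(obs_dim=4, hid_dim=8, msg_dim=3, act_dim=2, seed=i)
          for i in range(2)]
msgs = [np.zeros(3), np.zeros(3)]
for t in range(5):
    obs = [np.ones(4) * t, -np.ones(4) * t]  # toy partial observations
    actions, new_msgs = [], []
    for i, agent in enumerate(agents):
        a, m = agent.step(obs[i], msgs[1 - i])  # each agent reads the other's message
        actions.append(a)
        new_msgs.append(m)
    msgs = new_msgs  # messages are delivered at the next step
```

In the paper's setting the message channel would be differentiable, so gradients of one agent's critic can flow through the messages into the other agent's policy; the numpy sketch above shows only the forward information flow.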
Pages: 1394-1398
Page count: 5