Multi-agent collaboration based on RGMAAC algorithm under partial observability

Cited by: 0

Authors:

Wang Z.-H. [1]

Zhang Y.-X. [1]

Huang Z.-Q. [2]

Yin C.-K. [1]

Affiliations:

[1] School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing

[2] Department of Information Science, Beijing University of Technology, Beijing

Source:

Kongzhi yu Juece/Control and Decision | 2023 / Vol. 38 / No. 05

Keywords:

communication between agents; deep reinforcement learning; MADDPG; multi-agent; partial observable;

DOI:

10.13195/j.kzyjc.2022.0422

CLC number:

Subject classification code:

Abstract:

Multi-agent deep reinforcement learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems, and is an important method for developing multi-agent systems of swarm agents. Existing MADRL studies mainly design algorithms under the assumption that the environment is fully observable or that communication resources are unlimited. However, partial observability is an inherent problem in practical applications of multi-agent systems. For example, the observation range of agents is usually limited, and information about the environment outside that range is unavailable, which makes multi-agent collaboration difficult. To address partial observability in real scenarios, this paper follows the paradigm of centralized training and distributed execution, extends the deep reinforcement learning algorithm Actor-Critic to multi-agent systems, adds communication channels and gating mechanisms between agents, and proposes a recurrent gated multi-agent Actor-Critic (RGMAAC) algorithm. Agents communicate efficiently based on their historical action-observation sequences, and make behavior decisions using their local observation, their historical observation sequence, and the observations shared by other agents through the communication channels. In addition, based on the multi-agent particle environment, a multi-agent task of synchronous and fast arrival is designed, together with two reward value functions and task scenarios. The experimental results show that agents trained with the RGMAAC algorithm perform well and are more stable than the baseline algorithm when partial observability clearly appears in the task scenario. © 2023 Northeast University. All rights reserved.
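The decision flow described in the abstract (recurrent memory over past observations, a gate that decides whether to communicate, and an action chosen from local observation, history, and received messages) can be illustrated with a rough sketch. This is not the RGMAAC architecture from the paper: the class `GatedRecurrentAgent`, the function `run_episode`, the plain tanh recurrence, the 0.5 gate threshold, the mean aggregation of messages, and all dimensions are assumptions made for illustration, and the centralized critic and training loop are omitted entirely.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedRecurrentAgent:
    """Hypothetical agent: recurrent memory, gated message, policy head."""

    def __init__(self, obs_dim, msg_dim, hid_dim, act_dim, rng):
        scale = 0.1  # small random weights; nothing is trained in this sketch
        self.W_in = rng.normal(0.0, scale, (hid_dim, obs_dim + msg_dim))
        self.W_h = rng.normal(0.0, scale, (hid_dim, hid_dim))
        self.w_gate = rng.normal(0.0, scale, hid_dim)
        self.W_msg = rng.normal(0.0, scale, (msg_dim, hid_dim))
        self.W_act = rng.normal(0.0, scale, (act_dim, hid_dim))
        self.msg_dim = msg_dim
        self.h = np.zeros(hid_dim)  # summary of the observation history

    def step(self, obs, incoming):
        # Recurrent update from the local observation and received messages.
        x = np.concatenate([obs, incoming])
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h)
        # Gate (assumed threshold 0.5): communicate only when it opens.
        send = bool(sigmoid(self.w_gate @ self.h) > 0.5)
        msg = np.tanh(self.W_msg @ self.h) if send else np.zeros(self.msg_dim)
        # Decentralized action computed from the agent's own hidden state.
        action = np.tanh(self.W_act @ self.h)
        return action, msg, send


def run_episode(n_agents=3, steps=5, obs_dim=4, msg_dim=2,
                hid_dim=8, act_dim=2, seed=0):
    """Roll out random observations; requires n_agents >= 2."""
    rng = np.random.default_rng(seed)
    agents = [GatedRecurrentAgent(obs_dim, msg_dim, hid_dim, act_dim, rng)
              for _ in range(n_agents)]
    msgs = np.zeros((n_agents, msg_dim))
    actions = np.zeros((n_agents, act_dim))
    sent_count = 0
    for _ in range(steps):
        # Stand-in for partial observations returned by an environment.
        obs = rng.normal(size=(n_agents, obs_dim))
        new_msgs = np.zeros_like(msgs)
        for i, agent in enumerate(agents):
            # Each agent receives the mean of the other agents' last messages.
            incoming = np.delete(msgs, i, axis=0).mean(axis=0)
            actions[i], new_msgs[i], sent = agent.step(obs[i], incoming)
            sent_count += int(sent)
        msgs = new_msgs
    return actions, sent_count


if __name__ == "__main__":
    final_actions, n_sent = run_episode()
    print(final_actions.shape, n_sent)
```

In this sketch a closed gate contributes an all-zero message, which is one simple way to model an agent that chooses not to use the channel at a given step.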


Pages: 1267-1277

Page count: 10

