Optimistic sequential multi-agent reinforcement learning with motivational communication

Cited by: 1
Authors
Huang, Anqi [1 ]
Wang, Yongli [1 ]
Zhou, Xiaoliang [1 ]
Zou, Haochen [1 ]
Dong, Xu [1 ]
Che, Xun [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
DOI
10.1016/j.neunet.2024.106547
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE), and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the optimistic Q-values for the agents, aiming to ensure monotonic improvement of the joint policy during optimization. Moreover, we establish a motivational communication module for each agent to disseminate motivational messages that promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated on a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges more rapidly.
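The abstract outlines three ingredients: an optimistic Q-value that upper-bounds the current policy's Q-value, a sequential per-agent update scheme, and SAC-style entropy regularization. The snippet below is a minimal, hypothetical sketch of how such ideas can be combined, assuming a PyTorch setup with twin critics per agent; the `Agent` class, the max-over-critics estimate, and all hyperparameters are illustrative assumptions, not the implementation described in the paper (the motivational communication module is omitted entirely).

```python
import torch
import torch.nn as nn

# Hypothetical sketch, NOT the paper's implementation: per-agent twin critics,
# an "optimistic" Q-estimate formed as the max over the two critic heads
# (an upper bound on the usual clipped double-Q min), and a SAC-style
# entropy-regularized policy loss, with agents updated one at a time.

obs_dim, act_dim, n_agents, batch = 8, 2, 3, 16

class Agent(nn.Module):
    def __init__(self):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                    nn.Linear(32, 2 * act_dim))   # mean, log-std
        self.q1 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(),
                                nn.Linear(32, 1))
        self.q2 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(),
                                nn.Linear(32, 1))

    def sample(self, obs):
        mean, log_std = self.policy(obs).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
        action = dist.rsample()                        # reparameterized sample
        return action, dist.log_prob(action).sum(-1, keepdim=True)

agents = [Agent() for _ in range(n_agents)]
optims = [torch.optim.Adam(a.policy.parameters(), lr=3e-4) for a in agents]
obs = torch.randn(batch, n_agents, obs_dim)            # dummy observation batch
alpha = 0.2                                            # entropy temperature

# Sequential update: each agent improves its own policy in turn while the
# others stay fixed (critic training and communication are omitted here).
for i, (agent, opt) in enumerate(zip(agents, optims)):
    action, log_prob = agent.sample(obs[:, i])
    x = torch.cat([obs[:, i], action], dim=-1)
    optimistic_q = torch.max(agent.q1(x), agent.q2(x))  # optimistic upper bound
    loss = (alpha * log_prob - optimistic_q).mean()      # SAC-style policy loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```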
Pages: 12