Optimistic sequential multi-agent reinforcement learning with motivational communication

Times Cited: 1
Authors
Huang, Anqi [1 ]
Wang, Yongli [1 ]
Zhou, Xiaoliang [1 ]
Zou, Haochen [1 ]
Dong, Xu [1 ]
Che, Xun [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
DOI
10.1016/j.neunet.2024.106547
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence on sub-optimal Nash Equilibria (NE), and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, termed optimistic Q-values, which serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the optimistic Q-values, aiming to ensure monotonic improvement during joint policy optimization. Moreover, we equip each agent with a motivational communication module that disseminates motivational messages to promote cooperative behavior. Finally, we employ the value regularization strategy of the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated on a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges more rapidly.
Pages: 12
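
The abstract above names three algorithmic ingredients: optimistic Q-values used as an upper bound on the current policy's value, a sequential per-agent update, and SAC-style entropy regularization. The following is a minimal, illustrative sketch of how these pieces might fit together, not the authors' implementation; all class names, interfaces, and hyper-parameters (`TwinCritic`, `GaussianPolicy`, `alpha`) are assumptions, and the motivational communication module is omitted.

```python
# Minimal, illustrative sketch only -- NOT the authors' OSSMC code.
# It combines three ideas mentioned in the abstract:
#   (1) an "optimistic" Q-value used as an upper bound on the current policy's value,
#   (2) a sequential, one-agent-at-a-time policy update,
#   (3) SAC-style entropy regularization.
# All names (TwinCritic, GaussianPolicy) and hyper-parameters (alpha) are assumptions;
# the motivational communication module is omitted.
import torch
import torch.nn as nn


class TwinCritic(nn.Module):
    """Two Q-heads; their element-wise max serves as an optimistic estimate."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        def head():
            return nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))
        self.q1, self.q2 = head(), head()

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.q1(x), self.q2(x)


class GaussianPolicy(nn.Module):
    """Simple stochastic policy returning a diagonal Gaussian over actions."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mu(h), std)


def optimistic_q(critic, obs, act):
    # Optimistic estimate: take the larger of the two heads
    # (standard SAC takes the min, i.e. a pessimistic estimate).
    q1, q2 = critic(obs, act)
    return torch.max(q1, q2)


def sequential_policy_update(policies, critics, obs_per_agent, alpha=0.2):
    """Update agents one at a time in a fixed order; each agent maximizes its
    optimistic Q-value plus an entropy bonus. A fuller implementation would also
    let later agents condition on earlier agents' freshly sampled actions."""
    sampled_actions = []
    for policy, critic, obs in zip(policies, critics, obs_per_agent):
        dist = policy(obs)
        act = dist.rsample()                              # reparameterized sample
        log_prob = dist.log_prob(act).sum(-1, keepdim=True)
        loss = (alpha * log_prob - optimistic_q(critic, obs, act)).mean()
        loss.backward()                                   # a per-agent optimizer.step() would follow
        sampled_actions.append(act.detach())
    return sampled_actions


if __name__ == "__main__":
    n_agents, obs_dim, act_dim, batch = 3, 8, 2, 32
    policies = [GaussianPolicy(obs_dim, act_dim) for _ in range(n_agents)]
    critics = [TwinCritic(obs_dim, act_dim) for _ in range(n_agents)]
    observations = [torch.randn(batch, obs_dim) for _ in range(n_agents)]
    actions = sequential_policy_update(policies, critics, observations)
    print([a.shape for a in actions])  # one action batch per agent
```

Taking the element-wise max of two critic heads is only one simple way to realize an optimistic upper bound; the paper's actual construction of optimistic Q-values, its monotonic-improvement guarantee, and the communication module should be taken from the full text.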