Optimistic sequential multi-agent reinforcement learning with motivational communication

Times Cited: 1
Authors
Huang, Anqi [1]
Wang, Yongli [1]
Zhou, Xiaoliang [1]
Zou, Haochen [1]
Dong, Xu [1]
Che, Xun [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
DOI
10.1016/j.neunet.2024.106547
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE); and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the optimistic Q-values, aiming to ensure monotonic improvement during joint policy optimization. Moreover, we equip each agent with a motivational communication module that disseminates motivational messages to promote cooperative behavior. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated on a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges faster.
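The two core ideas in the abstract, using an optimistic (upper-bound) Q-value to avoid underestimating actions, and updating agents sequentially with entropy-regularized soft policies, can be illustrated on a toy cooperative matrix game. This is a hypothetical sketch, not the paper's implementation: the helper names (`soft_policy`, `q0_opt`) and the single-pass update are illustrative stand-ins for OSSMC's learned critics and sequential optimization.

```python
import numpy as np

# Toy fully cooperative matrix game (the classic "climbing game"):
# independent learners that average over the partner's policy tend to
# settle on a sub-optimal equilibrium instead of the optimum (0, 0).
PAYOFF = np.array([[ 11., -30.,  0.],
                   [-30.,   7.,  6.],
                   [  0.,   0.,  5.]])

def soft_policy(q, temp=0.5):
    """Entropy-regularized (SAC-style) softmax policy over Q-values."""
    z = (q - q.max()) / temp          # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Optimistic Q-value for agent 0: for each of its own actions, take the
# best payoff over the partner's actions -- an upper bound on the true
# Q-value under any partner policy, so good joint actions are not
# underestimated.
q0_opt = PAYOFF.max(axis=1)
pi0 = soft_policy(q0_opt)

# Sequential update: agent 1 then responds to agent 0's *new* policy,
# so the expected joint return cannot decrease at this step.
q1 = PAYOFF.T @ pi0
pi1 = soft_policy(q1)

joint = np.outer(pi0, pi1)
best = np.unravel_index(joint.argmax(), joint.shape)
print(best)  # most probable joint action
```

With the optimistic bound, agent 0 commits to the risky-but-optimal first row, and the sequential responder follows, recovering the optimal joint action that averaged (non-optimistic) independent learners typically miss.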
Pages: 12
Related Papers
50 records in total
  • [1] Multi-Agent Reinforcement Learning With Distributed Targeted Multi-Agent Communication
    Xu, Chi
    Zhang, Hui
    Zhang, Ya
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 2915 - 2920
  • [2] Learning structured communication for multi-agent reinforcement learning
    Sheng, Junjie
    Wang, Xiangfeng
    Jin, Bo
    Yan, Junchi
    Li, Wenhao
    Chang, Tsung-Hui
    Wang, Jun
    Zha, Hongyuan
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2022, 36 (02)
  • [4] Communication-Efficient and Federated Multi-Agent Reinforcement Learning
    Krouka, Mounssif
    Elgabli, Anis
    Ben Issaid, Chaouki
    Bennis, Mehdi
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2022, 8 (01) : 311 - 320
  • [5] A Review of Multi-Agent Reinforcement Learning Algorithms
    Liang, Jiaxin
    Miao, Haotian
    Li, Kai
    Tan, Jianheng
    Wang, Xi
    Luo, Rui
    Jiang, Yueqiu
    ELECTRONICS, 2025, 14 (04)
  • [6] Learning of Communication Codes in Multi-Agent Reinforcement Learning Problem
    Kasai, Tatsuya
    Tenmoto, Hiroshi
    Kamiya, Akimoto
    2008 IEEE CONFERENCE ON SOFT COMPUTING IN INDUSTRIAL APPLICATIONS SMCIA/08, 2009, : 1 - +
  • [7] Multi-Agent Reinforcement Learning for Coordinating Communication and Control
    Mason, Federico
    Chiariotti, Federico
    Zanella, Andrea
    Popovski, Petar
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2024, 10 (04) : 1566 - 1581
  • [8] A survey of multi-agent deep reinforcement learning with communication
    Zhu, Changxi
    Dastani, Mehdi
    Wang, Shihan
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2024, 38 (01)
  • [9] Multi-agent reinforcement learning based on local communication
    Wenxu Zhang
    Lei Ma
    Xiaonan Li
    Cluster Computing, 2019, 22 : 15357 - 15366
  • [10] Multi-Agent Deep Reinforcement Learning with Emergent Communication
    Simoes, David
    Lau, Nuno
    Reis, Luis Paulo
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,