Optimistic sequential multi-agent reinforcement learning with motivational communication

Cited by: 1
Authors
Huang, Anqi [1 ]
Wang, Yongli [1 ]
Zhou, Xiaoliang [1 ]
Zou, Haochen [1 ]
Dong, Xu [1 ]
Che, Xun [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
DOI
10.1016/j.neunet.2024.106547
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE), and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the optimistic Q-values for the agents, aiming to ensure monotonic improvement of the joint policy during optimization. Moreover, we establish a motivational communication module for each agent to disseminate motivational messages that promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated on a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges more rapidly.
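The abstract outlines three ingredients: an optimistic Q-value that upper-bounds the current policy's Q-value, a sequential per-agent update scheme, and SAC-style entropy regularization. The snippet below is a minimal, hypothetical sketch of how such ideas can be combined, assuming a PyTorch setup with twin critics per agent; the `Agent` class, the max-over-critics estimate, and all hyperparameters are illustrative assumptions, not the implementation described in the paper (the motivational communication module is omitted entirely).

```python
import torch
import torch.nn as nn

# Hypothetical sketch, NOT the paper's implementation: per-agent twin critics,
# an "optimistic" Q-estimate formed as the max over the two critic heads
# (an upper bound on the usual clipped double-Q min), and a SAC-style
# entropy-regularized policy loss, with agents updated one at a time.

obs_dim, act_dim, n_agents, batch = 8, 2, 3, 16

class Agent(nn.Module):
    def __init__(self):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                    nn.Linear(32, 2 * act_dim))   # mean, log-std
        self.q1 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(),
                                nn.Linear(32, 1))
        self.q2 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(),
                                nn.Linear(32, 1))

    def sample(self, obs):
        mean, log_std = self.policy(obs).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
        action = dist.rsample()                        # reparameterized sample
        return action, dist.log_prob(action).sum(-1, keepdim=True)

agents = [Agent() for _ in range(n_agents)]
optims = [torch.optim.Adam(a.policy.parameters(), lr=3e-4) for a in agents]
obs = torch.randn(batch, n_agents, obs_dim)            # dummy observation batch
alpha = 0.2                                            # entropy temperature

# Sequential update: each agent improves its own policy in turn while the
# others stay fixed (critic training and communication are omitted here).
for i, (agent, opt) in enumerate(zip(agents, optims)):
    action, log_prob = agent.sample(obs[:, i])
    x = torch.cat([obs[:, i], action], dim=-1)
    optimistic_q = torch.max(agent.q1(x), agent.q2(x))  # optimistic upper bound
    loss = (alpha * log_prob - optimistic_q).mean()      # SAC-style policy loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```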
Pages: 12