VAOS: Enhancing the stability of cooperative multi-agent policy learning

Cited by: 0
Authors
Li, Peng [1 ]
Chen, Shaofei [1 ]
Yuan, Weilin [1 ]
Hu, Zhenzhen [1 ]
Chen, Jing [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha, Peoples R China
Keywords
Overestimation reduction; Multi-agent; Operator switching; Value averaging; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112474
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-agent value decomposition (MAVD) algorithms have achieved remarkable results in applications of multi-agent reinforcement learning (MARL). However, overestimation errors in MAVD algorithms generally lead to instability, such as severe oscillation and performance degradation, in their learning processes. In this work, we propose a method that integrates the advantages of value averaging and operator switching (VAOS) to enhance the learning stability of MAVD algorithms. In particular, we reduce the variance of the target approximation error by averaging the estimated values of the target network. Meanwhile, we design an operator switching method to fully combine the optimal policy learning ability of the Max operator with the superior stability of the Mellowmax operator. Moreover, we theoretically prove the performance of VAOS in reducing the overestimation error. Exhaustive experimental results show that (1) compared with currently popular value decomposition algorithms such as QMIX, VAOS can markedly enhance learning stability; and (2) VAOS outperforms other advanced algorithms, such as the regularized softmax (RES) algorithm, in reducing overestimation error.
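The abstract contrasts the hard Max operator with the smoother Mellowmax operator. As a minimal illustrative sketch (not the paper's implementation; the function name and the temperature parameter `omega` are chosen here for illustration), the standard Mellowmax operator of Asadi and Littman can be written as:

```python
import math

def mellowmax(values, omega=10.0):
    """Mellowmax operator: a smooth alternative to max.
    mm_omega(x) = log(mean(exp(omega * x_i))) / omega.
    It approaches max(values) as omega -> infinity, and its softer
    aggregation is what gives it the stability noted in the abstract."""
    n = len(values)
    m = max(values)  # subtract the max before exponentiating for numerical stability
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / n) / omega

q_values = [1.0, 2.0, 3.0]
print(max(q_values))               # hard Max operator: 3.0
print(mellowmax(q_values, 10.0))   # ~2.89, slightly below the hard max
print(mellowmax(q_values, 100.0))  # closer to 3.0 as omega grows
```

Because Mellowmax underestimates the hard max, using it as the target operator dampens overestimation; the operator-switching idea in VAOS alternates between the two regimes rather than committing to either one.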
Pages: 14