VAOS: Enhancing the stability of cooperative multi-agent policy learning

Cited by: 0
Authors
Li, Peng [1 ]
Chen, Shaofei [1 ]
Yuan, Weilin [1 ]
Hu, Zhenzhen [1 ]
Chen, Jing [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha, Peoples R China
Keywords
Overestimation reduction; Multi-agent; Operator switching; Value averaging; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112474
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-agent value decomposition (MAVD) algorithms have achieved remarkable results in applications of multi-agent reinforcement learning (MARL). However, overestimation errors in MAVD algorithms generally lead to instability, such as severe oscillation and performance degradation, in their learning processes. In this work, we propose a method that integrates the advantages of value averaging and operator switching (VAOS) to enhance the learning stability of MAVD algorithms. In particular, we reduce the variance of the target approximation error by averaging the estimated values of the target network. Meanwhile, we design an operator switching method to fully combine the optimal policy learning ability of the Max operator with the superior stability of the Mellowmax operator. Moreover, we theoretically prove the performance of VAOS in reducing the overestimation error. Exhaustive experimental results show that (1) compared with currently popular value decomposition algorithms such as QMIX, VAOS can markedly enhance learning stability; and (2) VAOS outperforms other advanced algorithms, such as the regularized softmax (RES) algorithm, in reducing overestimation error.
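The abstract contrasts the hard Max operator with the smoother Mellowmax operator. As a minimal illustrative sketch (not the paper's implementation; the function name and the temperature parameter `omega` are chosen here for illustration), the standard Mellowmax operator of Asadi and Littman can be written as:

```python
import math

def mellowmax(values, omega=10.0):
    """Mellowmax operator: a smooth alternative to max.
    mm_omega(x) = log(mean(exp(omega * x_i))) / omega.
    It approaches max(values) as omega -> infinity, and its softer
    aggregation is what gives it the stability noted in the abstract."""
    n = len(values)
    m = max(values)  # subtract the max before exponentiating for numerical stability
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / n) / omega

q_values = [1.0, 2.0, 3.0]
print(max(q_values))               # hard Max operator: 3.0
print(mellowmax(q_values, 10.0))   # ~2.89, slightly below the hard max
print(mellowmax(q_values, 100.0))  # closer to 3.0 as omega grows
```

Because Mellowmax underestimates the hard max, using it as the target operator dampens overestimation; the operator-switching idea in VAOS alternates between the two regimes rather than committing to either one.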
Pages: 14