VAOS: Enhancing the stability of cooperative multi-agent policy learning

Cited: 0
Authors
Li, Peng [1 ]
Chen, Shaofei [1 ]
Yuan, Weilin [1 ]
Hu, Zhenzhen [1 ]
Chen, Jing [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha, Peoples R China
Keywords
Overestimation reduction; Multi-agent; Operator switching; Value averaging; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112474
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-agent value decomposition (MAVD) algorithms have achieved remarkable results in applications of multi-agent reinforcement learning (MARL). However, overestimation errors in MAVD algorithms generally lead to unstable phenomena, such as severe oscillation and performance degradation, in their learning processes. In this work, we propose a method that integrates the advantages of value averaging and operator switching (VAOS) to enhance the learning stability of MAVD algorithms. In particular, we reduce the variance of the target approximation error by averaging the estimated values of the target network. Meanwhile, we design an operator switching method that fully combines the optimal-policy learning ability of the Max operator with the superior stability of the Mellowmax operator. Moreover, we theoretically prove the performance of VAOS in reducing overestimation error. Exhaustive experimental results show that (1) compared with current popular value decomposition algorithms such as QMIX, VAOS can markedly enhance learning stability; and (2) VAOS outperforms other advanced algorithms, such as the regularized softmax (RES) algorithm, in reducing overestimation error.
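The two ingredients named in the abstract, value averaging over target-network estimates and switching between the Max and Mellowmax operators, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names, the fixed `omega` parameter, and the boolean switching interface are assumptions; only the Mellowmax formula and the averaging idea come from the abstract and standard literature.

```python
import numpy as np

def mellowmax(q_values, omega=10.0):
    """Mellowmax operator: mm_w(q) = log(mean(exp(w * q))) / w.

    A smooth alternative to the Max operator; it approaches max as
    omega -> infinity and the arithmetic mean as omega -> 0.
    """
    q = np.asarray(q_values, dtype=np.float64)
    c = q.max()  # log-sum-exp shift for numerical stability
    return float(c + np.log(np.mean(np.exp(omega * (q - c)))) / omega)

def averaged_target(recent_estimates):
    # Value averaging: mean over the last K target-network estimates,
    # which lowers the variance of the target approximation error.
    return np.mean(np.stack(recent_estimates), axis=0)

def switched_backup(q_values, use_max, omega=10.0):
    # Operator switching: Max for optimal-policy learning, Mellowmax
    # for stability (the switching rule itself is illustrative here).
    return float(np.max(q_values)) if use_max else mellowmax(q_values, omega)
```

Because Mellowmax is always bounded above by the Max operator, switching to it trades a little greediness for a smoother, lower-variance backup.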
Pages: 14
Related Papers
50 records in total
  • [1] QDAP: Downsizing adaptive policy for cooperative multi-agent reinforcement learning
    Zhao, Zhitong
    Zhang, Ya
    Wang, Siying
    Zhang, Fan
    Zhang, Malu
    Chen, Wenyu
    KNOWLEDGE-BASED SYSTEMS, 2024, 294
  • [2] Multi-agent Cooperative Search based on Reinforcement Learning
    Sun, Yinjiang
    Zhang, Rui
    Liang, Wenbao
    Xu, Cheng
    PROCEEDINGS OF 2020 3RD INTERNATIONAL CONFERENCE ON UNMANNED SYSTEMS (ICUS), 2020, : 891 - 896
  • [3] Centralized reinforcement learning for multi-agent cooperative environments
    Lu, Chengxuan
    Bao, Qihao
    Xia, Shaojie
    Qu, Chongxiao
    EVOLUTIONARY INTELLIGENCE, 2024, 17 (01) : 267 - 273
  • [4] A Deep Reinforcement Learning Method based on Deterministic Policy Gradient for Multi-Agent Cooperative Competition
    Zuo, Xuan
    Xue, Hui-Feng
    Wang, Xiao-Yin
    Du, Wan-Ru
    Tian, Tao
    Gao, Shan
    Zhang, Pu
    CONTROL ENGINEERING AND APPLIED INFORMATICS, 2021, 23 (03): : 88 - 98
  • [5] Transform networks for cooperative multi-agent deep reinforcement learning
    Wang, Hongbin
    Xie, Xiaodong
    Zhou, Lianke
    APPLIED INTELLIGENCE, 2023, 53 (08) : 9261 - 9269
  • [6] Cooperative Behavior by Multi-agent Reinforcement Learning with Abstractive Communication
    Tanda, Jin
    Moustafa, Ahmed
    Ito, Takayuki
    2019 IEEE INTERNATIONAL CONFERENCE ON AGENTS (ICA), 2019, : 8 - 13
  • [7] Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks
    Shi, Daming
    Tong, Junbo
    Liu, Yi
    Fan, Wenhui
    ENTROPY, 2022, 24 (04)
  • [8] Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning
    Liu, Jiaqi
    Xu, Chengkai
    Hang, Peng
    Sun, Jian
    Ding, Mingyu
    Zhan, Wei
    Tomizuka, Masayoshi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (05): : 4292 - 4299