SQIX: QMIX Algorithm Activated by General Softmax Operator for Cooperative Multiagent Reinforcement Learning

Cited by: 7
Authors
Zhang, Miaomiao [1 ]
Tong, Wei [2 ]
Zhu, Guangyu [3 ]
Xu, Xin [4 ]
Wu, Edmond Q. [5 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Automat & Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Beijing Jiaotong Univ, Beijing Res Ctr Urban Traff Informat Sensing & Se, Beijing 100044, Peoples R China
[4] Natl Univ Def Technol, Sch Intelligent Sci & Technol, Changsha 410005, Peoples R China
[5] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2024, Vol. 54, Issue 11
Keywords
Reinforcement learning; Training; Multi-agent systems; Q-learning; Estimation; Task analysis; Games; Cooperative multiagent system; multiagent deep reinforcement learning; softmax operator; StarCraft; value estimation;
DOI
10.1109/TSMC.2024.3370186
Chinese Library Classification (CLC) Code
TP [automation technology; computer technology];
Discipline Classification Code
0812;
Abstract
Many real-world problems can be modeled as cooperative multiagent systems, and reinforcement learning is a particularly effective tool for solving them. The bias in Q-function value estimation in single-agent reinforcement learning has attracted considerable attention and substantial study. This challenge persists in multiagent reinforcement learning, primarily because of the maximization operation, and single-agent correction techniques cannot be seamlessly extended to their multiagent counterparts. In this article, we introduce a more general and straightforward principle: appropriate value correction. We propose replacing the maximization operation with a monotonically nondecreasing function to obtain more accurate value estimates, and we theoretically demonstrate that this substitution reduces the potential overestimation bias in the QMIX algorithm. The resulting method, dubbed SQIX (the QMIX algorithm empowered by a general softmax operator), attains state-of-the-art results across diverse multiagent cooperative tasks, including StarCraft II, one of the most challenging game environments to date.
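To make the core idea concrete, below is a minimal Python sketch (not the authors' implementation) of how a softmax, Boltzmann-weighted value operator can stand in for the max when forming a temporal-difference target. The temperature parameter beta, the td_target helper, and the specific weighted-average form are illustrative assumptions; the abstract only specifies a monotonically nondecreasing replacement for the maximization operation.

import numpy as np

def softmax_operator(q_values, beta=5.0):
    # Boltzmann-weighted average of Q-values: monotonically nondecreasing
    # in each entry and never larger than the max, which is why it can
    # mitigate the overestimation bias introduced by the max operator.
    # beta -> infinity recovers max; beta = 0 gives the plain mean.
    q = np.asarray(q_values, dtype=np.float64)
    weights = np.exp(beta * (q - q.max()))  # shift by max for numerical stability
    weights /= weights.sum()
    return float(np.dot(weights, q))

def td_target(reward, next_q_values, gamma=0.99, beta=5.0):
    # Hypothetical one-step TD target with the softmax operator in place of max.
    return reward + gamma * softmax_operator(next_q_values, beta)

print(td_target(reward=1.0, next_q_values=[0.2, 1.5, 0.9]))

As beta grows, the operator approaches the standard max-based target; smaller beta trades some optimism for lower estimation variance, which is the kind of value correction the abstract describes.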
Pages: 6550-6560
Number of pages: 11