In the rapidly advancing field of Reinforcement Learning (RL), Multi-Agent Reinforcement Learning (MARL) has emerged as a key approach for solving complex real-world challenges. A pivotal development in this area is the mixing network, which represents a significant step forward in the capabilities of multi-agent systems. Drawing inspiration from COMA and VDN, the mixing network overcomes their limitations in extracting a joint Q-value from joint state-action interactions. These earlier approaches could not fully exploit the centralized state information available during training, which limited their effectiveness. QMIX and QVMinMax addressed this issue by employing neural networks, akin to hypernetworks, that convert the centralized state into the weights of a second neural network. However, these solutions introduce challenges of their own, such as high computational cost and susceptibility to local minima. To overcome these hurdles, this work makes three key contributions. First, we introduce the state-fusion network, a self-attention-based alternative to the traditional mixing network. Second, to mitigate the local-optima problem in MARL algorithms, we leverage the Grey Wolf Optimizer for weight and bias selection, adding a stochastic element that improves optimization. Finally, we present a comprehensive comparison with QMIX, evaluating performance under two optimization schemes: gradient descent and a stochastic optimizer. Using the StarCraft II Learning Environment (SC2LE) as our experimental platform, our results demonstrate that the proposed methodology outperforms QMIX, QVMinMax, and QSOD in absolute performance, particularly under resource constraints. The proposed methodology thus contributes to the ongoing evolution of MARL techniques, demonstrating how attention mechanisms and improved optimization strategies can enhance multi-agent systems.
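To make the architectural idea concrete, the sketch below illustrates one way a self-attention-based state-fusion module could combine per-agent Q-values with the centralized state, in contrast to QMIX's hypernetwork-generated mixing weights. This is a minimal illustrative example in PyTorch, not the exact architecture evaluated in this work; the class name, layer sizes, and embedding dimensions are assumptions.

# Illustrative sketch only: a self-attention state-fusion mixer that produces a
# joint Q-value from per-agent Q-values and the centralized state.
# Names and hyper-parameters (embed_dim, n_heads, etc.) are assumptions.
import torch
import torch.nn as nn

class StateFusionMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32, n_heads: int = 4):
        super().__init__()
        self.q_embed = nn.Linear(1, embed_dim)          # embed each agent's scalar Q-value
        self.s_embed = nn.Linear(state_dim, embed_dim)  # embed the centralized state
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.out = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        tokens = self.q_embed(agent_qs.unsqueeze(-1))               # (batch, n_agents, embed)
        tokens = torch.cat([tokens, self.s_embed(state).unsqueeze(1)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)                # self-attention over agents + state token
        return self.out(fused.mean(dim=1))                          # joint Q_tot: (batch, 1)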
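For the stochastic weight and bias selection, the following sketch shows the standard Grey Wolf Optimizer update loop (Mirjalili et al.) applied to a flattened vector of mixing-network parameters. It is a minimal illustration under assumed hyper-parameters (population size, bounds, fitness function), not necessarily the exact procedure used in the reported experiments.

# Illustrative sketch only: Grey Wolf Optimizer over a flattened parameter vector.
# The fitness function (lower is better, e.g. squared TD error of the joint
# Q-value), bounds, and hyper-parameters are assumptions.
import numpy as np

def gwo_search(fitness, dim, n_wolves=20, iters=100, lo=-1.0, hi=1.0, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))       # candidate weight/bias vectors
    for t in range(iters):
        scores = np.array([fitness(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(scores)[:3]]  # three best wolves (minimization)
        a = 2.0 - 2.0 * t / iters                            # coefficient decreasing from 2 to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - wolves[i])           # distance to the leader
                new_pos += leader - A * D                    # encircling / hunting step
            wolves[i] = np.clip(new_pos / 3.0, lo, hi)       # average of the three guidance terms
    return alpha                                              # best candidate from the final evaluation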