I2Q: A Fully Decentralized Q-Learning Algorithm

Cited by: 0
Authors
Jiang, Jiechuan [1 ]
Lu, Zongqing [1 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks in which global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because the other agents update their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, under which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on a modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which frees I2Q from non-stationarity and allows it to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
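To make the abstract's idea concrete, below is a minimal tabular sketch of the I2Q principle, written for a discrete toy setting. The class and variable names (I2QAgent, Qss, reachable, etc.) are our own illustrative assumptions, not the authors' reference code, and the published method uses neural function approximation rather than these tables; this only shows how a state-to-state value and an "ideal transition" target can replace the non-stationary joint-action dynamics.

```python
# Illustrative sketch only (assumed names, toy discrete setting).
# Each agent i keeps two tables:
#   Qss[(s, s')] : value of moving from state s to next state s'
#   Q[s][a_i]    : independent Q-values over the agent's OWN action only
# The "ideal transition" for (s, a_i) is modeled as the best next state the
# agent has ever observed after taking a_i in s, so the learning target stays
# stationary even while the other agents keep changing their policies.

from collections import defaultdict

class I2QAgent:
    def __init__(self, n_actions, gamma=0.99, lr=0.1):
        self.gamma, self.lr = gamma, lr
        self.Q = defaultdict(lambda: [0.0] * n_actions)  # Q_i(s, a_i)
        self.Qss = defaultdict(float)                    # Q_i(s, s')
        self.next_states = defaultdict(set)  # s -> next states seen from s
        self.reachable = defaultdict(set)    # (s, a_i) -> next states seen

    def update(self, s, a, r, s2):
        self.next_states[s].add(s2)
        self.reachable[(s, a)].add(s2)

        # 1) Bellman backup for the state-to-state value Qss(s, s'),
        #    maximizing only over next states actually experienced from s2.
        v2 = max((self.Qss[(s2, s3)] for s3 in self.next_states[s2]),
                 default=0.0)
        target_ss = r + self.gamma * v2
        self.Qss[(s, s2)] += self.lr * (target_ss - self.Qss[(s, s2)])

        # 2) Independent Q-learning on the modeled IDEAL transition:
        #    assume the joint action reaches the best reachable next state,
        #    not the one produced by the current (non-stationary) policies
        #    of the other agents.
        ideal = max(self.Qss[(s, sn)] for sn in self.reachable[(s, a)])
        self.Q[s][a] += self.lr * (ideal - self.Q[s][a])

    def act(self, s):
        q = self.Q[s]
        return q.index(max(q))  # greedy; add exploration in practice
```

The design point the sketch tries to capture is that both tables depend only on the agent's own action and on observed states, so neither target moves when the other agents' policies change.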
Pages: 13
Related Papers
50 records in total (10 shown)
  • [1] Selectively Decentralized Q-Learning
    Nguyen, Thanh
    Mukhopadhyay, Snehasis
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 328 - 333
  • [2] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [3] Decentralized Q-Learning for Uplink Power Control
    Dzulkifly, Sumayyah
    Giupponi, Lorenza
    Said, Fatin
    Dohler, Mischa
    2015 IEEE 20TH INTERNATIONAL WORKSHOP ON COMPUTER AIDED MODELLING AND DESIGN OF COMMUNICATION LINKS AND NETWORKS (CAMAD), 2015, : 54 - 58
  • [4] Asynchronous Decentralized Q-Learning in Stochastic Games
    Yongacoglu, Bora
    Arslan, Gurdal
    Yuksel, Serdar
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 5008 - 5013
  • [5] Decentralized Q-Learning for Stochastic Teams and Games
    Arslan, Gurdal
    Yuksel, Serdar
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (04) : 1545 - 1558
  • [6] A Scalable Parallel Q-Learning Algorithm for Resource Constrained Decentralized Computing Environments
    Camelo, Miguel
    Famaey, Jeroen
    Latre, Steven
    PROCEEDINGS OF 2016 2ND WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC), 2016, : 27 - 35
  • [7] ENHANCEMENTS OF FUZZY Q-LEARNING ALGORITHM
    Glowaty, Grzegorz
    COMPUTER SCIENCE-AGH, 2005, 7 : 77 - 87
  • [8] An analysis of the pheromone Q-learning algorithm
    Monekosso, N
    Remagnino, P
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2002, PROCEEDINGS, 2002, 2527 : 224 - 232
  • [9] A Weighted Smooth Q-Learning Algorithm
    Vijesh, V. Antony
    Shreyas, S. R.
    IEEE CONTROL SYSTEMS LETTERS, 2025, 9 : 21 - 26
  • [10] An improved immune Q-learning algorithm
    Ji, Zhengqiao
    Wu, Q. M. Jonathan
    Sid-Ahmed, Maher
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3330+