I2Q: A Fully Decentralized Q-Learning Algorithm

Cited by: 0
Authors
Jiang, Jiechuan [1 ]
Lu, Zongqing [1 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks in which global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because the other agents update their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, under which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on a modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which frees I2Q from non-stationarity and allows it to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
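To make the abstract's idea concrete, below is a minimal tabular sketch of the I2Q principle, written for a discrete toy setting. The class and variable names (I2QAgent, Qss, reachable, etc.) are our own illustrative assumptions, not the authors' reference code, and the published method uses neural function approximation rather than these tables; this only shows how a state-to-state value and an "ideal transition" target can replace the non-stationary joint-action dynamics.

```python
# Illustrative sketch only (assumed names, toy discrete setting).
# Each agent i keeps two tables:
#   Qss[(s, s')] : value of moving from state s to next state s'
#   Q[s][a_i]    : independent Q-values over the agent's OWN action only
# The "ideal transition" for (s, a_i) is modeled as the best next state the
# agent has ever observed after taking a_i in s, so the learning target stays
# stationary even while the other agents keep changing their policies.

from collections import defaultdict

class I2QAgent:
    def __init__(self, n_actions, gamma=0.99, lr=0.1):
        self.gamma, self.lr = gamma, lr
        self.Q = defaultdict(lambda: [0.0] * n_actions)  # Q_i(s, a_i)
        self.Qss = defaultdict(float)                    # Q_i(s, s')
        self.next_states = defaultdict(set)  # s -> next states seen from s
        self.reachable = defaultdict(set)    # (s, a_i) -> next states seen

    def update(self, s, a, r, s2):
        self.next_states[s].add(s2)
        self.reachable[(s, a)].add(s2)

        # 1) Bellman backup for the state-to-state value Qss(s, s'),
        #    maximizing only over next states actually experienced from s2.
        v2 = max((self.Qss[(s2, s3)] for s3 in self.next_states[s2]),
                 default=0.0)
        target_ss = r + self.gamma * v2
        self.Qss[(s, s2)] += self.lr * (target_ss - self.Qss[(s, s2)])

        # 2) Independent Q-learning on the modeled IDEAL transition:
        #    assume the joint action reaches the best reachable next state,
        #    not the one produced by the current (non-stationary) policies
        #    of the other agents.
        ideal = max(self.Qss[(s, sn)] for sn in self.reachable[(s, a)])
        self.Q[s][a] += self.lr * (ideal - self.Q[s][a])

    def act(self, s):
        q = self.Q[s]
        return q.index(max(q))  # greedy; add exploration in practice
```

The design point the sketch tries to capture is that both tables depend only on the agent's own action and on observed states, so neither target moves when the other agents' policies change.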
Pages: 13
Related Papers
50 records in total (10 shown)
  • [1] Selectively Decentralized Q-Learning
    Nguyen, Thanh
    Mukhopadhyay, Snehasis
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 328 - 333
  • [2] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [3] Decentralized Q-Learning for Uplink Power Control
    Dzulkifly, Sumayyah
    Giupponi, Lorenza
    Said, Fatin
    Dohler, Mischa
    2015 IEEE 20TH INTERNATIONAL WORKSHOP ON COMPUTER AIDED MODELLING AND DESIGN OF COMMUNICATION LINKS AND NETWORKS (CAMAD), 2015, : 54 - 58
  • [4] Asynchronous Decentralized Q-Learning in Stochastic Games
    Yongacoglu, Bora
    Arslan, Gurdal
    Yuksel, Serdar
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 5008 - 5013
  • [5] Decentralized Q-Learning for Stochastic Teams and Games
    Arslan, Gurdal
    Yuksel, Serdar
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (04) : 1545 - 1558
  • [6] A Scalable Parallel Q-Learning Algorithm for Resource Constrained Decentralized Computing Environments
    Camelo, Miguel
    Famaey, Jeroen
    Latre, Steven
    PROCEEDINGS OF 2016 2ND WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC), 2016, : 27 - 35
  • [7] ENHANCEMENTS OF FUZZY Q-LEARNING ALGORITHM
    Glowaty, Grzegorz
    COMPUTER SCIENCE-AGH, 2005, 7 : 77 - 87
  • [8] An analysis of the pheromone Q-learning algorithm
    Monekosso, N
    Remagnino, P
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2002, PROCEEDINGS, 2002, 2527 : 224 - 232
  • [9] A Weighted Smooth Q-Learning Algorithm
    Vijesh, V. Antony
    Shreyas, S. R.
    IEEE CONTROL SYSTEMS LETTERS, 2025, 9 : 21 - 26
  • [10] An improved immune Q-learning algorithm
    Ji, Zhengqiao
    Wu, Q. M. Jonathan
    Sid-Ahmed, Maher
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3330+