Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games

Cited by: 0
Authors
Elmehdi Amhraoui
Tawfik Masrour
Affiliations
[1] ENSAM-Meknes, Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), Department of Mathematics and Computer Science
[2] Moulay ISMAIL University
Source
Journal of Intelligent & Robotic Systems | 2023 / Volume 108
Keywords
Multiagent systems; Distributed reinforcement learning; Game theory; Fully cooperative Markov games; Independent learners
DOI
Not available
Abstract
In this article, we introduce the Smooth Q-Learning algorithm for independent learners (distributed, non-communicative learners) in cooperative Markov games. Smooth Q-Learning aims to solve the relative overgeneralization and stochasticity problems while also performing well in the presence of other non-coordination factors, such as the miscoordination problem (also known as the Pareto selection problem) and the non-stationarity problem. Smooth Q-Learning seeks a trade-off between two incompatible learning approaches, maximum-based learning and average-based learning, by dynamically adjusting the learning rate based on the temporal-difference error, in a way that keeps the algorithm between average-based and maximum-based learning. We compare Smooth Q-Learning against several algorithms from the literature: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and a recent version of Lenient Q-Learning called Lenient Multiagent Reinforcement Learning 2. The results show that Smooth Q-Learning is highly effective, achieving the highest number of convergent trials, and, unlike competing algorithms, it is easy to tune and does not require storing additional information.
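The trade-off described in the abstract can be illustrated with a minimal tabular sketch in Python. This is an assumption-laden illustration, not the paper's exact update rule: the function smooth_lr, its exponential decay shape, and the parameter names alpha, beta, and temperature are all hypothetical, chosen only to show how a learning rate that shrinks smoothly with negative temporal-difference errors places each update between average-based learning (one fixed rate for all errors) and maximum-based learning (ignoring negative errors entirely).

import numpy as np

def smooth_lr(td_error, alpha=0.1, beta=0.01, temperature=1.0):
    # Hypothetical smooth schedule (illustrative, not the paper's rule):
    # positive TD errors learn at the full rate alpha; negative TD errors
    # learn at a rate that decays continuously from alpha toward beta,
    # rather than switching abruptly as in Hysteretic Q-Learning.
    if td_error >= 0:
        return alpha
    weight = np.exp(td_error / temperature)  # in (0, 1] when td_error < 0
    return beta + (alpha - beta) * weight

def q_update(Q, state, action, reward, next_state, gamma=0.95):
    # One independent-learner update on a tabular Q (states x actions array).
    td_error = reward + gamma * np.max(Q[next_state]) - Q[state, action]
    Q[state, action] += smooth_lr(td_error) * td_error
    return Q

In this sketch, setting beta equal to alpha recovers ordinary (average-based) Q-learning, while beta near zero with a small temperature approaches maximum-based (optimistic) learning, which is the interpolation the abstract describes.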