Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games

Cited by: 0
Authors
Elmehdi Amhraoui
Tawfik Masrour
Affiliations
[1] ENSAM-Meknes, Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), Department of Mathematics and Computer Science
[2] Moulay ISMAIL University
Source
Journal of Intelligent & Robotic Systems | 2023 / Volume 108
Keywords
Multiagent systems; Distributed reinforcement learning; Game theory; Fully cooperative Markov games; Independent learners
DOI
Not available
Abstract
In this article, we introduce the Smooth Q-Learning algorithm for independent learners (distributed, non-communicative learners) in cooperative Markov games. Smooth Q-Learning aims to solve the relative overgeneralization and stochasticity problems while also performing well in the presence of other non-coordination factors, such as the miscoordination problem (also known as the Pareto selection problem) and the non-stationarity problem. Smooth Q-Learning seeks a trade-off between two incompatible learning approaches, maximum-based learning and average-based learning, by dynamically adjusting the learning rate based on the temporal-difference error, in a way that keeps the algorithm between average-based and maximum-based learning. We compare Smooth Q-Learning against several algorithms from the literature: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and a recent version of Lenient Q-Learning called Lenient Multiagent Reinforcement Learning 2. The results show that Smooth Q-Learning is highly effective, achieving the highest number of convergent trials, and, unlike competing algorithms, it is easy to tune and does not require storing additional information.
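The trade-off described in the abstract can be illustrated with a minimal tabular sketch in Python. This is an assumption-laden illustration, not the paper's exact update rule: the function smooth_lr, its exponential decay shape, and the parameter names alpha, beta, and temperature are all hypothetical, chosen only to show how a learning rate that shrinks smoothly with negative temporal-difference errors places each update between average-based learning (one fixed rate for all errors) and maximum-based learning (ignoring negative errors entirely).

import numpy as np

def smooth_lr(td_error, alpha=0.1, beta=0.01, temperature=1.0):
    # Hypothetical smooth schedule (illustrative, not the paper's rule):
    # positive TD errors learn at the full rate alpha; negative TD errors
    # learn at a rate that decays continuously from alpha toward beta,
    # rather than switching abruptly as in Hysteretic Q-Learning.
    if td_error >= 0:
        return alpha
    weight = np.exp(td_error / temperature)  # in (0, 1] when td_error < 0
    return beta + (alpha - beta) * weight

def q_update(Q, state, action, reward, next_state, gamma=0.95):
    # One independent-learner update on a tabular Q (states x actions array).
    td_error = reward + gamma * np.max(Q[next_state]) - Q[state, action]
    Q[state, action] += smooth_lr(td_error) * td_error
    return Q

In this sketch, setting beta equal to alpha recovers ordinary (average-based) Q-learning, while beta near zero with a small temperature approaches maximum-based (optimistic) learning, which is the interpolation the abstract describes.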