Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Cited by: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source
IEEE CONTROL SYSTEMS LETTERS, 2024, Vol. 8
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium;
DOI
10.1109/LCSYS.2024.3410149
CLC number
TP [Automation and Computer Technology];
Discipline code
0812
Abstract
This letter studies the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation and seek to maximize their own independent rewards. Each agent's reward depends on the actions of all agents in the multi-agent system, giving rise to a game between agents. All agents make decisions under a boundedly rational policy, which is enforced through entropy regularization. In practice, a smaller regularization coefficient makes agents more rational, so that they behave closer to Nash policies; a larger regularization coefficient makes agents act more randomly, which encourages exploration. We show that, under sufficiently large entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) that verify our theoretical analysis.
Pages: 1217-1222
Page count: 6
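
The following is a minimal sketch, not the authors' code, of the setting described in the abstract: entropy-regularized independent NPG in a two-player matrix game with exact policy evaluation. It assumes the standard multiplicative-weights form of the entropy-regularized NPG update for softmax policies, pi^{t+1}(a) ∝ pi^{t}(a)^{1 - eta*tau} * exp(eta * Q^{t}(a)), whose fixed point is a QRE; the payoff matrices, step size eta, and regularization coefficient tau are illustrative choices.

```python
import numpy as np

def npg_step(pi, q, eta, tau):
    """One entropy-regularized NPG step for a softmax policy:
    pi_new(a) proportional to pi(a)^(1 - eta*tau) * exp(eta * q(a))."""
    logits = (1.0 - eta * tau) * np.log(pi) + eta * q
    logits -= logits.max()              # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

def independent_npg(A, B, eta=0.1, tau=0.5, iters=2000):
    """Independent NPG for players 1 and 2 with payoff matrices A, B
    (player 1 picks rows, player 2 picks columns); exact evaluation oracle."""
    n, m = A.shape
    pi1, pi2 = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    for _ in range(iters):
        q1 = A @ pi2                    # expected payoff of each row action against pi2
        q2 = B.T @ pi1                  # expected payoff of each column action against pi1
        pi1, pi2 = npg_step(pi1, q1, eta, tau), npg_step(pi2, q2, eta, tau)
    return pi1, pi2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))     # illustrative general-sum payoffs
    B = rng.standard_normal((3, 3))
    tau = 0.5
    pi1, pi2 = independent_npg(A, B, tau=tau)
    # QRE condition: each policy is the softmax of its own Q-values scaled by 1/tau.
    qre1 = np.exp(A @ pi2 / tau)
    qre1 /= qre1.sum()
    print("max deviation from QRE condition (player 1):", np.abs(pi1 - qre1).max())
    print("pi1:", pi1, "pi2:", pi2)
```

The printed deviation should shrink as the number of iterations grows, reflecting the linear convergence to the QRE established in the letter (for sufficiently large tau).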