Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Cited by: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source
IEEE CONTROL SYSTEMS LETTERS, 2024, Vol. 8
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium;
DOI
10.1109/LCSYS.2024.3410149
CLC number
TP [Automation and Computer Technology];
Discipline code
0812
Abstract
This letter studies the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation and seek to maximize their own independent rewards. Each agent's reward depends on the actions of all agents in the multi-agent system, giving rise to a game between agents. All agents make decisions under a boundedly rational policy, which is enforced through entropy regularization. In practice, a smaller regularization coefficient makes agents more rational, so that they behave closer to Nash policies; a larger regularization coefficient makes agents act more randomly, which encourages exploration. We show that, under sufficiently large entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) that verify our theoretical analysis.
Pages: 1217-1222
Page count: 6
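
The following is a minimal sketch, not the authors' code, of the setting described in the abstract: entropy-regularized independent NPG in a two-player matrix game with exact policy evaluation. It assumes the standard multiplicative-weights form of the entropy-regularized NPG update for softmax policies, pi^{t+1}(a) ∝ pi^{t}(a)^{1 - eta*tau} * exp(eta * Q^{t}(a)), whose fixed point is a QRE; the payoff matrices, step size eta, and regularization coefficient tau are illustrative choices.

```python
import numpy as np

def npg_step(pi, q, eta, tau):
    """One entropy-regularized NPG step for a softmax policy:
    pi_new(a) proportional to pi(a)^(1 - eta*tau) * exp(eta * q(a))."""
    logits = (1.0 - eta * tau) * np.log(pi) + eta * q
    logits -= logits.max()              # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

def independent_npg(A, B, eta=0.1, tau=0.5, iters=2000):
    """Independent NPG for players 1 and 2 with payoff matrices A, B
    (player 1 picks rows, player 2 picks columns); exact evaluation oracle."""
    n, m = A.shape
    pi1, pi2 = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    for _ in range(iters):
        q1 = A @ pi2                    # expected payoff of each row action against pi2
        q2 = B.T @ pi1                  # expected payoff of each column action against pi1
        pi1, pi2 = npg_step(pi1, q1, eta, tau), npg_step(pi2, q2, eta, tau)
    return pi1, pi2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))     # illustrative general-sum payoffs
    B = rng.standard_normal((3, 3))
    tau = 0.5
    pi1, pi2 = independent_npg(A, B, tau=tau)
    # QRE condition: each policy is the softmax of its own Q-values scaled by 1/tau.
    qre1 = np.exp(A @ pi2 / tau)
    qre1 /= qre1.sum()
    print("max deviation from QRE condition (player 1):", np.abs(pi1 - qre1).max())
    print("pi1:", pi1, "pi2:", pi2)
```

The printed deviation should shrink as the number of iterations grows, reflecting the linear convergence to the QRE established in the letter (for sufficiently large tau).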