Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Cited: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source
IEEE CONTROL SYSTEMS LETTERS, 2024, Vol. 8
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium
DOI
10.1109/LCSYS.2024.3410149
CLC classification
TP [Automation and Computer Technology]
Subject classification code
0812
Abstract
This letter studies the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation and seek to maximize their respective independent rewards. Each agent's reward depends on the actions of all agents in the multi-agent system, leading to a game between agents. All agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, smaller regularization implies that agents are more rational and behave closer to Nash policies, while larger regularization makes agents act more randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the amount of regularization required prevents the QRE from closely approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) to verify our theoretical analysis.
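For intuition, the following is a minimal sketch of the dynamics the abstract describes, specialized to a two-player matrix game with softmax policies. The payoff matrices, step size eta, temperature tau, and iteration count are illustrative assumptions; the multiplicative step is the standard closed-form entropy-regularized NPG update for softmax policies from the literature, not necessarily the authors' exact formulation.

```python
import numpy as np

# Hypothetical sketch: entropy-regularized independent NPG on a
# two-player matrix game (not the authors' implementation).
# tau: entropy-regularization temperature, eta: step size.
A = np.array([[3.0, 0.0], [5.0, 1.0]])  # player 1 payoff matrix (illustrative)
B = A.T                                  # player 2 payoffs (symmetric game, illustrative)
tau, eta, T = 1.0, 0.5, 300

p = np.full(2, 0.5)  # player 1 policy over 2 actions
q = np.full(2, 0.5)  # player 2 policy over 2 actions

for _ in range(T):
    # Each agent's expected payoff per action under the opponent's policy.
    qbar1 = A @ q
    qbar2 = B.T @ p
    # Closed-form entropy-regularized NPG step for softmax policies:
    # pi^{t+1}(a) is proportional to pi^t(a)^(1 - eta*tau) * exp(eta * qbar(a)).
    p = p ** (1 - eta * tau) * np.exp(eta * qbar1)
    q = q ** (1 - eta * tau) * np.exp(eta * qbar2)
    p, q = p / p.sum(), q / q.sum()

# A quantal response equilibrium satisfies pi_i(a) proportional to
# exp(qbar_i(a) / tau); compare the learned policy with its fixed-point map.
fp = np.exp(A @ q / tau)
print("player 1 policy:", p, "QRE fixed-point map:", fp / fp.sum())
```

With eta * tau in (0, 1) and sufficiently large tau, each update contracts toward the QRE, which is the linear-rate behavior the letter establishes.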
Pages: 1217-1222 (6 pages)