Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

Cited by: 0
|
Authors
Zeng, Sihan [1 ]
Doan, Thinh [2 ]
Romberg, Justin [1 ]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30318 USA
[2] Virginia Tech, Dept Elect & Comp Engn, Blacksburg, VA 24061 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
EQUILIBRIA;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Because the problem is formulated as a minimax optimization program, a natural approach is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method is limited. In our paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that makes the solutions more identifiable and allows the problem to be solved more efficiently. Our main contribution is to show that, under proper choices of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of the gradient descent ascent algorithm without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm.
Pages: 13
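As a concrete illustration of the approach described in the abstract, below is a minimal sketch of entropy-regularized gradient descent ascent on a single-state zero-sum matrix game. This simplifies the paper's Markov-game setting to one state; the payoff matrix A, the regularization weight tau, the step size eta, and the iteration budget are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

# Sketch: alternating gradient descent ascent (GDA) on the entropy-regularized
# objective f(x, y) = x^T A y - tau*H(x) + tau*H(y) of a zero-sum matrix game,
# where H is the Shannon entropy.  The regularization makes f strongly convex
# in x and strongly concave in y on the simplex, which is the structure the
# paper exploits.  All constants below are arbitrary demo choices.

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # payoff matrix: x minimizes, y maximizes x^T A y
tau = 0.1                          # entropy regularization weight
eta = 0.5                          # step size shared by both players
theta_x = np.zeros(n)              # logits of the min player's mixed strategy
theta_y = np.zeros(n)              # logits of the max player's mixed strategy

for t in range(5000):
    # Min player: descent step on f w.r.t. x (chain rule through softmax,
    # whose Jacobian is diag(x) - x x^T).
    x = softmax(theta_x)
    grad_x = A @ softmax(theta_y) + tau * (np.log(x) + 1.0)   # df/dx
    theta_x -= eta * x * (grad_x - x @ grad_x)

    # Max player: ascent step on f w.r.t. y, using the freshly updated x.
    y = softmax(theta_y)
    grad_y = A.T @ softmax(theta_x) - tau * (np.log(y) + 1.0)  # df/dy
    theta_y += eta * y * (grad_y - y @ grad_y)

# Duality gap (exploitability) of the last iterate in the *unregularized* game.
x, y = softmax(theta_x), softmax(theta_y)
gap = (A.T @ x).max() - (A @ y).min()
print(f"duality gap of last iterate: {gap:.4f}")
```

With a small tau, the regularized equilibrium is close to the Nash equilibrium of the original game, so the printed duality gap of the last iterate should be small; setting tau = 0 recovers plain GDA, whose last iterate can cycle instead of converging.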
Related Papers
50 records in total
  • [1] Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
    Li Y.
    Zhou J.
    Feng Y.
    Feng Y.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (01): 81-91
  • [2] Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
    Perolat, Julien
    Scherrer, Bruno
    Piot, Bilal
    Pietquin, Olivier
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37: 1321-1329
  • [3] When are Offline Two-Player Zero-Sum Markov Games Solvable?
    Cui, Qiwen
    Du, Simon S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
    Wang Z.
    Li Y.
    Feng Y.
    Feng Y.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (03): 480-491
  • [5] Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
    Zhao, Yulai
    Tian, Yuandong
    Lee, Jason D.
    Du, Simon S.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [6] Corruption-Robust Offline Two-Player Zero-Sum Markov Games
    Nika, Andi
    Mandal, Debmalya
    Singla, Adish
    Radanovic, Goran
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [7] Inverse Two-Player Zero-Sum Dynamic Games
    Tsai, Dorian
    Molloy, Timothy L.
    Perez, Tristan
    2016 AUSTRALIAN CONTROL CONFERENCE (AUCC), 2016: 192-196
  • [8] A theorem on symmetric two-player zero-sum games
    Laffond, G
    Laslier, JF
    LeBreton, M
    JOURNAL OF ECONOMIC THEORY, 1997, 72 (02): 426-431
  • [9] Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games
    Zhu, Yuanheng
    Zhao, Dongbin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (03): 1228-1241
  • [10] Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
    Cai, Yang
    Luo, Haipeng
    Wei, Chen-Yu
    Zheng, Weiqiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,