Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

Cited by: 0
|
Authors
Zeng, Sihan [1 ]
Doan, Thinh [2 ]
Romberg, Justin [1 ]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30318 USA
[2] Virginia Tech, Dept Elect & Comp Engn, Blacksburg, VA 24061 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
EQUILIBRIA;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Because the problem is formulated as a minimax optimization program, a natural approach is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method is limited. In our paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that makes the solutions more identifiable and allows the problem to be solved more efficiently. Our main contribution is to show that, under proper choices of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of the gradient descent ascent algorithm without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm.
Pages: 13
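As a concrete illustration of the approach described in the abstract, below is a minimal sketch of entropy-regularized gradient descent ascent on a single-state zero-sum matrix game. This simplifies the paper's Markov-game setting to one state; the payoff matrix A, the regularization weight tau, the step size eta, and the iteration budget are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

# Sketch: alternating gradient descent ascent (GDA) on the entropy-regularized
# objective f(x, y) = x^T A y - tau*H(x) + tau*H(y) of a zero-sum matrix game,
# where H is the Shannon entropy.  The regularization makes f strongly convex
# in x and strongly concave in y on the simplex, which is the structure the
# paper exploits.  All constants below are arbitrary demo choices.

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # payoff matrix: x minimizes, y maximizes x^T A y
tau = 0.1                          # entropy regularization weight
eta = 0.5                          # step size shared by both players
theta_x = np.zeros(n)              # logits of the min player's mixed strategy
theta_y = np.zeros(n)              # logits of the max player's mixed strategy

for t in range(5000):
    # Min player: descent step on f w.r.t. x (chain rule through softmax,
    # whose Jacobian is diag(x) - x x^T).
    x = softmax(theta_x)
    grad_x = A @ softmax(theta_y) + tau * (np.log(x) + 1.0)   # df/dx
    theta_x -= eta * x * (grad_x - x @ grad_x)

    # Max player: ascent step on f w.r.t. y, using the freshly updated x.
    y = softmax(theta_y)
    grad_y = A.T @ softmax(theta_x) - tau * (np.log(y) + 1.0)  # df/dy
    theta_y += eta * y * (grad_y - y @ grad_y)

# Duality gap (exploitability) of the last iterate in the *unregularized* game.
x, y = softmax(theta_x), softmax(theta_y)
gap = (A.T @ x).max() - (A @ y).min()
print(f"duality gap of last iterate: {gap:.4f}")
```

With a small tau, the regularized equilibrium is close to the Nash equilibrium of the original game, so the printed duality gap of the last iterate should be small; setting tau = 0 recovers plain GDA, whose last iterate can cycle instead of converging.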
Related Papers
50 records in total
  • [1] Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
    Li Y.
    Zhou J.
    Feng Y.
    Feng Y.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (01): 81-91
  • [2] Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
    Perolat, Julien
    Scherrer, Bruno
    Piot, Bilal
    Pietquin, Olivier
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37: 1321-1329
  • [3] When are Offline Two-Player Zero-Sum Markov Games Solvable?
    Cui, Qiwen
    Du, Simon S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] Policy gradient algorithm and its convergence analysis for two-player zero-sum Markov games
    Wang Z.
    Li Y.
    Feng Y.
    Feng Y.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (03): 480-491
  • [5] Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
    Zhao, Yulai
    Tian, Yuandong
    Lee, Jason D.
    Du, Simon S.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [6] Corruption-Robust Offline Two-Player Zero-Sum Markov Games
    Nika, Andi
    Mandal, Debmalya
    Singla, Adish
    Radanovic, Goran
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [7] Inverse Two-Player Zero-Sum Dynamic Games
    Tsai, Dorian
    Molloy, Timothy L.
    Perez, Tristan
    2016 AUSTRALIAN CONTROL CONFERENCE (AUCC), 2016: 192-196
  • [8] A theorem on symmetric two-player zero-sum games
    Laffond, G
    Laslier, JF
    LeBreton, M
    JOURNAL OF ECONOMIC THEORY, 1997, 72 (02): 426-431
  • [9] Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games
    Zhu, Yuanheng
    Zhao, Dongbin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (03): 1228-1241
  • [10] Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
    Cai, Yang
    Luo, Haipeng
    Wei, Chen-Yu
    Zheng, Weiqiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,