Gradient Play in Stochastic Games: Stationary Points, Convergence, and Sample Complexity

Cited by: 2
Authors
Zhang, Runyu [1 ]
Ren, Zhaolin [1 ]
Li, Na [1 ]
Affiliations
[1] Harvard Univ, Harvard Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Keywords
Games; Convergence; Stochastic processes; Nash equilibrium; Geometry; Complexity theory; Stability analysis; Multiagent systems; Reinforcement learning
DOI
10.1109/TAC.2024.3387208
CLC number
TP [Automation and computer technology]
Subject classification code
0812
Abstract
In this article, we study the performance of the gradient play algorithm for stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information, which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a local convergence rate around strict NEs. Furthermore, for a subclass of SGs called Markov potential games (which includes the setting with identical rewards as an important special case), we design a sample-based reinforcement learning algorithm and give a nonasymptotic global convergence rate analysis for both exact gradient play and our sample-based learning algorithm. Our result shows that the number of iterations to reach an epsilon-NE scales linearly, instead of exponentially, with the number of agents. Local geometry and local stability are also considered, where we prove that strict NEs are local maxima of the total potential function and fully mixed NEs are saddle points.
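To make the setting concrete, the following is a minimal sketch (not the authors' algorithm or experiments) of exact gradient play with directly parameterized policies on a toy two-agent identical-reward stochastic game, i.e., the Markov-potential-game special case mentioned in the abstract. The dimensions, toy dynamics, step size, and simplex-projection routine are illustrative assumptions.

# Sketch: gradient play with direct policy parameterization pi_i(a|s) on a
# small identical-reward stochastic game. All quantities below are toy
# assumptions chosen only to illustrate the update described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
S, A1, A2, gamma = 3, 2, 2, 0.9          # states, actions per agent, discount

# Random shared reward r[s, a1, a2] and transitions P[s, a1, a2, s'] (row-stochastic).
r = rng.random((S, A1, A2))
P = rng.random((S, A1, A2, S))
P /= P.sum(axis=-1, keepdims=True)
mu = np.full(S, 1.0 / S)                 # initial state distribution

def project_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    u = np.sort(v, axis=-1)[..., ::-1]
    css = np.cumsum(u, axis=-1) - 1.0
    idx = np.arange(1, v.shape[-1] + 1)
    rho = (u - css / idx > 0).sum(axis=-1, keepdims=True)
    theta = np.take_along_axis(css, rho - 1, axis=-1) / rho
    return np.maximum(v - theta, 0.0)

def joint_quantities(pi1, pi2):
    """Return V(s), the discounted state visitation d(s), and joint Q(s, a1, a2)."""
    Pa = np.einsum('sa,sb,sabt->st', pi1, pi2, P)      # induced state chain
    ra = np.einsum('sa,sb,sab->s', pi1, pi2, r)        # expected reward per state
    V = np.linalg.solve(np.eye(S) - gamma * Pa, ra)
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * Pa.T, mu)
    Q = r + gamma * np.einsum('sabt,t->sab', P, V)
    return V, d, Q

# Gradient play: each agent ascends its own (here identical) discounted return.
pi1 = np.full((S, A1), 1.0 / A1)
pi2 = np.full((S, A2), 1.0 / A2)
eta = 0.5
for _ in range(500):
    V, d, Q = joint_quantities(pi1, pi2)
    # Gradient w.r.t. the direct parameterization:
    # dJ/dpi1(a1|s) = d(s)/(1-gamma) * E_{a2~pi2}[Q(s, a1, a2)], and symmetrically.
    g1 = (d[:, None] / (1 - gamma)) * np.einsum('sb,sab->sa', pi2, Q)
    g2 = (d[:, None] / (1 - gamma)) * np.einsum('sa,sab->sb', pi1, Q)
    pi1 = project_simplex(pi1 + eta * g1)
    pi2 = project_simplex(pi2 + eta * g2)

V, _, _ = joint_quantities(pi1, pi2)
print("Per-state value of the joint policy after gradient play:", np.round(V, 3))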
Pages: 6499-6514
Number of pages: 16