Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Cited: 0
Authors
Baudin, Lucas [1 ]
Laraki, Rida [2 ,3 ]
Affiliations
[1] Univ Paris Dauphine PSL, Paris, France
[2] Univ Paris Dauphine PSL, CNRS, Paris, France
[3] Univ Liverpool, Liverpool, England
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
Keywords
REINFORCEMENT; CONSISTENCY; CONVERGENCE
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent extensions to dynamic games (Leslie et al. [2020], Sayin et al. [2020], Baudin and Laraki [2022]) of the well-known fictitious play learning procedure in static games were proved to converge globally to stationary Nash equilibria in two important classes of dynamic games (zero-sum and identical-interest discounted stochastic games). However, those decentralized algorithms require the players to know the model exactly (the transition probabilities and their payoffs at every stage). To remove these strong assumptions, our paper introduces regularizations of the systems in Leslie et al. [2020] and Baudin and Laraki [2022] to construct a family of new decentralized learning algorithms that are model-free (players do not know the transitions, and their payoffs are perturbed at every stage). Our procedures can be seen as extensions to stochastic games of the classical smooth fictitious play learning procedures in static games (where the players' best responses are regularized via a smooth, strictly concave perturbation of their payoff functions). We prove the convergence of our family of procedures to stationary regularized Nash equilibria in zero-sum and identical-interest discounted stochastic games. The proof uses the continuous-time smooth best-response dynamics counterparts and stochastic approximation methods. When there is only one player, our problem is an instance of reinforcement learning, and our procedures are proved to converge globally to the optimal stationary policy of the regularized MDP. In that sense, they can be seen as an alternative to the well-known Q-learning procedure.
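To make the classical building block concrete, below is a minimal sketch of smooth fictitious play in a static two-player zero-sum matrix game, with a logit (entropy-regularized) best response playing the role of the smooth, strictly concave perturbation mentioned in the abstract. The payoff matrix, temperature tau, and iteration count are illustrative assumptions, not taken from the paper; the paper's algorithms operate on discounted stochastic games with unknown transitions and perturbed payoffs.

import numpy as np

def logit_best_response(expected_payoffs, tau):
    # Smooth best response: softmax of expected payoffs at temperature tau.
    z = expected_payoffs / tau
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def smooth_fictitious_play(A, tau=0.1, iterations=5000):
    # Row player maximizes x^T A y, column player minimizes it.
    n, m = A.shape
    x_bar = np.ones(n) / n       # empirical mixed strategy of the row player
    y_bar = np.ones(m) / m       # empirical mixed strategy of the column player
    for t in range(1, iterations + 1):
        # Each player smoothly best-responds to the opponent's empirical play.
        x = logit_best_response(A @ y_bar, tau)
        y = logit_best_response(-A.T @ x_bar, tau)
        # Update the empirical averages with vanishing step size 1/(t + 1).
        x_bar = x_bar + (x - x_bar) / (t + 1)
        y_bar = y_bar + (y - y_bar) / (t + 1)
    return x_bar, y_bar

if __name__ == "__main__":
    # Matching pennies: the regularized equilibrium is close to (1/2, 1/2).
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    x_bar, y_bar = smooth_fictitious_play(A)
    print("row player:", np.round(x_bar, 3))
    print("column player:", np.round(y_bar, 3))

In the paper's stochastic-game setting, an analogous smoothed update is applied state by state with continuation values estimated online; the sketch above only illustrates the static procedure that is being extended.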
Pages: 14
Related Papers
21 records in total
  • [1] Hu, Ruimeng. Deep Fictitious Play for Stochastic Differential Games. Communications in Mathematical Sciences, 2021, 19(2): 325-353.
  • [2] Swenson, Brian; Poor, H. Vincent. Smooth Fictitious Play in N x 2 Potential Games. Conference Record of the 2019 Fifty-Third Asilomar Conference on Signals, Systems & Computers, 2019: 1739-1743.
  • [3] Williams, Noah. Learning and equilibrium transitions: Stochastic stability in discounted stochastic fictitious play. Journal of Economic Dynamics & Control, 2022, 145.
  • [4] Zhang, K. Q.; Sayin, M. O.; Ozdaglar, A. (Smooth) Fictitious-Play in Identical-Interest Stochastic Games with Independent Continuation-Payoff Estimates. Applied and Computational Mathematics, 2024, 23(3): 366-391.
  • [5] Baudin, Lucas; Laraki, Rida. Fictitious Play and Best-Response Dynamics in Identical Interest and Zero Sum Stochastic Games. International Conference on Machine Learning, Vol 162, 2022.
  • [6] Benaim, Michel; Faure, Mathieu. Consistency of Vanishingly Smooth Fictitious Play. Mathematics of Operations Research, 2013, 38(3): 437-450.
  • [7] Perkins, S.; Leslie, D. S. Stochastic fictitious play with continuous action sets. Journal of Economic Theory, 2014, 152: 179-213.
  • [8] Kamra, Nitin; Gupta, Umang; Wang, Kai; Fang, Fei; Liu, Yan; Tambe, Milind. Deep Fictitious Play for Games with Continuous Action Spaces. AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, 2019: 2042-2044.
  • [9] Fudenberg, Drew; Takahashi, Satoru. Heterogeneous beliefs and local information in stochastic fictitious play. Games and Economic Behavior, 2011, 71(1): 100-120.
  • [10] Hahn, S. The convergence of fictitious play in 3 x 3 games with strategic complementarities. Economics Letters, 1999, 64(1): 57-60.