Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Cited by: 0
Authors
Kozuno, Tadashi [1 ]
Menard, Pierre [2 ]
Munos, Remi [3 ]
Valko, Michal [3 ]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Otto von Guericke Univ, Magdeburg, Germany
[3] DeepMind Paris, Paris, France
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of learning a Nash equilibrium (NE) in an imperfect-information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIGs under the perfect-recall assumption, where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the IIG are not known; we can only access them by sampling or by interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound of order 1/√T on the convergence rate to the NE, where T is the number of played games. Moreover, IXOMD is computationally efficient, as it needs to perform updates only along the sampled trajectory.
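
To make the trajectory-wise update concrete, below is a minimal Python sketch of an IXOMD-style step; it is an illustration under stated assumptions, not the authors' implementation. It assumes a tabular policy with a fixed number of actions per information set, uses the episode's realized loss in place of the paper's per-infoset loss estimates, and substitutes a plain exponential-weights update for the dilated-entropy mirror-descent step; the names policies, ix_update, ETA, and GAMMA are illustrative.

```python
import numpy as np
from collections import defaultdict

# Minimal sketch of an IXOMD-style update, NOT the authors' implementation.
# Assumed for this sketch: a tabular policy with a fixed number of actions
# per information set; the episode's realized loss stands in for the paper's
# per-infoset loss estimates; a plain exponential-weights step replaces the
# dilated-entropy mirror-descent step.

N_ACTIONS = 3   # assumed fixed number of actions per information set
ETA = 0.1       # learning rate of the mirror-descent step
GAMMA = 0.05    # implicit-exploration (IX) parameter

# policies[infoset] is the learner's current mixed strategy at that infoset.
policies = defaultdict(lambda: np.full(N_ACTIONS, 1.0 / N_ACTIONS))

def ix_update(trajectory, loss):
    """Update the policy along one sampled trajectory (one played game).

    trajectory : list of (infoset, action) pairs visited by the learner.
    loss       : realized loss of the episode, assumed to lie in [0, 1].
    """
    reach = 1.0  # learner's own probability of the action sequence so far
    for infoset, action in trajectory:
        pi = policies[infoset]
        reach *= pi[action]
        # IX loss estimate: dividing by (reach + GAMMA) rather than reach
        # biases the estimate downward, which is the implicit-exploration
        # device behind a high-probability (not just in-expectation) bound.
        loss_hat = loss / (reach + GAMMA)
        # Exponential-weights step at this infoset only; information sets
        # off the sampled trajectory are never touched, which is where the
        # per-episode computational efficiency comes from.
        pi[action] *= np.exp(-ETA * loss_hat)
        policies[infoset] = pi / pi.sum()

# Example: one sampled episode that visited two information sets.
ix_update([("I0", 1), ("I1", 0)], loss=0.7)
print(policies["I0"])  # probability of action 1 at I0 has decreased
```

The paper's analysis prescribes specific choices of the learning rate and IX parameter; the sketch fixes them arbitrarily for brevity.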
Pages: 12
Related Papers
50 records in total
  • [1] Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
    Perolat, Julien
    Scherrer, Bruno
    Piot, Bilal
    Pietquin, Olivier
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37: 1321-1329
  • [2] When are Offline Two-Player Zero-Sum Markov Games Solvable?
    Cui, Qiwen
    Du, Simon S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [3] Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games
    Zhu, Yuanheng
    Zhao, Dongbin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33(3): 1228-1241
  • [4] Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
    Cai, Yang
    Luo, Haipeng
    Wei, Chen-Yu
    Zheng, Weiqiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
    Li Y.
    Zhou J.
    Feng Y.
    Feng Y.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36(1): 81-91
  • [6] Stochastic Two-Player Zero-Sum Learning Differential Games
    Liu, Mushuang
    Wan, Yan
    Lewis, Frank L.
    Lopez, Victor G.
    2019 IEEE 15TH INTERNATIONAL CONFERENCE ON CONTROL AND AUTOMATION (ICCA), 2019: 1038-1043
  • [7] Learning Extensive-Form Perfect Equilibria in Two-Player Zero-Sum Sequential Games
    Bernasconi, Martino
    Marchesi, Alberto
    Trovo, Francesco
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [8] Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
    Zhao, Yulai
    Tian, Yuandong
    Lee, Jason D.
    Du, Simon S.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [9] Corruption-Robust Offline Two-Player Zero-Sum Markov Games
    Nika, Andi
    Mandal, Debmalya
    Singla, Adish
    Radanovic, Goran
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [10] Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
    Zeng, Sihan
    Doan, Thinh
    Romberg, Justin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022