Satisficing Paths and Independent Multiagent Reinforcement Learning in Stochastic Games

Cited by: 1
Authors
Yongacoglu, Bora [1 ]
Arslan, Gurdal [2 ]
Yuksel, Serdar [1 ]
Institutions
[1] Queens Univ, Dept Math & Stat, Kingston, ON, Canada
[2] Univ Hawaii Manoa, Dept Elect Engn, Honolulu, HI 96822 USA
Source
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE | 2023, Vol. 5, No. 03
Keywords
multiagent reinforcement learning; independent learners; learning in games; stochastic games; decentralized systems; FICTITIOUS PLAY; UNCOUPLED DYNAMICS; CONVERGENCE; SYSTEMS; TEAMS; GO;
DOI
10.1137/22M1515112
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104 ;
Abstract
In multiagent reinforcement learning, independent learners are those that do not observe the actions of other agents in the system. Due to the decentralization of information, it is challenging to design independent learners that drive play to equilibrium. This paper investigates the feasibility of using satisficing dynamics to guide independent learners to approximate equilibrium in stochastic games. For ε ≥ 0, an ε-satisficing policy update rule is any rule that instructs the agent to not change its policy when it is ε-best-responding to the policies of the remaining players; ε-satisficing paths are defined to be sequences of joint policies obtained when each agent uses some ε-satisficing policy update rule to select its next policy. We establish structural results on the existence of ε-satisficing paths into ε-equilibrium in both symmetric N-player games and general stochastic games with two players. We then present an independent learning algorithm for N-player symmetric games and give high probability guarantees of convergence to ε-equilibrium under self-play. This guarantee is made using symmetry alone, leveraging the previously unexploited structure of ε-satisficing paths.
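To make the abstract's central definition concrete, the following is a minimal illustrative sketch of an ε-satisficing update rule in a repeated two-player game. It is not the paper's algorithm: the 2×2 coordination game, the random re-draw when unsatisfied, and all names here are hypothetical choices for illustration. Each agent keeps its action whenever it is ε-best-responding to the other, and otherwise searches at random; the resulting sequence of joint (pure) policies is an ε-satisficing path, and matched action profiles are absorbing.

```python
import random

def satisficing_update(payoff, my_action, opp_action, actions, eps, rng):
    """Hypothetical ε-satisficing rule: keep the current action when it is
    an ε-best response to the opponent's action; otherwise re-draw at random."""
    best = max(payoff[(a, opp_action)] for a in actions)
    current = payoff[(my_action, opp_action)]
    if current >= best - eps:      # ε-best-responding: do not change policy
        return my_action
    return rng.choice(actions)     # not satisfied: search uniformly at random

# Hypothetical 2x2 coordination game: each agent earns 1 for matching actions.
actions = [0, 1]
payoff = {(a, b): 1.0 if a == b else 0.0 for a in actions for b in actions}

rng = random.Random(0)
a1, a2 = 0, 1                      # start from a mismatched joint policy
for _ in range(50):                # iterate the satisficing dynamics
    a1, a2 = (satisficing_update(payoff, a1, a2, actions, 0.0, rng),
              satisficing_update(payoff, a2, a1, actions, 0.0, rng))
print(a1, a2)                      # once matched, both agents 0-best-respond and stop
```

Once the agents happen to match, each is 0-best-responding and the rule forbids further change, so the joint policy has reached a (0-)equilibrium; the paper's structural results concern when such paths into ε-equilibrium are guaranteed to exist.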
Pages: 745-773
Page count: 29
Related Papers
50 items in total
  • [31] Cooperative Multiagent Reinforcement Learning With Partial Observations
    Zhang, Yan
    Zavlanos, Michael M.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (02) : 968 - 981
  • [32] Multi-Agent Reinforcement Learning in Non-Cooperative Stochastic Games Using Large Language Models
    Alsadat, Shayan Meshkat
    Xu, Zhe
    IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 2757 - 2762
  • [33] QFuture: Learning Future Expectation Cognition in Multiagent Reinforcement Learning
    Liu, Boyin
    Pu, Zhiqiang
    Pan, Yi
    Yi, Jianqiang
    Chen, Min
    Wang, Shijie
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (04) : 1302 - 1314
  • [34] Policy Evaluation and Seeking for Multiagent Reinforcement Learning via Best Response
    Yan, Rui
    Duan, Xiaoming
    Shi, Zongying
    Zhong, Yisheng
    Marden, Jason R.
    Bullo, Francesco
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (04) : 1898 - 1913
  • [35] CuMARL: Curiosity-Based Learning in Multiagent Reinforcement Learning
    Ningombam, Devarani Devi
    Yoo, Byunghyun
    Kim, Hyun Woo
    Song, Hwa Jeon
    Yi, Sungwon
    IEEE ACCESS, 2022, 10 : 87254 - 87265
  • [36] Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games
    Jaleel, Hassan
    Shamma, Jeff S.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (11) : 5253 - 5268
  • [37] Multiagent Online Learning in Time-Varying Games
    Duvocelle, Benoit
    Mertikopoulos, Panayotis
    Staudigl, Mathias
    Vermeulen, Dries
    MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (02) : 914 - 941
  • [38] Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2023, 108 (04)
  • [40] Reinforcement learning and stochastic optimisation
    Jaimungal, Sebastian
    FINANCE AND STOCHASTICS, 2022, 26 : 103 - 129