The evolutionary dynamics of soft-max policy gradient in multi-agent settings

Authors
Bernasconi, Martino [1]
Cacciamani, Federico [1]
Fioravanti, Simone [2]
Gatti, Nicola [1]
Trovo, Francesco [1]
Affiliations
[1] Politecnico di Milano, Milan, Italy
[2] Gran Sasso Science Institute, L'Aquila, Italy
Keywords
Game theory; Evolutionary game theory; Reinforcement learning; Multiagent learning; Reinforcement
DOI
10.1016/j.tcs.2024.115011
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Policy gradient is one of the best-known algorithms in reinforcement learning. This paper studies the mean dynamics of the soft-max policy gradient algorithm and their properties in multi-agent settings, using tools from evolutionary game theory and dynamical systems. For most multi-agent reinforcement learning algorithms, the mean dynamics are a slight variant of the replicator dynamics that leaves the properties of the original dynamics unchanged. The soft-max policy gradient dynamics, by contrast, have a structure significantly different from that of the replicator. In particular, we show that the soft-max policy gradient dynamics in a given game are equivalent to the replicator dynamics in an auxiliary game obtained by a non-convex transformation of the payoffs of the original game. This structure gives the dynamics several non-standard properties. The first property we study concerns convergence to the best response. The continuous-time mean dynamics always converge to the best response, so the crucial question is the speed of convergence. Precisely, we show that the space of initializations can be split into two complementary sets. Trajectories initialized in the first set (the good initialization region) move directly to the best response. Those initialized in the second set (the bad initialization region) first move through a series of sub-optimal strategies and only then reach the best response. Interestingly, in multi-agent adversarial machine learning environments, we show that an adversary can exploit this property to make any current strategy of a learning agent that uses the soft-max policy gradient fall inside a bad initialization region, thus slowing its learning process and exploiting that policy. When the soft-max policy gradient dynamics are studied in multi-population games, which model the learning dynamics in self-play, we show that the dynamics preserve the volume of the set of initial points. This property proves that the dynamics cannot converge when the only equilibrium of the game is fully mixed, as the volume of the set of initial points would need to shrink. We also give empirical evidence that the volume expands over time, suggesting that the dynamics in games with a fully mixed equilibrium are chaotic.
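The contrast between the two dynamics can be illustrated numerically. The sketch below is not taken from the paper: it assumes a single learner facing a fixed payoff vector `u` (so the best response is simply the highest-payoff action) and uses the standard chain-rule computation for a soft-max policy, in which each parameter evolves as d(theta_i)/dt = pi_i * (u_i - pi.u). Pushed onto the simplex, this gives a replicator-like field with strategy-dependent payoffs v_i = pi_i * (u_i - pi.u); whether this matches the paper's exact auxiliary game is an assumption. The function names, payoff values, starting point, step size, and threshold are all hypothetical choices for illustration.

```python
import numpy as np

def replicator_field(pi, u):
    """Replicator vector field for a fixed payoff vector u."""
    return pi * (u - pi @ u)

def softmax_pg_field(pi, u):
    """Mean soft-max policy gradient field, written on the simplex.

    Assumption: derived by pushing d(theta_i)/dt = pi_i * (u_i - pi.u)
    through the soft-max Jacobian. The result is a replicator field
    whose payoffs v depend on the current strategy pi.
    """
    v = pi * (u - pi @ u)  # strategy-dependent auxiliary payoffs
    return pi * (v - pi @ v)

def steps_to_best_response(field, pi0, u, dt=0.01, threshold=0.9,
                           max_steps=200_000):
    """Euler-integrate `field` from pi0 and count the steps until the
    best response holds at least `threshold` probability.
    Returns None if max_steps is reached first."""
    pi = np.asarray(pi0, dtype=float)
    best = int(np.argmax(u))
    for step in range(max_steps):
        if pi[best] >= threshold:
            return step
        pi = pi + dt * field(pi, u)
        pi = pi / pi.sum()  # guard against numerical drift off the simplex
    return None

# Hypothetical example: the best action starts with little probability mass.
u = np.array([1.0, 0.8, 0.0])
pi0 = np.array([0.05, 0.475, 0.475])
print("replicator steps:", steps_to_best_response(replicator_field, pi0, u))
print("soft-max PG steps:", steps_to_best_response(softmax_pg_field, pi0, u))
```

Comparing the two step counts for different starting points gives a rough feel for how strongly the soft-max field slows down when the best response starts with little mass. Forward Euler with a fixed step is used only for brevity; an adaptive ODE solver would be a more careful choice.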
Pages: 23