Opponent cart-pole dynamics for reinforcement learning of competing agents

Cited by: 1
Author
Huang, Xun [1]
Affiliation
[1] Peking Univ, Coll Engn, State Key Lab Turbulence & Complex Syst, Beijing 100871, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Cart-pole dynamics; Reinforcement learning; Thucydides trap; Inverted pendulum; Pursuit-evasion game
DOI
10.1007/s10409-022-09005-x
Chinese Library Classification number
TH [Machinery and instrument industry]
Subject classification code
0802
Abstract
In this work, the classical single cart-pole dynamic system is extended to a double cart-pole dynamic system by including a competing target, which enables the study of multi-agent deep learning problems at an affordable cost. The corresponding key issues, such as the system dynamics, the reward function and the simultaneous training of opponent agents, are discussed in detail. To showcase the system dynamics, a couple of agents are trained, and analysis of the competition results reveals the key pattern for winning. It appears that a defensive agent is always defeated by an offensive agent, even though the associated neural network has very limited intelligence. When both agents are defensive, the system dynamics remain stable and reach the Nash equilibrium. Overall, the proposed dynamic system could serve as a surrogate model and assist the study of how to escape the so-called Thucydides trap.
Pages: 10