Opponent cart-pole dynamics for reinforcement learning of competing agents

Cited by: 1

Author:

Huang, Xun [1]

Affiliation:

[1] Peking Univ, Coll Engn, State Key Lab Turbulence & Complex Syst, Beijing 100871, Peoples R China

Source:

ACTA MECHANICA SINICA | 2022 / Vol. 38 / No. 05

Funding:

U.S. National Science Foundation

Keywords:

Cart-pole dynamics; Reinforcement learning; Thucydides trap; Inverted pendulum; Pursuit-evasion game

DOI:

10.1007/s10409-022-09005-x

Chinese Library Classification (CLC) number:

TH [Machinery and instrument industry]

Discipline classification code:

0802

Abstract:

In this work, the classical single cart-pole dynamic system is extended to a double cart-pole dynamic system that includes a competing target, which enables multi-agent deep learning problems to be studied at an affordable cost. The important issues this raises, such as the system dynamics, the reward function and the simultaneous training of opponent agents, are discussed in detail. To showcase the system dynamics, a pair of agents is trained, and analysis of the competition results reveals the key pattern for winning. It appears that a defensive agent is always defeated by an offensive agent, even though the associated neural network has very limited intelligence. When both agents are defensive, the system remains stable and reaches a Nash equilibrium. Overall, the proposed dynamic system could serve as a surrogate model and assist the study of how to escape the so-called Thucydides trap.
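The paper builds on the classical single cart-pole system. The following Python sketch illustrates that baseline and may help readers see the dynamics before the competing extension. It makes several assumptions. It uses the frictionless cart-pole equations of motion used in OpenAI Gym's CartPole environment, which derive from Barto, Sutton and Anderson (1983). It uses Gym's default parameters and termination thresholds, with explicit Euler integration. The names `CartPoleParams`, `accelerations`, `step` and `has_failed` are hypothetical. The sketch does not reproduce the paper's double cart-pole coupling, competing target or reward function, none of which are given in this record.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CartPoleParams:
    """Assumed parameters (OpenAI Gym CartPole defaults), not taken from the paper."""
    gravity: float = 9.8       # m/s^2
    mass_cart: float = 1.0     # kg
    mass_pole: float = 0.1     # kg
    half_length: float = 0.5   # m, pivot to the pole's centre of mass
    dt: float = 0.02           # s, explicit Euler time step


def accelerations(state, force, p):
    """Return (x_acc, theta_acc) for the frictionless single cart-pole.

    state = (x, x_dot, theta, theta_dot); theta = 0 means upright,
    and `force` is the horizontal force applied to the cart.
    """
    _, _, theta, theta_dot = state
    total_mass = p.mass_cart + p.mass_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Applied force plus the pole's centrifugal term, per unit of total mass
    temp = (force + p.mass_pole * p.half_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (p.gravity * sin_t - cos_t * temp) / (
        p.half_length * (4.0 / 3.0 - p.mass_pole * cos_t ** 2 / total_mass)
    )
    x_acc = temp - p.mass_pole * p.half_length * theta_acc * cos_t / total_mass
    return x_acc, theta_acc


def step(state, force, p=CartPoleParams()):
    """Advance the state by one explicit Euler step of length p.dt."""
    x, x_dot, theta, theta_dot = state
    x_acc, theta_acc = accelerations(state, force, p)
    return (
        x + p.dt * x_dot,
        x_dot + p.dt * x_acc,
        theta + p.dt * theta_dot,
        theta_dot + p.dt * theta_acc,
    )


def has_failed(state, x_limit=2.4, theta_limit=math.radians(12)):
    """Gym-style termination: cart off the track or pole tilted beyond 12 degrees."""
    x, _, theta, _ = state
    return abs(x) > x_limit or abs(theta) > theta_limit


# Usage: with no applied force, a small initial tilt grows until the episode fails.
state = (0.0, 0.0, 0.05, 0.0)
steps = 0
while not has_failed(state) and steps < 500:
    state = step(state, 0.0)
    steps += 1
```

The upright position is an unstable equilibrium. A zero-force, zero-tilt state stays fixed, but any small tilt grows, and a learned controller has to counteract that. Explicit Euler is chosen here only for brevity. A higher-order integrator such as RK4 would track the continuous dynamics more accurately at the same time step.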
