Game Theory-Based Control System Algorithms with Real-Time Reinforcement Learning: How to Solve Multiplayer Games Online

Cited by: 136

Authors:

Vamvoudakis, Kyriakos G. ^{[1]}

Modares, Hamidreza ^{[2]}

Kiumarsi, Bahare ^{[3]}

Lewis, Frank L. ^{[4,5]}

Affiliations:

[1] Virginia Tech, Dept Aerosp & Ocean Engn, Blacksburg, VA 24061 USA

[2] Missouri Univ Sci & Technol, Rolla, MO USA

[3] Univ Texas Arlington, Arlington, TX 76019 USA

[4] Univ Texas Arlington, Res Inst, Ft Worth, TX USA

[5] Northeastern Univ, Shenyang, Peoples R China

Source:

IEEE CONTROL SYSTEMS MAGAZINE | 2017 / Vol. 37 / No. 01

Keywords:

OPTIMAL TRACKING CONTROL; ZERO-SUM GAMES; STACKELBERG STRATEGY; FEEDBACK; EQUATION;

DOI:

10.1109/MCS.2016.2621461

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Complex human-engineered systems involve an interconnection of multiple decision makers (or agents) whose collective behavior depends on a compilation of local decisions that are based on partial information about each other and the state of the environment [1]-[4]. Strategic interactions among agents in these systems can be modeled as a multiplayer simultaneous-move game [5]-[8]. The agents involved can have conflicting objectives, and it is natural to make decisions based upon optimizing individual payoffs or costs. © 2016 IEEE.


Pages: 33-52

Number of pages: 20

References (66 entries in total)

[1] Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control [J]. AUTOMATICA, 2007, 43 (03): 473-481

[2] Alpcan T., 2010, Network Security: A Decision and Game-Theoretic Approach

[3] Arthur W.B., 1997, The economy as an evolving complex system II, V28

[4] BASAR, T; OLSDER, GJ. TEAM-OPTIMAL CLOSED-LOOP STACKELBERG STRATEGIES IN HIERARCHICAL CONTROL-PROBLEMS [J]. AUTOMATICA, 1980, 16 (04): 409-414

[5] Basar T., 1995, Dynamic noncooperative game theory, V200

[6] BELLMAN, R. DYNAMIC PROGRAMMING AND LAGRANGE MULTIPLIERS [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1956, 42 (10): 767-769

[7] Bernhard P., 1995, H-optimal control and related minimax design problems, V2nd

[8] Bertsekas D. P., 1996, NEURODYNAMIC PROGRAM

[9] Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K. G.; Lewis, F. L.; Dixon, W. E. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems [J]. AUTOMATICA, 2013, 49 (01): 82-92

[10] BRADTKE SJ, 1994, PROCEEDINGS OF THE 1994 AMERICAN CONTROL CONFERENCE, VOLS 1-3, P3475
