Approximate Nash Solutions for Multiplayer Mixed-Zero-Sum Game With Reinforcement Learning

Cited by: 51
Authors
Lv, Yongfeng [1]
Ren, Xuemei [1]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2019, Vol. 49, No. 12
Funding
National Natural Science Foundation of China;
Keywords
Approximate dynamic programming (ADP); Nash games; neural networks (NNs); reinforcement learning (RL); system identification; ALGORITHM; SYSTEMS;
DOI
10.1109/TSMC.2018.2861826
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Inspired by Nash game theory, this paper proposes a multiplayer mixed-zero-sum (MZS) nonlinear game that combines two situations, the zero-sum and the nonzero-sum (NZS) Nash games. A synchronous reinforcement learning (RL) scheme based on an identifier-critic structure is developed to learn the Nash equilibrium solution of the proposed MZS game. First, the MZS game formulation is presented: one set of performance indexes defines an NZS Nash game among players 1 to N, and another performance index defines a zero-sum game between players N and N + 1, so that player N cooperates with players 1 to N - 1 while competing with player N + 1, which leads to a Nash equilibrium of all players. A single-layer neural network (NN) is then used to approximate the unknown dynamics of the nonlinear game system. Finally, an NN-based RL scheme is developed to learn the optimal performance indexes, which produce the optimal control policy of every player such that the Nash equilibrium can be obtained; the actor NN widely used in the RL literature is therefore not needed. To this end, a recently proposed adaptive law estimates the unknown identifier coefficient vectors, and an improved adaptive law with an error performance index updates the critic coefficient vectors. Both linear and nonlinear simulations demonstrate the existence of the Nash equilibrium for the MZS game and the performance of the proposed algorithm.
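The zero-sum component of the game described in the abstract can be illustrated with a minimal sketch. The following is not the paper's algorithm: it solves the scalar linear-quadratic zero-sum game, where the Nash value p satisfies the game algebraic Riccati equation 2ap + q - p^2/r + p^2/gamma^2 = 0 for the system x' = ax + b1*u + b2*w with b1 = b2 = 1. A simple fixed-point iteration on the Riccati residual stands in for the paper's NN critic update; all parameter values and function names here are illustrative assumptions.

```python
# Hedged sketch (not the paper's method): scalar LQ zero-sum game.
# Player u minimizes and player w maximizes the cost
#   J = integral( q*x^2 + r*u^2 - gamma^2 * w^2 ) dt
# subject to x' = a*x + u + w. The Nash value function V(x) = p*x^2/?
# reduces to a scalar game algebraic Riccati equation (GARE) in p.

def gare_residual(p, a=-1.0, q=1.0, r=1.0, gamma=2.0):
    """Residual of the scalar game Riccati equation at candidate value p."""
    return 2 * a * p + q - p**2 / r + p**2 / gamma**2

def solve_zero_sum_value(alpha=0.1, tol=1e-10, max_iter=10000):
    """Drive the GARE residual to zero by fixed-point iteration,
    a crude stand-in for a critic-weight adaptive law."""
    p = 0.0
    for _ in range(max_iter):
        res = gare_residual(p)
        if abs(res) < tol:
            break
        p += alpha * res  # correction step toward the Nash value
    return p

p_star = solve_zero_sum_value()
u_gain = -p_star          # minimizing player's feedback: u = -(1/r)*b1*p*x
w_gain = p_star / 2.0**2  # maximizing player's feedback: w = (1/gamma^2)*b2*p*x
print(p_star, u_gain, w_gain)
```

At the converged p, neither player can improve its cost by unilaterally changing its linear feedback gain, which is the Nash property the paper establishes for the full multiplayer nonlinear setting.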
Pages: 2739-2750
Page count: 12