Off-policy Q-learning: Solving Nash equilibrium of multi-player games with network-induced delay and unmeasured state

Cited by: 30
Authors
Li, Jinna [1 ]
Xiao, Zhenfei [1 ]
Fan, Jialu [2 ]
Chai, Tianyou [2 ]
Lewis, Frank L. L. [3 ]
Affiliations
[1] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[3] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
Funding
Major Program of the National Natural Science Foundation of China; National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); Game theory; Network-induced delay; Off-policy Q-learning; Unmeasured state; ZERO-SUM GAMES; TRACKING CONTROL; MULTIAGENT SYSTEMS;
DOI
10.1016/j.automatica.2021.110076
CLC Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
In the framework of adaptive dynamic programming (ADP) combined with Q-learning, this paper investigates networked multi-player games in which the common state of the plant is transmitted to all players via a network. The goal is to find the Nash equilibrium solution without requiring the system matrices to be known, even though network-induced delay exists and the system state cannot be directly measured. By adding an observer to estimate the system state and a virtual Smith predictor to predict it, the control policies of the players can be successfully designed. A novel off-policy Q-learning algorithm is then proposed to learn the Nash equilibrium solution by solving the coupled algebraic Riccati equations from available data, followed by a rigorous proof of convergence of the proposed algorithm. Finally, an example is given to show the effectiveness of the proposed method. (c) 2021 Elsevier Ltd. All rights reserved.
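To illustrate the off-policy Q-learning idea mentioned in the abstract, the following is a minimal single-player sketch for a discrete-time LQ problem: data are collected once under an exploratory behavior policy, and the quadratic Q-function of each target policy is then fitted by least squares from that same batch, with no use of the system matrices in the learning update. The system, weights, and noise level below are hypothetical toy values, not from the paper, and the multi-player, delay, and observer aspects are omitted.

```python
import numpy as np

# Hypothetical 2-state, 1-input linear system (for simulation only;
# the learning loop never uses A or B).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
n, m = 2, 1

rng = np.random.default_rng(0)

# Collect data once with an exploratory behavior policy (off-policy).
K = np.zeros((m, n))            # initial stabilizing gain (A is stable here)
X, U, Xn = [], [], []
x = rng.normal(size=(n,))
for k in range(200):
    u = -K @ x + 0.5 * rng.normal(size=(m,))   # probing noise
    xn = A @ x + B @ u
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# Upper-triangle index pairs of a symmetric (n+m) x (n+m) matrix H.
pairs = [(i, j) for i in range(n + m) for j in range(i, n + m)]

def quad_features(z):
    """Features so that z' H z = phi(z) . theta for symmetric H."""
    return np.array([z[i] * z[j] if i == j else 2 * z[i] * z[j]
                     for (i, j) in pairs])

# Off-policy policy iteration: reuse the same batch for every iterate.
for it in range(20):
    Phi, b = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])              # behavior action
        zn = np.concatenate([xn, -K @ xn])      # target-policy action at x_{k+1}
        Phi.append(quad_features(z) - quad_features(zn))
        b.append(x @ Q @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(b), rcond=None)
    H = np.zeros((n + m, n + m))
    for t, (i, j) in zip(theta, pairs):
        H[i, j] = H[j, i] = t
    # Greedy improvement: u = -K x with K = H_uu^{-1} H_ux.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])
```

After convergence, `K` should match the gain obtained from the discrete algebraic Riccati equation; the paper's method extends this data-reuse idea to coupled Riccati equations for multiple players with delayed, observer-reconstructed states.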
Pages: 7
Related Papers
16 records in total
  • [1] Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning
    Li, Jinna
    Xiao, Zhenfei
    Li, Ping
    IEEE ACCESS, 2019, 7 : 134647 - 134659
  • [2] Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems
    Li, Jinna
    Xiao, Zhenfei
    Chai, Tianyou
    Lewis, Frank L.
    Jagannathan, Sarangapani
IFAC PAPERSONLINE, 2020, 53 (02) : 9189 - 9194
  • [3] H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning
    Li, Jinna
    Xiao, Zhenfei
IEEE ACCESS, 2020, 8 (08) : 28831 - 28846
  • [4] Efficient off-policy Q-learning for multi-agent systems by solving dual games
    Wang, Yan
    Xue, Huiwen
    Wen, Jiwei
    Liu, Jinfeng
    Luan, Xiaoli
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (06) : 4193 - 4212
  • [5] Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator
    Ren, He
    Zhang, Huaguang
    Wen, Yinlei
    Liu, Chong
    NEUROCOMPUTING, 2019, 335 : 96 - 104
  • [6] A reinforcement learning algorithm for obtaining the Nash equilibrium of multi-player matrix games
    Nanduri, Vishnu
    Das, Tapas K.
    IIE TRANSACTIONS, 2009, 41 (02) : 158 - 167
  • [7] Seeking Nash Equilibrium for Linear Discrete-time Systems via Off-policy Q-learning
    Ni, Haohan
    Ji, Yuxiang
    Yang, Yuxiao
    Zhou, Jianping
    IAENG International Journal of Applied Mathematics, 2024, 54 (11) : 2477 - 2483
  • [8] Multi-player H∞ Differential Game using On-Policy and Off-Policy Reinforcement Learning
    An, Peiliang
    Liu, Mushuang
    Wan, Yan
    Lewis, Frank L.
    2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020, : 1137 - 1142
  • [9] Output Feedback H∞ Control for Linear Discrete-Time Multi-Player Systems With Multi-Source Disturbances Using Off-Policy Q-Learning
    Xiao, Zhenfei
    Li, Jinna
    Li, Ping
    IEEE ACCESS, 2020, 8 : 208938 - 208951
  • [10] Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs
    Huo, Yu
    Wang, Ding
    Qiao, Junfei
    Li, Menghua
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (02) : 910 - 920