Off-policy Q-learning: Solving Nash equilibrium of multi-player games with network-induced delay and unmeasured state

Cited by: 30
Authors
Li, Jinna [1 ]
Xiao, Zhenfei [1 ]
Fan, Jialu [2 ]
Chai, Tianyou [2 ]
Lewis, Frank L. L. [3 ]
Affiliations
[1] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[3] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
Funding
Major Program of the National Natural Science Foundation of China; National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); Game theory; Network-induced delay; Off-policy Q-learning; Unmeasured state; ZERO-SUM GAMES; TRACKING CONTROL; MULTIAGENT SYSTEMS;
DOI
10.1016/j.automatica.2021.110076
CLC Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
In the framework of adaptive dynamic programming (ADP) combined with Q-learning, this paper investigates networked multi-player games in which the common state of the plant is transmitted to all players via a network. The goal is to find the Nash equilibrium solution without requiring the system matrices to be known, even though network-induced delay exists and the system state cannot be directly measured. By adding an observer to estimate the system state and a virtual Smith predictor to predict it, the control policies of the players can be successfully designed. A novel off-policy Q-learning algorithm is then proposed to learn the Nash equilibrium solution by solving the coupled algebraic Riccati equations from available data, followed by a rigorous proof of convergence of the proposed algorithm. Finally, an example is given to show the effectiveness of the proposed method. (c) 2021 Elsevier Ltd. All rights reserved.
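To illustrate the off-policy Q-learning idea mentioned in the abstract, the following is a minimal single-player sketch for a discrete-time LQ problem: data are collected once under an exploratory behavior policy, and the quadratic Q-function of each target policy is then fitted by least squares from that same batch, with no use of the system matrices in the learning update. The system, weights, and noise level below are hypothetical toy values, not from the paper, and the multi-player, delay, and observer aspects are omitted.

```python
import numpy as np

# Hypothetical 2-state, 1-input linear system (for simulation only;
# the learning loop never uses A or B).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
n, m = 2, 1

rng = np.random.default_rng(0)

# Collect data once with an exploratory behavior policy (off-policy).
K = np.zeros((m, n))            # initial stabilizing gain (A is stable here)
X, U, Xn = [], [], []
x = rng.normal(size=(n,))
for k in range(200):
    u = -K @ x + 0.5 * rng.normal(size=(m,))   # probing noise
    xn = A @ x + B @ u
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# Upper-triangle index pairs of a symmetric (n+m) x (n+m) matrix H.
pairs = [(i, j) for i in range(n + m) for j in range(i, n + m)]

def quad_features(z):
    """Features so that z' H z = phi(z) . theta for symmetric H."""
    return np.array([z[i] * z[j] if i == j else 2 * z[i] * z[j]
                     for (i, j) in pairs])

# Off-policy policy iteration: reuse the same batch for every iterate.
for it in range(20):
    Phi, b = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])              # behavior action
        zn = np.concatenate([xn, -K @ xn])      # target-policy action at x_{k+1}
        Phi.append(quad_features(z) - quad_features(zn))
        b.append(x @ Q @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(b), rcond=None)
    H = np.zeros((n + m, n + m))
    for t, (i, j) in zip(theta, pairs):
        H[i, j] = H[j, i] = t
    # Greedy improvement: u = -K x with K = H_uu^{-1} H_ux.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])
```

After convergence, `K` should match the gain obtained from the discrete algebraic Riccati equation; the paper's method extends this data-reuse idea to coupled Riccati equations for multiple players with delayed, observer-reconstructed states.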
Pages: 7
Related Papers
16 records in total
  • [1] Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning
    Li, Jinna
    Xiao, Zhenfei
    Li, Ping
    IEEE ACCESS, 2019, 7 : 134647 - 134659
  • [2] Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems
    Li, Jinna
    Xiao, Zhenfei
    Chai, Tianyou
    Lewis, Frank L.
    Jagannathan, Sarangapani
IFAC PAPERSONLINE, 2020, 53 (02) : 9189 - 9194
  • [3] H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning
    Li, Jinna
    Xiao, Zhenfei
IEEE ACCESS, 2020, 8 (08) : 28831 - 28846
  • [4] Efficient off-policy Q-learning for multi-agent systems by solving dual games
    Wang, Yan
    Xue, Huiwen
    Wen, Jiwei
    Liu, Jinfeng
    Luan, Xiaoli
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (06) : 4193 - 4212
  • [5] Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator
    Ren, He
    Zhang, Huaguang
    Wen, Yinlei
    Liu, Chong
    NEUROCOMPUTING, 2019, 335 : 96 - 104
  • [6] A reinforcement learning algorithm for obtaining the Nash equilibrium of multi-player matrix games
    Nanduri, Vishnu
    Das, Tapas K.
    IIE TRANSACTIONS, 2009, 41 (02) : 158 - 167
  • [7] Seeking Nash Equilibrium for Linear Discrete-time Systems via Off-policy Q-learning
    Ni, Haohan
    Ji, Yuxiang
    Yang, Yuxiao
    Zhou, Jianping
    IAENG International Journal of Applied Mathematics, 2024, 54 (11) : 2477 - 2483
  • [8] Multi-player H∞ Differential Game using On-Policy and Off-Policy Reinforcement Learning
    An, Peiliang
    Liu, Mushuang
    Wan, Yan
    Lewis, Frank L.
    2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020, : 1137 - 1142
  • [9] Output Feedback H∞ Control for Linear Discrete-Time Multi-Player Systems With Multi-Source Disturbances Using Off-Policy Q-Learning
    Xiao, Zhenfei
    Li, Jinna
    Li, Ping
    IEEE ACCESS, 2020, 8 : 208938 - 208951
  • [10] Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs
    Huo, Yu
    Wang, Ding
    Qiao, Junfei
    Li, Menghua
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (02) : 910 - 920