Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games

Cited by: 185
Authors
Song, Ruizhuo [1]
Lewis, Frank L. [2, 3]
Wei, Qinglai [4]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76019 USA
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation
Keywords
Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; integral reinforcement learning (IRL); nonlinear systems; nonzero sum (NZS); off-policy; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; H-INFINITY CONTROL; DIFFERENTIAL-GAMES; UNKNOWN DYNAMICS; FEEDBACK-CONTROL; LINEAR-SYSTEMS; CONTROL DESIGN; OUTPUT DATA; ALGORITHM;
DOI
10.1109/TNNLS.2016.2582849
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. An IRL algorithm is presented to obtain the iterative control, and off-policy learning is used so that the dynamics can be completely unknown. The off-policy IRL scheme performs both policy evaluation and policy improvement within the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. A gradient descent algorithm updates the critic and action weights simultaneously. A convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of a Nash equilibrium are proved. A simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.
Pages: 704-713
Page count: 10
Related papers
50 in total
  • [1] Off-Policy Reinforcement Learning for Partially Unknown Nonzero-Sum Games
    Zhang, Qichao
    Zhao, Dongbin
    Zhang, Sibo
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 822 - 830
  • [2] Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator
    Ren, He
    Zhang, Huaguang
    Wen, Yinlei
    Liu, Chong
    NEUROCOMPUTING, 2019, 335 : 96 - 104
  • [3] Event-triggered constrained neural critic control of nonlinear continuous-time multiplayer nonzero-sum games
    Li, Menghua
    Wang, Ding
    Zhao, Mingming
    Qiao, Junfei
    INFORMATION SCIENCES, 2023, 631 : 412 - 428
  • [4] Off-policy based adaptive dynamic programming method for nonzero-sum games on discrete-time system
    Wen, Yinlei
    Zhang, Huaguang
    Ren, He
    Zhang, Kun
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2020, 357 (12) : 8059 - 8081
  • [5] THE MULTIPLAYER NONZERO-SUM DYNKIN GAME IN CONTINUOUS TIME
    Hamadene, Said
    Mohammed, Hassani
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2014, 52 (02) : 821 - 835
  • [6] Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems
    Ren, He
    Dai, Jing
    Zhang, Huaguang
    Zhang, Kun
    TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2020, 42 (15) : 2919 - 2928
  • [7] Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning
    Yang, Yongliang
    Zhang, Sen
    Dong, Jie
    Yin, Yixin
    IEEE ACCESS, 2020, 8 : 14074 - 14088
  • [8] Off-policy integral reinforcement learning-based optimal tracking control for a class of nonzero-sum game systems with unknown dynamics
    Zhao, Jin-Gang
    Chen, Fang-Fang
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2022, 43 (06) : 1623 - 1644
  • [9] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    魏庆来
    宋睿卓
    孙秋野
    肖文栋
    Chinese Physics B, 2015, (09) : 151 - 156
  • [10] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    Wei Qing-Lai
    Song Rui-Zhuo
    Sun Qiu-Ye
    Xiao Wen-Dong
    CHINESE PHYSICS B, 2015, 24 (09)