Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games

Cited by: 195
Authors
Song, Ruizhuo [1 ]
Lewis, Frank L. [2 ,3 ]
Wei, Qinglai [4 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76019 USA
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; integral reinforcement learning (IRL); nonlinear systems; nonzero sum (NZS); off-policy; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; H-INFINITY CONTROL; DIFFERENTIAL-GAMES; UNKNOWN DYNAMICS; FEEDBACK-CONTROL; LINEAR-SYSTEMS; CONTROL DESIGN; OUTPUT DATA; ALGORITHM;
DOI
10.1109/TNNLS.2016.2582849
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning allows the system dynamics to be completely unknown. Off-policy IRL performs both the policy evaluation and the policy improvement steps of the policy iteration algorithm. Critic and action networks are used to approximate the performance index and the control for each player, and a gradient descent algorithm updates the critic and action weights simultaneously. Convergence of the weights is analyzed, and the asymptotic stability of the closed-loop system and the existence of the Nash equilibrium are proved. A simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.
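To make the abstract's pipeline concrete, below is a minimal Python sketch of the idea for a hypothetical two-player scalar game: trajectories are generated by exploratory behavior policies (off-policy data), each player's critic weight is tuned by gradient descent on an IRL Bellman residual evaluated over short integration intervals, and each actor weight is pulled toward its policy-improvement target. Everything here (the scalar dynamics, cost weights, learning rates, and the use of known input gains b_j inside the off-policy correction term) is an assumption for illustration, not the authors' implementation; the paper's actual scheme is fully model-free and uses critic and action neural networks for each player.

```python
# Hypothetical sketch of off-policy IRL for a two-player scalar NZS game.
# Simplification: the input gains b_j appear in the correction term below,
# whereas the paper's method avoids any knowledge of the dynamics.
import numpy as np

a, b = 1.0, np.array([1.0, 1.0])          # dx/dt = a*x + b1*u1 + b2*u2
q = np.array([2.0, 1.0])                  # Q_i(x) = q_i * x^2
R = np.array([[1.0, 0.5],                 # R[i, j]: weight on u_j^2 in J_i
              [0.5, 1.0]])
dt, N = 0.001, 50                         # Euler step; IRL interval T = N*dt

def behavior(x, k):
    """Behavior (data-generating) policy: stabilizing feedback + probing noise."""
    return -0.8 * x + 0.3 * np.sin(np.array([7.0, 11.0]) * k * dt)

# --- collect off-policy data: one trajectory, chopped into IRL intervals ---
x, k, segments = 2.0, 0, []
for _ in range(200):
    xs, us = [x], []
    for _ in range(N):
        u = behavior(x, k)
        x += dt * (a * x + b @ u)         # Euler integration of the dynamics
        xs.append(x); us.append(u); k += 1
    segments.append((np.array(xs), np.array(us)))

# --- critic V_i(x) = w_i*x^2 and actor mu_i(x) = th_i*x, tuned together ---
w, th = np.ones(2), -0.5 * np.ones(2)
lr_w, lr_th = 0.05, 0.5                   # untuned, illustrative learning rates
for epoch in range(50):
    for xs, us in segments:
        xm = xs[:-1]                      # state at the left of each Euler step
        dphi = xs[-1]**2 - xs[0]**2       # phi(x(t+T)) - phi(x(t))
        for i in range(2):
            run_cost = dt * np.sum(q[i] * xm**2 +
                                   sum(R[i, j] * (th[j] * xm)**2 for j in range(2)))
            # off-policy correction per unit w_i: int 2*x*b_j*(u_j - mu_j(x)) dtau
            C = dt * np.sum(sum(2.0 * xm * b[j] * (us[:, j] - th[j] * xm)
                                for j in range(2)))
            e = w[i] * (dphi - C) + run_cost            # IRL Bellman residual
            w[i] -= lr_w * e * (dphi - C)               # gradient step on e^2/2
        for i in range(2):                              # policy improvement:
            target = -b[i] * w[i] / R[i, i]             # mu_i = -(1/2)R_ii^-1 b_i dV_i/dx
            th[i] -= lr_th * (th[i] - target)           # actor gradient step

print("critic weights:", w)
print("actor gains  :", th)
```

For these illustrative numbers, the coupled-Riccati Nash solution is roughly p = (1.56, 1.03) with feedback gains k = (-1.56, -1.03), so with cooperative learning rates the printed weights should settle near those values; the sketch is meant only to show where the Bellman residual, the off-policy correction, and the simultaneous critic/actor updates from the abstract enter.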
Pages: 704-713
Number of pages: 10