Model-free adaptive optimal control of continuous-time nonlinear non-zero-sum games based on reinforcement learning

Cited by: 5
Authors
Guo, Lei [1]
Zhao, Han [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Beijing 100876, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
APPROXIMATE OPTIMAL-CONTROL; LINEAR-SYSTEMS;
DOI
10.1049/cth2.12376
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents two novel algorithms for finding the Nash equilibrium solution of non-zero-sum games for continuous-time input-affine nonlinear systems. Based on the integral reinforcement learning method, integral-exploration-coupled Hamilton-Jacobi (HJ) equations are derived that contain no information about the system dynamics. Then, using neural network approximation, two different adaptive weight-tuning laws are given to estimate the approximate solution of the coupled HJ equations. Both algorithms estimate the value function and the policy without knowing or identifying the system dynamics. Closed-loop system stability and weight convergence are guaranteed by Lyapunov analysis. Finally, simulation results for a two-player non-zero-sum game demonstrate the effectiveness of the proposed algorithms.
Pages: 223-239
Page count: 17