Model-free adaptive optimal control of continuous-time nonlinear non-zero-sum games based on reinforcement learning

Cited by: 5
Authors
Guo, Lei [1]
Zhao, Han [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
APPROXIMATE OPTIMAL-CONTROL; LINEAR-SYSTEMS;
DOI
10.1049/cth2.12376
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents two novel algorithms for finding the Nash equilibrium solution of non-zero-sum games for continuous-time input-affine nonlinear systems. Based on the integral reinforcement learning method, the integral-exploration-coupled Hamilton-Jacobi (HJ) equations are derived, which do not contain any information about the system dynamics. Then, based on neural network approximation, two different adaptive weight-tuning laws are given to estimate the approximate solution of the coupled HJ equations. Both algorithms can estimate the value function and the policy without knowing or identifying the system dynamics. Closed-loop system stability and convergence of the weights are guaranteed by Lyapunov analysis. Finally, simulation results for a two-player non-zero-sum game demonstrate the effectiveness of the algorithms.
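For orientation, the problem class named in the abstract can be sketched in the notation commonly used in the non-zero-sum game literature. This is an assumed, textbook-style formulation and not the paper's own equations: the symbols f, g_j, Q_i, R_ij, V_i, the number of players N and the interval length T are all illustrative choices. The first four relations are the input-affine dynamics, each player's cost, the coupled HJ equations and the resulting Nash policies; the last is the integral (interval) form of the Bellman relation on which integral reinforcement learning is typically built.

```latex
\begin{align}
\dot{x} &= f(x) + \sum_{j=1}^{N} g_j(x)\,u_j, \\
J_i(x_0) &= \int_{0}^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\,u_j \Big)\,dt,
  \qquad i = 1,\dots,N, \\
0 &= Q_i(x) + \sum_{j=1}^{N} u_j^{*\top} R_{ij}\,u_j^{*}
  + \nabla V_i^{\top} \Big( f(x) + \sum_{j=1}^{N} g_j(x)\,u_j^{*} \Big), \\
u_i^{*} &= -\tfrac{1}{2}\, R_{ii}^{-1}\, g_i(x)^{\top}\, \nabla V_i, \\
V_i\big(x(t)\big) &= \int_{t}^{t+T} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\,u_j \Big)\,d\tau
  + V_i\big(x(t+T)\big).
\end{align}
```

The interval form no longer contains the drift f(x), but the policy expression still depends on g_i(x). According to the abstract, the paper's integral-exploration-coupled HJ equations contain no system dynamics information at all; the additional exploration terms that achieve this are not reproduced in this sketch.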
Pages: 223-239
Number of pages: 17
Related papers
39 in total
  • [1] Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    Abu-Khalaf, M
    Lewis, FL
    [J]. AUTOMATICA, 2005, 41 (05) : 779 - 791
  • [2] Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control
    Al-Tamimi, Asma
    Lewis, Frank L.
    Abu-Khalaf, Murad
    [J]. AUTOMATICA, 2007, 43 (03) : 473 - 481
  • [3] BAIRD LC, 1994, 1994 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOL 1-7, P2448, DOI 10.1109/ICNN.1994.374604
  • [4] Basar, 1999, DYNAMIC NONCOOPERATI
  • [5] Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation
    Beard, RW
    Saridis, GN
    Wen, JT
    [J]. AUTOMATICA, 1997, 33 (12) : 2159 - 2177
  • [6] DYNAMIC PROGRAMMING
    BELLMAN, R
    [J]. SCIENCE, 1966, 153 (3731) : 34 - &
  • [7] A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems
    Bhasin, S.
    Kamalapurkar, R.
    Johnson, M.
    Vamvoudakis, K. G.
    Lewis, F. L.
    Dixon, W. E.
    [J]. AUTOMATICA, 2013, 49 (01) : 82 - 92
  • [8] UNIVERSAL APPROXIMATION OF AN UNKNOWN MAPPING AND ITS DERIVATIVES USING MULTILAYER FEEDFORWARD NETWORKS
    HORNIK, K
    STINCHCOMBE, M
    WHITE, H
    [J]. NEURAL NETWORKS, 1990, 3 (05) : 551 - 560
  • [9] Ioannou P, 2006, ADV DES CONTROL, P1
  • [10] Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming
    Jiang, He
    Zhang, Huaguang
    Xiao, Geyang
    Cui, Xiaohong
    [J]. NEUROCOMPUTING, 2018, 275 : 192 - 199