Model-free adaptive optimal control of continuous-time nonlinear non-zero-sum games based on reinforcement learning

Cited by: 5

Authors:

Guo, Lei [1]

Zhao, Han [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China

Source:

IET CONTROL THEORY AND APPLICATIONS | 2023 / Vol. 17 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

APPROXIMATE OPTIMAL-CONTROL; LINEAR-SYSTEMS;

DOI:

10.1049/cth2.12376

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents two novel algorithms for finding the Nash equilibrium solution of non-zero-sum games for continuous-time input-affine nonlinear systems. Based on the integral reinforcement learning method, the integral-exploration-coupled Hamilton-Jacobi (HJ) equations are derived, which do not contain any information about the system dynamics. Then, based on neural network approximation, two different adaptive tuning laws for the weights are given to estimate the approximate solution of the coupled HJ equations. Both algorithms can estimate the value function and the policy without knowing or identifying the system dynamics. Closed-loop system stability and the convergence of the weights are guaranteed by Lyapunov analysis. Finally, simulation results for a two-player non-zero-sum game demonstrate the effectiveness of the proposed algorithms.

Citation

Pages: 223-239

Page count: 17

References (39 in total)

[1] Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
Abu-Khalaf, M
Lewis, FL
[J]. AUTOMATICA, 2005, 41 (05) : 779 - 791
[2] Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control
Al-Tamimi, Asma
Lewis, Frank L.
Abu-Khalaf, Murad
[J]. AUTOMATICA, 2007, 43 (03) : 473 - 481
[3] BAIRD LC, 1994, 1994 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOL 1-7, P2448, DOI 10.1109/ICNN.1994.374604
[4] Basar, 1999, DYNAMIC NONCOOPERATI
[5] Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation
Beard, RW
Saridis, GN
Wen, JT
[J]. AUTOMATICA, 1997, 33 (12) : 2159 - 2177
[6] DYNAMIC PROGRAMMING
BELLMAN, R
[J]. SCIENCE, 1966, 153 (3731) : 34 - &
[7] A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems
Bhasin, S.
Kamalapurkar, R.
Johnson, M.
Vamvoudakis, K. G.
Lewis, F. L.
Dixon, W. E.
[J]. AUTOMATICA, 2013, 49 (01) : 82 - 92
[8] UNIVERSAL APPROXIMATION OF AN UNKNOWN MAPPING AND ITS DERIVATIVES USING MULTILAYER FEEDFORWARD NETWORKS
HORNIK, K
STINCHCOMBE, M
WHITE, H
[J]. NEURAL NETWORKS, 1990, 3 (05) : 551 - 560
[9] Ioannou P, 2006, ADV DES CONTROL, P1
[10] Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming
Jiang, He
Zhang, Huaguang
Xiao, Geyang
Cui, Xiaohong
[J]. NEUROCOMPUTING, 2018, 275 : 192 - 199
