An efficient model-free adaptive optimal control of continuous-time nonlinear non-zero-sum games based on integral reinforcement learning with exploration

Cited by: 0

Authors:

Guo, Lei [1]

Xiong, Wenbo [1]

Song, Yuan [1]

Gan, Dongming [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

[2] Purdue Univ, Sch Engn Technol, W Lafayette, IN USA

Source:

IET CONTROL THEORY AND APPLICATIONS | 2024 / Vol. 18 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

adaptive control; dynamic programming; game theory; optimal control; OPTIMAL TRACKING CONTROL; SYSTEMS;

DOI:

10.1049/cth2.12610

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

To reduce learning time and storage requirements, this study presents a novel model-free algorithm for obtaining the Nash equilibrium solution of continuous-time nonlinear non-zero-sum games. Based on the integral reinforcement learning method, a new integral Hamilton-Jacobi (HJ) equation that can quickly and cooperatively determine the Nash equilibrium strategies of all players is proposed. By leveraging neural network (NN) approximation and the gradient descent method, simultaneous continuous-time adaptive tuning laws are provided for both the critic and actor NN weights. These laws facilitate the estimation of the optimal value function and optimal policy without requiring knowledge or identification of the system's dynamics. Closed-loop system stability and convergence of the weights are guaranteed through Lyapunov analysis. Additionally, the algorithm is enhanced to reduce the number of auxiliary NNs used in the critic. Simulation results for a two-player non-zero-sum game validate the effectiveness of the proposed algorithm.

Citation

Pages: 748-763

Page count: 16
