Model-free adaptive optimal control of continuous-time nonlinear non-zero-sum games based on reinforcement learning

Cited by: 5

Authors:

Guo, Lei [1]

Zhao, Han [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China

Source:

IET CONTROL THEORY AND APPLICATIONS | 2023 / Vol. 17 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

APPROXIMATE OPTIMAL-CONTROL; LINEAR-SYSTEMS;

DOI:

10.1049/cth2.12376

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper presents two novel algorithms for finding the Nash equilibrium solution of non-zero-sum games for continuous-time input-affine nonlinear systems. Based on the integral reinforcement learning method, integral-exploration-coupled Hamilton-Jacobi (HJ) equations are derived that do not contain any information about the system dynamics. Then, based on neural network approximation, two different adaptive weight-tuning laws are given to estimate the approximate solution of the coupled HJ equations. Both algorithms can estimate the value function and the policy without knowing or identifying the system dynamics. Closed-loop system stability and the convergence of the weights are guaranteed by Lyapunov analysis. Finally, simulation results for a two-player non-zero-sum game demonstrate the effectiveness of the algorithms.
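
For orientation, the setting named in the abstract can be sketched with the textbook formulation of an N-player non-zero-sum game. The notation below (f, g_j, Q_i, R_ij, T) is an assumption taken from the general literature on such games, not from the paper itself, and the paper's integral-exploration equations contain additional exploration terms that are not reproduced here.

```latex
% Input-affine dynamics with N players (f and g_j are assumed unknown)
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j

% Cost-to-go of player i under the policies u_1, \dots, u_N
V_i\bigl(x(t)\bigr) = \int_{t}^{\infty}
  \Bigl( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Bigr)\, d\tau

% Integral (interval) form over a window of length T > 0:
% f and g_j do not appear explicitly, which is what makes
% integral reinforcement learning attractive for model-free design
V_i\bigl(x(t)\bigr) = \int_{t}^{t+T}
  \Bigl( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Bigr)\, d\tau
  + V_i\bigl(x(t+T)\bigr)
```

The interval form can be evaluated from measured state and input trajectories alone, which is the usual motivation for this family of methods.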


Pages: 223-239

Page count: 17
