Optimal Time-Varying Q-Learning Algorithm for Affine Nonlinear Systems With Coupled Players
Cited by: 0
Authors:
Yu, Shuhang [1]
Zhang, Huaguang [2,3]
Sun, Jiayue [1]
Li, Mei [4]
Affiliations:
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Liaoning, Peoples R China
[3] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
[4] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
Source:
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2025
Funding:
National Natural Science Foundation of China;
Keywords:
Heuristic algorithms;
Nash equilibrium;
Games;
Couplings;
Q-learning;
Optimal control;
Time-varying systems;
System dynamics;
Differential games;
Sun;
Adaptive dynamic programming (ADP);
finite-horizon;
mixed H-2/H-infinity control;
neural network (NN);
TRACKING CONTROL;
GAMES;
SYNCHRONIZATION;
FEEDBACK;
DOI:
10.1109/TSMC.2025.3580988
Chinese Library Classification (CLC) number:
TP [Automation Technology, Computer Technology];
Discipline classification code:
0812;
Abstract:
To address the finite-horizon coupled two-player mixed H-2/H-infinity control problem for a continuous-time affine nonlinear system, this research introduces a distinctive Q-function and presents an adaptive dynamic programming (ADP) method that operates independently of system-specific information. First, we formulate the time-varying Hamilton-Jacobi-Isaacs (HJI) equations, which are difficult to solve because they are time-dependent and nonlinear. Next, a novel offline policy iteration (PI) algorithm is introduced; its convergence is shown, which strengthens the proof that Nash equilibrium points exist. Moreover, a novel action-dependent Q-function is established to enable entirely model-free learning, the first such attempt for the mixed H-2/H-infinity control problem involving coupled players. The Lyapunov direct method is employed to ensure the stability of the closed-loop uncertain affine nonlinear system under the ADP-based control scheme, guaranteeing uniform ultimate boundedness (UUB). Finally, a numerical simulation validates the effectiveness of the proposed ADP-based control approach.
Pages: 11