Optimal Time-Varying Q-Learning Algorithm for Affine Nonlinear Systems With Coupled Players

Cited by: 0
Authors
Yu, Shuhang [1 ]
Zhang, Huaguang [2 ,3 ]
Sun, Jiayue [1 ]
Li, Mei [4 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Liaoning, Peoples R China
[3] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
[4] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2025
Funding
National Natural Science Foundation of China;
Keywords
Heuristic algorithms; Nash equilibrium; Games; Couplings; Q-learning; Optimal control; Time-varying systems; System dynamics; Differential games; Sun; Adaptive dynamic programming (ADP); finite-horizon; mixed H-2/H-infinity control; neural network (NN); TRACKING CONTROL; GAMES; SYNCHRONIZATION; FEEDBACK;
DOI
10.1109/TSMC.2025.3580988
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
To address the finite-horizon coupled two-player mixed H-2/H-infinity control problem for continuous-time affine nonlinear systems, this research introduces a distinctive Q-function and presents a novel adaptive dynamic programming (ADP) method that operates without any knowledge of the system dynamics. First, we formulate the time-varying Hamilton-Jacobi-Isaacs (HJI) equations, which are difficult to solve because of their time-dependent and nonlinear nature. Next, a novel offline policy iteration (PI) algorithm is introduced, its convergence is proved, and the existence of Nash equilibrium points is established. Moreover, a novel action-dependent Q-function is constructed to enable entirely model-free learning, representing the first attempt at the mixed H-2/H-infinity control problem with coupled players. The Lyapunov direct method is employed to guarantee that the closed-loop uncertain affine nonlinear system under the ADP-based control scheme is uniformly ultimately bounded (UUB). Finally, a numerical simulation is conducted to validate the effectiveness of the proposed ADP-based control approach.
Pages: 11
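For intuition only, below is a minimal sketch of the evaluate-then-improve structure of the offline policy iteration described in the abstract, applied to a hypothetical linear-quadratic zero-sum surrogate: the system matrices (A, B1, B2), weights (Q, R), and attenuation level gamma are all assumed here for illustration. The paper's actual algorithm is model-free, handles time-varying affine nonlinear dynamics with coupled players, and replaces the model-based evaluation step with a learned action-dependent Q-function; none of that is captured by this toy example.

# Hypothetical LQ zero-sum surrogate of the two-player problem.
# Dynamics: dx/dt = A x + B1 u + B2 w, with u = -K1 x (control player)
# and w = K2 x (disturbance player). All numbers are assumed.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A  = np.array([[0.0, 1.0], [-1.0, -2.0]])   # assumed stable drift
B1 = np.array([[0.0], [1.0]])               # control input channel
B2 = np.array([[0.0], [0.5]])               # disturbance input channel
Q, R = np.eye(2), np.eye(1)                 # assumed state/control weights
gamma = 2.0                                 # assumed attenuation level

K1 = np.zeros((1, 2))                       # initial admissible gains
K2 = np.zeros((1, 2))
for _ in range(50):
    # Policy evaluation: Lyapunov equation Acl^T P + P Acl = -(Q + K1^T R K1 - gamma^2 K2^T K2)
    Acl = A - B1 @ K1 + B2 @ K2
    rhs = -(Q + K1.T @ R @ K1 - gamma**2 * K2.T @ K2)
    P = solve_continuous_lyapunov(Acl.T, rhs)
    # Policy improvement for both players from the current value matrix P.
    K1_new = np.linalg.solve(R, B1.T @ P)
    K2_new = (1.0 / gamma**2) * B2.T @ P
    if np.allclose(K1, K1_new) and np.allclose(K2, K2_new):
        break
    K1, K2 = K1_new, K2_new

print("Converged value matrix P:\n", P)

At the fixed point, P satisfies the game algebraic Riccati equation of this surrogate, the LQ analogue of the HJI equations in the paper; the time-varying, finite-horizon setting would make P time-dependent as well.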