Optimal Time-Varying Q-Learning Algorithm for Affine Nonlinear Systems With Coupled Players

Cited by: 0

Authors:

Yu, Shuhang [1]

Zhang, Huaguang [2,3]

Sun, Jiayue [1]

Li, Mei [4]

Affiliations:

[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China

[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Liaoning, Peoples R China

[3] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China

[4] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110004, Liaoning, Peoples R China

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2025

Funding:

National Natural Science Foundation of China;

Keywords:

Heuristic algorithms; Nash equilibrium; Games; Couplings; Q-learning; Optimal control; Time-varying systems; System dynamics; Differential games; Sun; Adaptive dynamic programming (ADP); finite-horizon; mixed H-2/H-infinity control; neural network (NN); TRACKING CONTROL; GAMES; SYNCHRONIZATION; FEEDBACK;

DOI:

10.1109/TSMC.2025.3580988

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

To address the finite-horizon mixed H-2/H-infinity control problem with two coupled players in a continuous-time affine nonlinear system, this work introduces a new Q-function and presents an adaptive dynamic programming (ADP) method that requires no system-specific information. First, the time-varying Hamilton-Jacobi-Isaacs (HJI) equations are formulated; they are difficult to solve because they are both time-dependent and nonlinear. Next, a novel offline policy iteration (PI) algorithm is introduced, together with an analysis of its convergence and a proof of the existence of Nash equilibrium points. Moreover, a novel action-dependent Q-function is established to enable entirely model-free learning, which the authors present as the first such attempt for the mixed H-2/H-infinity control problem involving coupled players. The Lyapunov direct method is used to show that the closed-loop uncertain affine nonlinear system under the ADP-based control scheme is uniformly ultimately bounded (UUB). Finally, a numerical simulation validates the effectiveness of the ADP-based control approach.

Pages: 11

References (35 in total):

[1] A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games [J].

Diddigi, Raghuram Bharadwaj ;

Kamanchi, Chandramouli ;

Bhatnagar, Shalabh .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (09) :4816-4823

[2] Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs [J].

Huo, Yu ;

Wang, Ding ;

Qiao, Junfei ;

Li, Menghua .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (02) :910-920

[3] Robust Adaptive Dynamic Programming and Feedback Stabilization of Nonlinear Systems [J].

Jiang, Yu ;

Jiang, Zhong-Ping .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (05) :882-893

[4] Non-equilibrium dynamic games and cyber-physical security: A cognitive hierarchy approach [J].

Kanellopoulos, Aris ;

Vamvoudakis, Kyriakos G. .

SYSTEMS & CONTROL LETTERS, 2019, 125 :59-66

[5] Excitation for Adaptive Optimal Control of Nonlinear Systems in Differential Games [J].

Karg, Philipp ;

Koepf, Florian ;

Braun, Christian A. ;

Hohmann, Soeren .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (01) :596-603

[6] Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics [J].

Kiumarsi, Bahare ;

Lewis, Frank L. ;

Modares, Hamidreza ;

Karimpour, Ali ;

Naghibi-Sistani, Mohammad-Bagher .

AUTOMATICA, 2014, 50 (04) :1167-1175

[7] Model-Free Q-Learning for the Tracking Problem of Linear Discrete-Time Systems [J].

Li, Chun ;

Ding, Jinliang ;

Lewis, Frank L. ;

Chai, Tianyou .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) :3191-3201

[8] Off-policy Q-learning: Solving Nash equilibrium of multi-player games with network-induced delay and unmeasured state [J].

Li, Jinna ;

Xiao, Zhenfei ;

Fan, Jialu ;

Chai, Tianyou ;

Lewis, Frank L. L. .

AUTOMATICA, 2022, 136

[9] Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games [J].

Li, Jinna ;

Modares, Hamidreza ;

Chai, Tianyou ;

Lewis, Frank L. ;

Xie, Lihua .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (10) :2434-2445

[10] Multiplayer Stackelberg-Nash Game for Nonlinear System via Value Iteration-Based Integral Reinforcement Learning [J].

Li, Man ;

Qin, Jiahu ;

Freris, Nikolaos M. ;

Ho, Daniel W. C. .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) :1429-1440
