Adaptive Q-Learning Based Model-Free H∞ Control of Continuous-Time Nonlinear Systems: Theory and Application

Cited by: 4
Authors
Zhao, Jun [1 ]
Lv, Yongfeng [2 ]
Wang, Zhangu [1 ]
Zhao, Ziliang [1 ]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Transportat, Shandong Key Lab Hydrogen Elect Hybrid Power Syst, Qingdao 266590, Peoples R China
[2] Taiyuan Univ Technol, Coll Elect & Power Engn, Taiyuan 030024, Peoples R China
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2025, Vol. 9, No. 2
Funding
National Natural Science Foundation of China;
Keywords
Q-learning; Artificial neural networks; Nonlinear systems; Cost function; Control systems; Heuristic algorithms; System dynamics; Reinforcement learning; H-infinity control; Learning law; Linear systems;
DOI
10.1109/TETCI.2024.3449870
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Although model-based H-infinity control of nonlinear continuous-time (CT) systems has been extensively studied, model-free H-infinity control of nonlinear CT systems with unknown dynamics via Q-learning remains a challenging problem. This paper develops a novel Q-learning-based model-free H-infinity control scheme for nonlinear CT systems in which the adaptive critic and actor update each other continuously and simultaneously, eliminating the need for iterative steps. As a result, a hybrid structure is avoided and no initial stabilizing control policy is required. To obtain the H-infinity control of the nonlinear CT system, a Q-learning strategy is introduced to solve the H-infinity control problem online in a non-iterative manner, without knowledge of the system dynamics. In addition, a new learning law is developed that uses a sliding-mode scheme to update the critic neural network (NN) weights online. Owing to the strong convergence of the critic NN weights, the actor NN used in most H-infinity control algorithms is removed. Finally, numerical simulations and experimental results on an adaptive cruise control (ACC) system of a real vehicle demonstrate the feasibility of the presented control method and learning algorithm.
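As a rough illustration of the critic-only, non-iterative learning idea described in the abstract, the sketch below runs online zero-sum-game (H-infinity) learning on a simple linear benchmark: a single critic approximates the value function, the control and worst-case disturbance are derived from it in closed form, and the weights are driven by a sign-based update reminiscent of a sliding-mode learning law. This is not the paper's algorithm; the dynamics (A, B, D), basis phi, gains (Q, R, gamma, alpha), probing noise, and the specific update rule are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact method): online
# critic-only learning for the H-infinity / zero-sum game problem on a
# linear benchmark, with a sign-based ("sliding-mode-style") weight update
# so that no separate actor network is needed.
import numpy as np

# Example dynamics: dx/dt = A x + B u + D w  (control u, disturbance w)
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.1]])

Q = np.eye(2)      # state penalty
R = np.eye(1)      # control penalty
gamma = 2.0        # prescribed L2-gain bound
alpha = 5.0        # critic learning rate

def phi(x):
    """Quadratic basis for the value function V(x) ~ W^T phi(x)."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def dphi(x):
    """Jacobian of the basis, shape (3, 2)."""
    x1, x2 = x
    return np.array([[2 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2 * x2]])

W = np.zeros(3)              # critic weights; no initial stabilizing policy
x = np.array([1.0, -1.0])    # initial state
dt = 1e-3

for k in range(20000):
    grad_V = dphi(x).T @ W   # dV/dx under the current critic
    # Min-max policies derived in closed form from the critic:
    u = -0.5 * np.linalg.solve(R, B.T @ grad_V)        # control (minimizer)
    w = (1.0 / (2.0 * gamma**2)) * D.T @ grad_V        # worst-case disturbance
    # Probing noise keeps the regressor exciting (persistence of excitation).
    u_applied = u + 0.1 * np.sin(50 * k * dt)
    xdot = A @ x + B @ u_applied + D @ w
    # Continuous-time Bellman (HJI) residual:
    sigma = dphi(x) @ xdot
    e = W @ sigma + x @ Q @ x + u @ R @ u - gamma**2 * float(w @ w)
    # Sliding-mode-style law: drive the residual to zero via its sign.
    W = W - dt * alpha * sigma / (1.0 + sigma @ sigma) * np.sign(e)
    x = x + dt * xdot

print("learned critic weights:", W)
```

The general design point this sketch tries to convey: a discontinuous, sign-based update trades the smooth asymptotic convergence of a gradient law for faster, more robust convergence of the Bellman residual, which is what makes it plausible to drop the actor network in critic-only schemes.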
Pages: 1143-1152
Page count: 10