DTSRL: Efficient reinforcement learning for approximate optimal tracking control of discrete-time nonlinear systems

Times Cited: 0
Authors
Fu, Hao [1]
Zhou, Shuai
Liu, Wei
Affiliation
[1] Wuhan Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430081, Peoples R China
Source
COMMUNICATIONS IN NONLINEAR SCIENCE AND NUMERICAL SIMULATION | 2025, Vol. 150
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Neural network; Learning efficiency; Tracking control;
DOI
10.1016/j.cnsns.2025.109019
CLC Number
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
The optimal tracking control of unknown nonlinear systems via reinforcement learning (RL) remains an open problem in the absence of an initial stabilizing policy. The fundamental challenge is the poor learning efficiency caused by an additional switching term. To address this difficulty, this paper proposes a dual-time-scale RL (DTSRL) algorithm built on an alternate learning mechanism that enhances learning efficiency. Specifically, a convergence boundary factor is introduced into the updates of the actor-critic networks to establish an updating regulator, thereby relaxing the switching condition. An alternate learning mechanism is then devised to achieve a steady switching procedure by integrating this updating regulator into a dual-time-scale scheme, which effectively mitigates the impact of approximation errors. In addition, the tracking error and the weight estimation errors of the actor-critic networks are proved to be uniformly ultimately bounded by means of the Lyapunov method. Comparative simulations show that DTSRL outperforms existing methods in terms of learning efficiency.
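The record gives the algorithm only in prose, so the Python sketch below is a hypothetical illustration of an alternate learning mechanism on two time scales, not the paper's actual update laws: a fast critic phase runs until the temporal-difference error falls inside a convergence boundary (the assumed role of the updating regulator), after which a single slow actor step is taken. The plant, basis functions, gains, and gradient forms are all invented for the example.

# Minimal sketch (assumptions throughout): a dual-time-scale actor-critic
# loop in which an updating regulator, gated by a convergence boundary
# factor, alternates between a fast critic phase and a slow actor phase.
# The dynamics, basis, gains, and update laws are illustrative placeholders.
import numpy as np

def step(x, u, k):
    """Placeholder scalar plant tracking a sinusoidal reference."""
    r_next = np.sin(0.1 * (k + 1))      # assumed reference trajectory
    x_next = 0.9 * x + 0.5 * u          # assumed linear dynamics
    return x_next, x_next - r_next      # next state and tracking error

def features(e):
    return np.array([e, e * e, 1.0])    # assumed polynomial basis

Wc = np.zeros(3)                        # critic weights
Wa = np.zeros(3)                        # actor weights
lr_c, lr_a = 0.10, 0.01                 # fast / slow learning rates
epsilon = 0.05                          # convergence boundary factor (assumed role)
gamma = 0.95
x, e, phase = 0.0, 0.0, "critic"

for k in range(2000):
    phi = features(e)
    u = float(Wa @ phi)                 # current policy
    x, e_next = step(x, u, k)
    phi_next = features(e_next)

    cost = e * e + 0.1 * u * u          # one-step tracking cost
    delta = cost + gamma * (Wc @ phi_next) - Wc @ phi   # TD error

    if phase == "critic":
        Wc += lr_c * delta * phi        # fast time scale: TD(0) critic step
        # Updating regulator: hand the turn to the actor only once the
        # TD error enters the assumed convergence boundary, relaxing the
        # switching condition instead of demanding exact convergence.
        if abs(delta) < epsilon:
            phase = "actor"
    else:
        # Slow time scale: one actor step down the critic's estimated
        # cost-to-go through the known input gain (0.5 in this toy plant);
        # this gradient form is an assumption, not the paper's law.
        dV_du = (Wc[0] + 2.0 * Wc[1] * e_next) * 0.5
        Wa -= lr_a * (0.2 * u + gamma * dV_du) * phi
        phase = "critic"                # alternate back to the critic
    e = e_next

Under these assumptions, the critic phase dominates the iteration count (fast time scale) while the actor moves only at regulated instants (slow time scale), which is one plausible reading of how the steady switching procedure avoids amplifying approximation errors.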
Pages: 16