Receding Horizon Actor-Critic Learning Control for Nonlinear Time-Delay Systems With Unknown Dynamics

Cited by: 12
Authors
Liu, Jiahang [1 ,2 ]
Zhang, Xinglong [1 ]
Xu, Xin [1 ]
Xiong, Quan [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China
[2] Beijing Inst Biotechnol, Beijing 100071, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2023, Vol. 53, No. 8
Funding
National Natural Science Foundation of China;
Keywords
Delay effects; Optimal control; Control systems; Stability criteria; Simulation; Predictive control; Costs; Discrete-time nonlinear systems; Koopman operator; receding horizon control; reinforcement learning (RL); time-delay systems; MODEL-PREDICTIVE CONTROL; KOOPMAN OPERATOR; STABILITY; DESIGN;
DOI
10.1109/TSMC.2023.3254911
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812
Abstract
With the development of modern mechatronics and networked systems, controller design for time-delay systems has received notable attention. Time delays can greatly degrade the stability and performance of such systems, especially in optimal control design. In this article, we propose a receding horizon actor-critic learning control approach for near-optimal control of nonlinear time-delay systems (RACL-TD) with unknown dynamics. In the proposed approach, a data-driven predictor for nonlinear time-delay systems is first learned from precollected samples based on Koopman operator theory. Then, a receding horizon actor-critic architecture is designed to learn a near-optimal control policy. In RACL-TD, the terminal cost is determined using the Lyapunov-Krasovskii approach so that the influence of the delayed states and control inputs is properly accounted for. Furthermore, a relaxed terminal condition is presented to reduce the computational cost. The convergence and optimality of RACL-TD within each prediction interval, as well as the closed-loop properties of the system, are analyzed. Simulation results on a two-stage time-delayed chemical reactor show that RACL-TD achieves better control performance than nonlinear model predictive control (MPC) and infinite-horizon adaptive dynamic programming, while incurring a lower computational cost than nonlinear MPC.
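The idea of learning a data-driven predictor from precollected samples via the Koopman operator can be illustrated with a minimal EDMD-style sketch. Everything below — the lifting dictionary, the one-step delay, and the toy scalar dynamics — is an illustrative assumption for demonstration, not the predictor design used in the paper:

```python
import numpy as np

# Illustrative EDMD-style Koopman predictor for a scalar system with a
# one-step state delay. Dictionary and toy dynamics are assumptions.

def lift(x, x_delayed):
    """Lift (x_k, x_{k-1}) into a small dictionary of observables."""
    return np.array([x, x_delayed, x * x_delayed, x**2, x_delayed**2, 1.0])

# Precollect trajectories of a toy delayed system:
#   x_{k+1} = 0.5 * x_k + 0.2 * x_{k-1}
rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(200):
    traj = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    for k in range(1, 15):
        traj.append(0.5 * traj[k] + 0.2 * traj[k - 1])
    for k in range(1, len(traj) - 1):
        X.append(lift(traj[k], traj[k - 1]))
        Y.append(lift(traj[k + 1], traj[k]))
X, Y = np.array(X), np.array(Y)

# Least-squares fit of the finite-dimensional Koopman approximation K,
# so that lift(x_{k+1}, x_k) ≈ lift(x_k, x_{k-1}) @ K.
K, *_ = np.linalg.lstsq(X, Y, rcond=None)

# One-step prediction from a fresh state pair; the first dictionary
# entry is the state itself, so it serves as the predicted x_{k+1}.
x_prev, x_now = 0.3, 0.4
x_pred = (lift(x_now, x_prev) @ K)[0]
x_true = 0.5 * x_now + 0.2 * x_prev
```

Because the delayed state is folded into the lifted coordinates, the learned linear model can be rolled out over a prediction horizon and used by a receding horizon controller in place of the unknown dynamics.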
Pages: 4980-4993 (14 pages)