Receding Horizon Actor-Critic Learning Control for Nonlinear Time-Delay Systems With Unknown Dynamics

Cited by: 12
Authors
Liu, Jiahang [1 ,2 ]
Zhang, Xinglong [1 ]
Xu, Xin [1 ]
Xiong, Quan [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China
[2] Beijing Inst Biotechnol, Beijing 100071, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2023, Vol. 53, No. 8
Funding
National Natural Science Foundation of China;
Keywords
Delay effects; Optimal control; Control systems; Stability criteria; Simulation; Predictive control; Costs; Discrete-time nonlinear systems; Koopman operator; receding horizon control; reinforcement learning (RL); time-delay systems; MODEL-PREDICTIVE CONTROL; KOOPMAN OPERATOR; STABILITY; DESIGN;
DOI
10.1109/TSMC.2023.3254911
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812
Abstract
With the development of modern mechatronics and networked systems, controller design for time-delay systems has received notable attention. Time delays can greatly degrade the stability and performance of such systems, especially in optimal control design. In this article, we propose a receding horizon actor-critic learning control approach for near-optimal control of nonlinear time-delay systems (RACL-TD) with unknown dynamics. In the proposed approach, a data-driven predictor for nonlinear time-delay systems is first learned from precollected samples based on Koopman operator theory. Then, a receding horizon actor-critic architecture is designed to learn a near-optimal control policy. In RACL-TD, the terminal cost is determined using the Lyapunov-Krasovskii approach so that the influence of the delayed states and control inputs is properly accounted for. Furthermore, a relaxed terminal condition is presented to reduce the computational cost. The convergence and optimality of RACL-TD within each prediction interval, as well as the closed-loop properties of the system, are analyzed. Simulation results on a two-stage time-delayed chemical reactor show that RACL-TD achieves better control performance than nonlinear model predictive control (MPC) and infinite-horizon adaptive dynamic programming, while incurring a lower computational cost than nonlinear MPC.
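The idea of learning a data-driven predictor from precollected samples via the Koopman operator can be illustrated with a minimal EDMD-style sketch. Everything below — the lifting dictionary, the one-step delay, and the toy scalar dynamics — is an illustrative assumption for demonstration, not the predictor design used in the paper:

```python
import numpy as np

# Illustrative EDMD-style Koopman predictor for a scalar system with a
# one-step state delay. Dictionary and toy dynamics are assumptions.

def lift(x, x_delayed):
    """Lift (x_k, x_{k-1}) into a small dictionary of observables."""
    return np.array([x, x_delayed, x * x_delayed, x**2, x_delayed**2, 1.0])

# Precollect trajectories of a toy delayed system:
#   x_{k+1} = 0.5 * x_k + 0.2 * x_{k-1}
rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(200):
    traj = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    for k in range(1, 15):
        traj.append(0.5 * traj[k] + 0.2 * traj[k - 1])
    for k in range(1, len(traj) - 1):
        X.append(lift(traj[k], traj[k - 1]))
        Y.append(lift(traj[k + 1], traj[k]))
X, Y = np.array(X), np.array(Y)

# Least-squares fit of the finite-dimensional Koopman approximation K,
# so that lift(x_{k+1}, x_k) ≈ lift(x_k, x_{k-1}) @ K.
K, *_ = np.linalg.lstsq(X, Y, rcond=None)

# One-step prediction from a fresh state pair; the first dictionary
# entry is the state itself, so it serves as the predicted x_{k+1}.
x_prev, x_now = 0.3, 0.4
x_pred = (lift(x_now, x_prev) @ K)[0]
x_true = 0.5 * x_now + 0.2 * x_prev
```

Because the delayed state is folded into the lifted coordinates, the learned linear model can be rolled out over a prediction horizon and used by a receding horizon controller in place of the unknown dynamics.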
Pages: 4980-4993 (14 pages)