Constrained Reinforcement Learning-Based Closed-Loop Reference Model for Optimal Tracking Control of Unknown Continuous-Time Systems

Cited by: 5
Authors
Zhang, Haoran [1]
Zhao, Chunhui [1]
Ding, Jinliang [2]
Affiliations
[1] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Optimal tracking control problem; unknown continuous-time system; reinforcement learning; closed-loop reference model; peaking phenomenon; MRAC; ADAPTIVE-CONTROL; NONLINEAR-SYSTEMS;
DOI
10.1109/TASE.2023.3340726
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Although reinforcement learning (RL) is effective in stabilizing systems, it faces many challenges in solving the tracking problem of unknown continuous-time systems. One of the major challenges is that RL-based control can hardly satisfy the transient and steady-state performance requirements of the tracking problem simultaneously. In this study, instead of serving as the controller, the RL agent acts as a planner in the closed-loop reference model. The RL-based planner concentrates on optimizing tracking performance using the constrained integral RL algorithm. Meanwhile, the system is controlled by the proposed library-based adaptive controller, which contains a library of candidate functions for modeling the unknown system dynamics. A natural gradient-like adaptive law is developed to update the controller, ensuring asymptotic tracking and promoting sparsity in the controller parameters. Compared with conventional RL-based control, the proposed framework can eliminate the tracking error while avoiding high-frequency oscillation and the peaking phenomenon. Furthermore, we theoretically demonstrate, through Lyapunov analysis, that our approach can improve the transient performance in terms of the L2 norm of the tracking error and explicitly limit the L∞ norm of the peaking value. Simulations are presented at the end of the paper to support the theoretical findings.
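The role of a closed-loop reference model in shaping the L2 and L∞ norms of the tracking error can be illustrated with a toy example. The sketch below is not the method of the paper: it contains no RL planner and no library-based controller. It assumes a scalar plant x' = a*x + u with unknown a, a standard model-reference adaptive law, and a reference model that receives the tracking error through a feedback gain `ell` (with `ell = 0` giving the usual open-loop reference model). All names, the sign convention for `ell`, and all parameter values are hypothetical choices made for illustration. With the Lyapunov function V = e^2/2 + (theta - theta*)^2/(2*gamma), where theta* = a_m - a, one obtains V' = -(|a_m| + ell)*e^2 for this toy setup, so the integral of e^2 is bounded by V(0)/(|a_m| + ell) and |e| is bounded by sqrt(2*V(0)).

```python
import math


def simulate(ell, a=1.0, a_m=-1.0, gamma=1.0, r=1.0, dt=1e-3, t_end=30.0):
    """Forward-Euler simulation of a scalar MRAC loop (illustrative only).

    ell is a hypothetical error-feedback gain in the reference model;
    ell = 0 recovers the open-loop reference model.
    Returns (L2 norm of e, L-infinity norm of e) over [0, t_end].
    """
    x = 0.0       # plant state
    x_m = 0.0     # reference-model state
    theta = 0.0   # adaptive feedback gain; ideal value is a_m - a
    l2_sq = 0.0   # running integral of e^2
    linf = 0.0    # running maximum of |e|

    for _ in range(round(t_end / dt)):
        e = x - x_m
        l2_sq += e * e * dt
        linf = max(linf, abs(e))

        u = theta * x + r                 # adaptive control input
        dx = a * x + u                    # plant; a is unknown to the controller
        dx_m = a_m * x_m + r + ell * e    # reference model with error feedback
        dtheta = -gamma * e * x           # gradient-type adaptive law

        x += dt * dx
        x_m += dt * dx_m
        theta += dt * dtheta

    return math.sqrt(l2_sq), linf


if __name__ == "__main__":
    for gain in (0.0, 10.0):
        l2, linf = simulate(gain)
        print(f"ell={gain:5.1f}  L2(e)={l2:.3f}  Linf(e)={linf:.3f}")
```

Running the script prints both norms for the open-loop (`ell = 0`) and closed-loop (`ell = 10`) reference models, so the effect of the feedback gain on the transient tracking error can be compared directly. Under these assumptions V(0) = 2, which gives the bound |e| <= 2 for either gain.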
Pages: 7312-7324
Number of pages: 13