Constrained Reinforcement Learning-Based Closed-Loop Reference Model for Optimal Tracking Control of Unknown Continuous-Time Systems

Cited by: 5

Authors:

Zhang, Haoran [1]

Zhao, Chunhui [1]

Ding, Jinliang [2]

Affiliations:

[1] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Peoples R China

[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China

Source:

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING | 2024 / Vol. 21 / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

Optimal tracking control problem; unknown continuous-time system; reinforcement learning; closed-loop reference model; peaking phenomenon; MRAC; ADAPTIVE-CONTROL; NONLINEAR-SYSTEMS;

DOI:

10.1109/TASE.2023.3340726

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Although reinforcement learning (RL) is effective in stabilizing systems, it faces many challenges in solving the tracking problem of unknown continuous-time systems. One major challenge is that RL-based control can hardly satisfy the transient and steady-state performance requirements of the tracking problem simultaneously. In this study, instead of serving as the controller, the RL agent acts as a planner in the closed-loop reference model. The RL-based planner optimizes tracking performance using the constrained integral RL algorithm. Meanwhile, the system is controlled by the proposed library-based adaptive controller, which contains a library of candidate functions for modeling the unknown system dynamics. A natural gradient-like adaptive law is developed to update the controller, ensuring asymptotic tracking and promoting sparsity in the controller parameters. Compared with conventional RL-based control, the proposed framework can eliminate the tracking error while avoiding high-frequency oscillation and the peaking phenomenon. Furthermore, we demonstrate theoretically, through Lyapunov analysis, that our approach can improve the transient performance in terms of the L2 norm of the tracking error and explicitly limit the L-infinity norm of the peaking value. Simulations at the end of the paper support the theoretical findings.
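The closed-loop reference model idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's RL planner or its library-based controller; it is a textbook-style scalar model reference adaptive controller, written under assumptions that do not come from this record: a plant x' = a*x + u with unknown `a` and unit input gain, a reference model that receives the tracking error through a hypothetical feedback gain `l`, a plain gradient adaptive law, and forward-Euler integration. All names and numeric values are illustrative.

```python
import math


def simulate(l, a=2.0, a_m=-1.0, b_m=1.0, gamma=10.0, r=1.0,
             dt=1e-3, t_end=20.0):
    """Scalar adaptive tracking with error feedback into the reference model.

    l = 0 gives a classical open-loop reference model; l > 0 closes the loop.
    Returns (L2 norm of the tracking error on [0, t_end], |error| at t_end).
    All parameters are illustrative assumptions, not values from the paper.
    """
    x = 0.0          # plant state
    x_m = 0.0        # reference model state
    theta_hat = 0.0  # estimate of the unknown plant parameter a
    e_sq_integral = 0.0

    for _ in range(int(round(t_end / dt))):
        e = x - x_m                              # tracking error
        u = (a_m - theta_hat) * x + b_m * r      # certainty-equivalence control
        dx = a * x + u                           # plant (a is unknown to u)
        dx_m = a_m * x_m + b_m * r + l * e       # reference model with feedback
        dtheta = gamma * e * x                   # gradient adaptive law

        e_sq_integral += e * e * dt              # accumulate integral of e^2
        x += dt * dx                             # forward-Euler updates
        x_m += dt * dx_m
        theta_hat += dt * dtheta

    return math.sqrt(e_sq_integral), abs(x - x_m)


if __name__ == "__main__":
    for gain in (0.0, 10.0):
        l2_norm, final_err = simulate(gain)
        print(f"l={gain:5.1f}  L2 norm of e: {l2_norm:.4f}  final |e|: {final_err:.2e}")
```

With V = e^2/2 + (a - theta_hat)^2/(2*gamma), this setup gives V' = (a_m - l)*e^2, so a larger `l` shrinks the integral of e^2 for the same initial parameter error. That is the sense in which the feedback gain trades on transient performance in this toy case; how the paper's constrained RL planner shapes that trade-off is not covered by this sketch.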


Pages: 7312-7324

Page count: 13
