Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning

Cited by: 36
Authors
Chen, Ci [1,2]
Xie, Lihua [3]
Xie, Kan [1,4]
Lewis, Frank L. [5]
Xie, Shengli [6,7]
Affiliations
[1] Guangdong Univ Technol, Sch Automat, Guangzhou, Peoples R China
[2] Guangdong Key Lab IoT Informat Technol, Guangzhou, Peoples R China
[3] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[4] 111 Ctr Intelligent Batch Mfg Based IoT Technol, Guangzhou, Peoples R China
[5] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX USA
[6] Minist Educ, Key Lab Intelligent Informat Proc & Syst Integrat, Guangzhou, Peoples R China
[7] Guangdong HongKong Macao Joint Lab Smart Discrete, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Off-policy; Output tracking; Output feedback; Adaptive optimal control; Linear systems
DOI
10.1016/j.automatica.2022.110581
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning provides a powerful tool for designing a satisfactory controller through interactions with the environment. Although off-policy learning algorithms have recently been designed for tracking problems, most of these results either require full-state feedback or guarantee only bounded tracking errors, which may not be flexible or desirable for real-world engineering problems. To address these issues, we propose an output-feedback-based reinforcement learning approach that finds the optimal control solution using input-output data and ensures asymptotic output tracking for continuous-time systems. More specifically, we first propose a dynamical controller, revised from standard output regulation theory, and use it to formulate an optimal output tracking problem. Then, a state observer is used to re-express the system state. Consequently, we address the rank issue of the parameterization matrix and analyze the state re-expression error, both of which are crucial for transforming the off-policy learning into an output-feedback form. A comprehensive simulation study demonstrates the effectiveness of the proposed approach. (C) 2022 Elsevier Ltd. All rights reserved.
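The learning loop described in the abstract ultimately approximates the solution of a continuous-time algebraic Riccati equation from data. A minimal sketch of the model-based backbone it approximates is Kleinman's policy iteration, written below in Python; the matrices A, B, Q, R and the zero initial gain are illustrative assumptions, not taken from the paper, whose method instead replaces the model-based Lyapunov solves with off-policy estimates built from input-output data.

```python
# Minimal sketch: Kleinman policy iteration for the continuous-time LQR
# backbone that off-policy RL approximates from data. All matrices below
# (A, B, Q, R, and the initial gain K) are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative stabilizable pair (assumed, not from the paper).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # Hurwitz, so K = 0 is a stabilizing start
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                  # state weight
R = np.array([[1.0]])          # input weight

K = np.zeros((1, 2))           # initial stabilizing gain
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve Ak' P + P Ak = -(Q + K' R K).
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P.
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# Cross-check against the direct Riccati solution.
P_are = solve_continuous_are(A, B, Q, R)
print("||P - P_are|| =", np.linalg.norm(P - P_are))
```

In the paper's output-feedback setting, no knowledge of A or B would be available; the value matrix and the gain are instead identified from measured input-output trajectories.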
Pages: 14