Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning

Cited by: 36
Authors
Chen, Ci [1,2]
Xie, Lihua [3]
Xie, Kan [1,4]
Lewis, Frank L. [5]
Xie, Shengli [6,7]
Affiliations
[1] Guangdong Univ Technol, Sch Automat, Guangzhou, Peoples R China
[2] Guangdong Key Lab IoT Informat Technol, Guangzhou, Peoples R China
[3] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[4] 111 Ctr Intelligent Batch Mfg Based IoT Technol, Guangzhou, Peoples R China
[5] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX USA
[6] Minist Educ, Key Lab Intelligent Informat Proc & Syst Integrat, Guangzhou, Peoples R China
[7] Guangdong HongKong Macao Joint Lab Smart Discrete, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Off-policy; Output tracking; Output feedback; Adaptive optimal control; Linear systems
DOI
10.1016/j.automatica.2022.110581
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning provides a powerful tool for designing a satisfactory controller through interactions with the environment. Although off-policy learning algorithms have recently been designed for tracking problems, most of these results either require full-state feedback or guarantee only bounded tracking errors, which may not be flexible or desirable for real-world engineering problems. To address these issues, we propose an output-feedback-based reinforcement learning approach that finds the optimal control solution using input-output data and ensures asymptotic output tracking for continuous-time systems. More specifically, we first propose a dynamical controller, revised from standard output regulation theory, and use it to formulate an optimal output tracking problem. Then, a state observer is used to re-express the system state. Consequently, we address the rank issue of the parameterization matrix and analyze the state re-expression error, both of which are crucial for transforming the off-policy learning into an output-feedback form. A comprehensive simulation study demonstrates the effectiveness of the proposed approach. (C) 2022 Elsevier Ltd. All rights reserved.
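The learning loop described in the abstract ultimately approximates the solution of a continuous-time algebraic Riccati equation from data. A minimal sketch of the model-based backbone it approximates is Kleinman's policy iteration, written below in Python; the matrices A, B, Q, R and the zero initial gain are illustrative assumptions, not taken from the paper, whose method instead replaces the model-based Lyapunov solves with off-policy estimates built from input-output data.

```python
# Minimal sketch: Kleinman policy iteration for the continuous-time LQR
# backbone that off-policy RL approximates from data. All matrices below
# (A, B, Q, R, and the initial gain K) are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative stabilizable pair (assumed, not from the paper).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # Hurwitz, so K = 0 is a stabilizing start
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                  # state weight
R = np.array([[1.0]])          # input weight

K = np.zeros((1, 2))           # initial stabilizing gain
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve Ak' P + P Ak = -(Q + K' R K).
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P.
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# Cross-check against the direct Riccati solution.
P_are = solve_continuous_are(A, B, Q, R)
print("||P - P_are|| =", np.linalg.norm(P - P_are))
```

In the paper's output-feedback setting, no knowledge of A or B would be available; the value matrix and the gain are instead identified from measured input-output trajectories.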
Pages: 14