A Note on State Parameterizations in Output Feedback Reinforcement Learning Control of Linear Systems

Cited by: 6
Authors
Rizvi, Syed Ali Asad [1]
Lin, Zongli [2]
Affiliations
[1] Tennessee Technol Univ, Dept Elect & Comp Engn, Cookeville, TN 38505 USA
[2] Univ Virginia, Charles L Brown Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
Keywords
Output feedback; State feedback; Convergence; Observers; Q-learning; Regulators; Observability; Adaptive dynamic programming; optimal control; output feedback control; reinforcement learning (RL); state parameterization; zero-sum games; tracking control
DOI
10.1109/TAC.2022.3228969
CLC number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This note presents an analysis of the state parameterizations used in output feedback reinforcement learning (RL) control. Output feedback algorithms based on state parameterization involve additional conditions on the state parameterization beyond the standard conditions on the system matrices for their convergence to the optimal solution. It is shown that the state parameterization matrix needs to be of full row rank to guarantee the convergence of the output feedback RL algorithms. We present conditions in terms of the system matrices and the user-defined observer dynamics that ensure full row rank of the state parameterization matrix.
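The full row rank condition named in the abstract can be checked numerically for any candidate matrix. The following is a minimal sketch, assuming NumPy; the function name `has_full_row_rank` and the matrices `W_full` and `W_deficient` are hypothetical illustrations, not the state parameterization matrix constructed in the note.

```python
import numpy as np


def has_full_row_rank(W, tol=None):
    """Return True if the numerical rank of W equals its number of rows."""
    W = np.atleast_2d(np.asarray(W, dtype=float))
    # matrix_rank counts singular values above a tolerance (SVD-based).
    return int(np.linalg.matrix_rank(W, tol=tol)) == W.shape[0]


# Hypothetical 2 x 3 matrices with illustrative values only.
W_full = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 3.0]])
W_deficient = np.array([[1.0, 2.0, 3.0],
                        [2.0, 4.0, 6.0]])  # second row is twice the first

print(has_full_row_rank(W_full))
print(has_full_row_rank(W_deficient))
```

The rank is computed from singular values, so a matrix that is close to rank deficient may pass or fail depending on `tol`; the conditions given in the note itself are stated in terms of the system matrices and the observer dynamics rather than a numerical test.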
Pages: 6200-6207
Page count: 8