Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

Cited by: 0
Authors
Faradonbeh, Mohamad Kazem Shirani [1 ]
Affiliation
[1] Univ Georgia, Dept Stat, Athens, GA 30605 USA
Source
2022 26TH INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC) | 2022
Keywords
Adaptive control; Reinforcement learning; Optimal policies; Stochastic differential equations; Regret bounds; Learning-based control; CONSISTENCY;
DOI
10.1109/ICSTCC55426.2022.9931839
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Classification Number
0812
Abstract
This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that the randomized certainty equivalent policy addresses the exploration-exploitation dilemma for linear control systems that evolve according to unknown stochastic differential equations and incur quadratic operating costs. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty equivalent policy quickly learns optimal control actions from a single state trajectory. Further, we show that the regret scales linearly with the number of parameters. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.
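The certainty-equivalence idea in the abstract can be illustrated concretely. The sketch below is not the paper's algorithm; it is a minimal one-dimensional toy, assuming a scalar system dx = (a x + b u) dt + dW, known parameter bounds used for projection, and randomization implemented as decaying exploration noise on the input (one common variant). All names and numerical choices here are illustrative assumptions.

```python
import numpy as np

def care_gain(a, b, q=1.0, r=1.0):
    # Stabilizing solution p of the scalar continuous-time algebraic
    # Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0, and the
    # corresponding optimal feedback gain K = b*p/r (policy u = -K*x).
    p = r * (a + np.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

def randomized_ce(a_true=0.5, b_true=1.0, T=50.0, dt=0.01,
                  update_every=100, seed=0):
    # Toy certainty-equivalence loop on a single trajectory:
    # 1) act with the gain computed from current parameter estimates,
    # 2) add decaying exploration noise to keep the data informative,
    # 3) periodically re-estimate (a, b) by regularized least squares.
    rng = np.random.default_rng(seed)
    x = 0.0
    G = 1e-3 * np.eye(2)            # regularized Gram matrix of [x, u]
    h = np.zeros(2)
    a_hat, b_hat = 0.0, 1.0         # rough initial guess (b != 0 assumed)
    K = care_gain(a_hat, b_hat)
    for k in range(int(T / dt)):
        t = (k + 1) * dt
        sigma = min(1.0, t ** -0.25)        # decaying randomization
        u = -K * x + sigma * rng.normal()
        # Euler-Maruyama step of dx = (a x + b u) dt + dW
        dx = (a_true * x + b_true * u) * dt + rng.normal(0.0, np.sqrt(dt))
        z = np.array([x, u])
        G += np.outer(z, z) * dt
        h += z * dx
        x += dx
        if (k + 1) % update_every == 0:
            a_hat, b_hat = np.linalg.solve(G, h)
            # project into assumed known bounds before solving the
            # Riccati equation, so the gain stays well defined
            K = care_gain(np.clip(a_hat, -2.0, 2.0),
                          np.clip(b_hat, 0.5, 2.0))
    return a_hat, b_hat, x
```

The decaying noise schedule mirrors the exploration-exploitation trade-off the abstract refers to: early randomization makes (a, b) identifiable from closed-loop data, while its decay lets the cost approach that of the optimal policy.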
Pages: 368-373
Number of pages: 6