Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

Cited by: 0
Authors
Faradonbeh, Mohamad Kazem Shirani [1 ]
Affiliation
[1] Univ Georgia, Dept Stat, Athens, GA 30605 USA
Source
2022 26TH INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC) | 2022
Keywords
Adaptive control; Reinforcement learning; Optimal policies; Stochastic differential equations; Regret bounds; Learning-based control; CONSISTENCY;
DOI
10.1109/ICSTCC55426.2022.9931839
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Classification Number
0812
Abstract
This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that the randomized certainty equivalent policy addresses the exploration-exploitation dilemma for linear control systems that evolve according to unknown stochastic differential equations and incur quadratic operating costs. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty equivalent policy quickly learns optimal control actions from a single state trajectory. Further, we show that the regret scales linearly with the number of parameters. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.
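The certainty-equivalence idea in the abstract can be illustrated concretely. The sketch below is not the paper's algorithm; it is a minimal one-dimensional toy, assuming a scalar system dx = (a x + b u) dt + dW, known parameter bounds used for projection, and randomization implemented as decaying exploration noise on the input (one common variant). All names and numerical choices here are illustrative assumptions.

```python
import numpy as np

def care_gain(a, b, q=1.0, r=1.0):
    # Stabilizing solution p of the scalar continuous-time algebraic
    # Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0, and the
    # corresponding optimal feedback gain K = b*p/r (policy u = -K*x).
    p = r * (a + np.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

def randomized_ce(a_true=0.5, b_true=1.0, T=50.0, dt=0.01,
                  update_every=100, seed=0):
    # Toy certainty-equivalence loop on a single trajectory:
    # 1) act with the gain computed from current parameter estimates,
    # 2) add decaying exploration noise to keep the data informative,
    # 3) periodically re-estimate (a, b) by regularized least squares.
    rng = np.random.default_rng(seed)
    x = 0.0
    G = 1e-3 * np.eye(2)            # regularized Gram matrix of [x, u]
    h = np.zeros(2)
    a_hat, b_hat = 0.0, 1.0         # rough initial guess (b != 0 assumed)
    K = care_gain(a_hat, b_hat)
    for k in range(int(T / dt)):
        t = (k + 1) * dt
        sigma = min(1.0, t ** -0.25)        # decaying randomization
        u = -K * x + sigma * rng.normal()
        # Euler-Maruyama step of dx = (a x + b u) dt + dW
        dx = (a_true * x + b_true * u) * dt + rng.normal(0.0, np.sqrt(dt))
        z = np.array([x, u])
        G += np.outer(z, z) * dt
        h += z * dx
        x += dx
        if (k + 1) % update_every == 0:
            a_hat, b_hat = np.linalg.solve(G, h)
            # project into assumed known bounds before solving the
            # Riccati equation, so the gain stays well defined
            K = care_gain(np.clip(a_hat, -2.0, 2.0),
                          np.clip(b_hat, 0.5, 2.0))
    return a_hat, b_hat, x
```

The decaying noise schedule mirrors the exploration-exploitation trade-off the abstract refers to: early randomization makes (a, b) identifiable from closed-loop data, while its decay lets the cost approach that of the optimal policy.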
Pages: 368-373
Number of pages: 6