Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

被引：4

作者：

Luo, Yuwei ^{[1
]}

Gupta, Varun ^{[2
]}

Kolar, Mladen ^{[2
]}

机构：

[1] Stanford Univ, 655 Knight Way, Stanford, CA 94305 USA

[2] Univ Chicago, 5807 S Woodlawn Ave, Chicago, IL 60637 USA

来源：

PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS | 2022年 / 6卷 / 01期

关键词：

Linear Quadratic Regulator; dynamic regret; non-stationary learning; ordinary least squares estimator; TIME;

D O I：

10.1145/3508029

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics {A(t), B-t}. The sequence of dynamics matrices can be arbitrary, but with a total variation, V-T, assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of (O) over tilde ((VTT3/5)-T-2/5). With piecewise constant dynamics, our algorithm achieves the optimal regret of (O) over tilde(root ST) where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of V-T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.

引用

页数：72

共 50 条

[1] Lorentzian non-stationary dynamical systems
Molaei, Mohammad Reza
Khajoei, Najmeh
JOURNAL OF INTERDISCIPLINARY MATHEMATICS, 2022, 25 (08) : 2127 - 2140
[2] A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
Abbasi-Yadkori, Yasin
Gyorgy, Andraes
Lazic, Nevena
JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
[3] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Saha, Aadirupa
Gupta, Shubham
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 19027 - 19049
[4] Stability study of non-stationary homogeneous dynamical systems
Wafa, Belhedi
Salwa, Elloumi
2018 15TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS AND DEVICES (SSD), 2018, : 362 - 365
[5] A scaling property in the non-stationary dissipative dynamical systems
Fang, HP
COMMUNICATIONS IN THEORETICAL PHYSICS, 2001, 35 (02) : 151 - 152
[6] THE ACCUMULATION OF PERTURBATIONS IN NON-STATIONARY LINEAR SYSTEMS
ROITENBERG, JN
DOKLADY AKADEMII NAUK SSSR, 1958, 121 (02): : 221 - 224
[7] LINEAR SYSTEMS WITH NON-STATIONARY RANDOM INPUTS
GAONKAR, GH
INTERNATIONAL JOURNAL OF CONTROL, 1971, 14 (01) : 161 - &
[8] Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information
Auer, Peter
Chen, Yifang
Gajane, Pratik
Lee, Chung-Wei
Luo, Haipeng
Ortner, Ronald
Wei, Chen-Yu
CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
[9] ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Buening, Thomas Kleine
Saha, Aadirupa
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
[10] Solutions of Non-stationary Dynamic Problems of Linear Viscoelasticity
E. A. Korovaytseva
S. G. Pshenichnov
Lobachevskii Journal of Mathematics, 2019, 40 : 328 - 334

← 1 2 3 4 5 →