Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

被引:4
|
作者
Luo, Yuwei [1 ]
Gupta, Varun [2 ]
Kolar, Mladen [2 ]
机构
[1] Stanford Univ, 655 Knight Way, Stanford, CA 94305 USA
[2] Univ Chicago, 5807 S Woodlawn Ave, Chicago, IL 60637 USA
关键词
Linear Quadratic Regulator; dynamic regret; non-stationary learning; ordinary least squares estimator; TIME;
D O I
10.1145/3508029
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics {A(t), B-t}. The sequence of dynamics matrices can be arbitrary, but with a total variation, V-T, assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of (O) over tilde ((VTT3/5)-T-2/5). With piecewise constant dynamics, our algorithm achieves the optimal regret of (O) over tilde(root ST) where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of V-T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
引用
收藏
页数:72
相关论文
共 50 条
  • [1] Lorentzian non-stationary dynamical systems
    Molaei, Mohammad Reza
    Khajoei, Najmeh
    JOURNAL OF INTERDISCIPLINARY MATHEMATICS, 2022, 25 (08) : 2127 - 2140
  • [2] A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
    Abbasi-Yadkori, Yasin
    Gyorgy, Andraes
    Lazic, Nevena
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [3] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
    Saha, Aadirupa
    Gupta, Shubham
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 19027 - 19049
  • [4] Stability study of non-stationary homogeneous dynamical systems
    Wafa, Belhedi
    Salwa, Elloumi
    2018 15TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS AND DEVICES (SSD), 2018, : 362 - 365
  • [5] A scaling property in the non-stationary dissipative dynamical systems
    Fang, HP
    COMMUNICATIONS IN THEORETICAL PHYSICS, 2001, 35 (02) : 151 - 152
  • [6] THE ACCUMULATION OF PERTURBATIONS IN NON-STATIONARY LINEAR SYSTEMS
    ROITENBERG, JN
    DOKLADY AKADEMII NAUK SSSR, 1958, 121 (02): : 221 - 224
  • [7] LINEAR SYSTEMS WITH NON-STATIONARY RANDOM INPUTS
    GAONKAR, GH
    INTERNATIONAL JOURNAL OF CONTROL, 1971, 14 (01) : 161 - &
  • [8] Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information
    Auer, Peter
    Chen, Yifang
    Gajane, Pratik
    Lee, Chung-Wei
    Luo, Haipeng
    Ortner, Ronald
    Wei, Chen-Yu
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [9] ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
    Buening, Thomas Kleine
    Saha, Aadirupa
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [10] Solutions of Non-stationary Dynamic Problems of Linear Viscoelasticity
    E. A. Korovaytseva
    S. G. Pshenichnov
    Lobachevskii Journal of Mathematics, 2019, 40 : 328 - 334