Model-Free Linear Quadratic Control via Reduction to Expert Prediction

Cited by: 0
Authors
Abbasi-Yadkori, Yasin [1 ]
Lazic, Nevena [2 ]
Szepesvari, Csaba [3 ]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Google Brain, Mountain View, CA USA
[3] DeepMind, London, England
Source
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2019, Vol. 89
Keywords
ADAPTIVE-CONTROL; IDENTIFICATION
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as O(T^(ξ+2/3)) for any small ξ > 0 if the time horizon satisfies T > C^(1/ξ) for a constant C. The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions. This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.
Pages: 10
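As a rough illustration of the loop the abstract describes (phase-based policy iteration where each phase's policy is greedy with respect to the average of all previous value functions, with forced exploration), here is a minimal sketch on a toy LQ system. All quantities (A, B, Q, R, phase lengths, noise scales) are arbitrary assumptions, and the policy-evaluation step uses the known dynamics purely for brevity, whereas the paper's method is model-free; this is not the authors' algorithm, only the high-level structure.

```python
import numpy as np

# Toy LQ system: x_{t+1} = A x_t + B u_t + w_t, stage cost x'Qx + u'Ru.
# A, B, Q, R below are arbitrary illustrative choices, not from the paper.
rng = np.random.default_rng(0)
n, m = 2, 1
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(n)
R = 0.1 * np.eye(m)

def greedy_gain(P):
    """Gain K of the policy u = -K x that is greedy w.r.t. the value x'Px."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def evaluate(K, iters=500):
    """Value matrix P of the policy u = -K x via Lyapunov fixed-point iteration.
    The paper estimates value functions from data (model-free); using the
    known A, B here is a simplification made only to keep the sketch short."""
    Acl = A - B @ K
    P = np.zeros((n, n))
    for _ in range(iters):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return P

P_sum = np.zeros((n, n))          # running sum of past value functions
K = np.zeros((m, n))              # initial policy (stabilizing for this toy A)
x = rng.normal(size=(n, 1))
num_phases, phase_len, expl_std = 10, 200, 0.5

for phase in range(1, num_phases + 1):
    P_sum += evaluate(K)
    K = greedy_gain(P_sum / phase)    # greedy w.r.t. AVERAGE of past values
    for _ in range(phase_len):
        u = -K @ x + expl_std * rng.normal(size=(m, 1))   # forced exploration
        x = A @ x + B @ u + 0.01 * rng.normal(size=(n, 1))
    print(f"phase {phase}: K = {K.ravel()}")
```

The averaging step is what ties the scheme to the expert-prediction reduction the abstract mentions: each phase acts greedily against the running average of all past value estimates rather than against the latest one.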