Output Feedback Reinforcement Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem

Times Cited: 0
Authors
Rizvi, Syed Ali Asad [1]
Lin, Zongli [1]
Affiliations
[1] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
Source
2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC) | 2017
Keywords
Reinforcement learning; Q-learning; LQR; output feedback
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Reinforcement learning belongs to a class of artificial intelligence algorithms that can be used to design adaptive optimal controllers learned online. These methods have mostly been based on state feedback, which limits their application in practical scenarios. In this paper, we present an output feedback Q-learning algorithm to solve the discrete-time linear quadratic regulator (LQR) problem. The proposed scheme learns the optimal controller online without requiring any knowledge of the system dynamics, making it completely model-free. Both policy iteration (PI) and value iteration (VI) algorithms are developed, where the latter does not require an initially stabilizing policy. The convergence of these algorithms is established. The proposed method does not require a discounting factor, which is typically introduced in the cost function to trade off between the excitation noise bias and system stability. The method is therefore exact and converges to the actual LQR control solution obtained by solving the Riccati equation. Simulation results demonstrate the effectiveness of the scheme.
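The abstract describes a Q-learning value iteration that starts from no stabilizing policy and converges to the Riccati solution. As a rough illustration of that recursion, the sketch below implements the standard state-feedback variant (in the spirit of Bradtke et al. [2]) in Python; the paper's actual contribution, the output feedback formulation, is not reproduced here. The plant matrices A, B, the cost weights, and all numerical choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

np.random.seed(0)

# Illustrative plant (Euler-discretized double integrator, dt = 0.1).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.eye(1)          # input cost weight
n, m = 2, 1
p = n + m               # dimension of the joint vector z = [x; u]

def phi(z):
    """Quadratic basis: products z_i z_j for i <= j, off-diagonals doubled,
    so that phi(z) @ theta == z' H z for a symmetric H."""
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(p) for j in range(i, p)])

def theta_to_H(theta):
    """Rebuild the symmetric Q-function kernel H from the parameter vector."""
    H, k = np.zeros((p, p)), 0
    for i in range(p):
        for j in range(i, p):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# Exploratory data (x_k, u_k, x_{k+1}); the plant is used only as a
# black-box simulator to produce one-step transitions.
N = 60
X = np.random.randn(N, n)
U = np.random.randn(N, m)
Xn = X @ A.T + U @ B.T
Phi = np.vstack([phi(np.concatenate([X[k], U[k]])) for k in range(N)])
stage = np.array([X[k] @ Qc @ X[k] + U[k] @ Rc @ U[k] for k in range(N)])

# Value iteration on H, starting from H_0 = 0 (no stabilizing policy needed).
H = np.zeros((p, p))
for _ in range(200):
    if np.linalg.matrix_rank(H[n:, n:]) == m:
        # V_j(x) = x' P_j x with P_j = H_xx - H_xu H_uu^{-1} H_ux
        P = H[:n, :n] - H[:n, n:] @ np.linalg.solve(H[n:, n:], H[n:, :n])
    else:
        P = np.zeros((n, n))    # first pass: value function starts at zero
    # Bellman target: stage cost plus next-state value, then least squares.
    y = stage + np.einsum('ki,ij,kj->k', Xn, P, Xn)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    H = theta_to_H(theta)

K = np.linalg.solve(H[n:, n:], H[n:, :n])   # learned gain, u = -K x

# Reference gain from iterating the Riccati difference equation directly.
P = np.zeros((n, n))
for _ in range(2000):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        Rc + B.T @ P @ B, B.T @ P @ A)
K_riccati = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
```

Note that the least-squares update itself never touches A or B; the dynamics appear only when generating the transition samples, which is what makes the scheme model-free in operation.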
Pages: 6
Related References
12 in total
  • [1] Anonymous. Algebraic Riccati Equations. 1995.
  • [2] Bradtke, S. J. Proceedings of the 1994 American Control Conference, Vols 1-3, 1994: 3475.
  • [4] Kiumarsi, B., Lewis, F. L., Naghibi-Sistani, M.-B., Karimpour, A. Optimal Tracking Control of Unknown Discrete-Time Linear Systems Using Input-Output Measured Data. IEEE Transactions on Cybernetics, 2015, 45(12): 2770-2779.
  • [5] Landelius, T. Thesis, 1997.
  • [6] Lewis, F. Optimal Control. 1995.
  • [7] Lewis, F. L., Vamvoudakis, K. G. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2011, 41(1): 14-25.
  • [8] Lewis, F. L., Vrabie, D., Vamvoudakis, K. G. Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers. IEEE Control Systems Magazine, 2012, 32(6): 76-105.
  • [9] Modares, H., Lewis, F. L., Jiang, Z.-P. Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-policy Reinforcement Learning. IEEE Transactions on Cybernetics, 2016, 46(11): 2401-2410.
  • [10] Postoyan, R., Busoniu, L., Nesic, D., Daafouz, J. Stability Analysis of Discrete-Time Infinite-Horizon Optimal Control With Discounted Cost. IEEE Transactions on Automatic Control, 2017, 62(6): 2736-2749.