Output Feedback Reinforcement Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem

Times Cited: 0
Authors
Rizvi, Syed Ali Asad [1]
Lin, Zongli [1]
Affiliations
[1] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
Source
2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC) | 2017
Keywords
Reinforcement learning; Q-learning; LQR; output feedback
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Reinforcement learning belongs to a class of artificial intelligence algorithms that can be used to design adaptive optimal controllers learned online. These methods have mostly been based on state feedback, which limits their application in practical scenarios. In this paper, we present an output feedback Q-learning algorithm to solve the discrete-time linear quadratic regulator (LQR) problem. The proposed scheme learns the optimal controller online without requiring any knowledge of the system dynamics, making it completely model-free. Both policy iteration (PI) and value iteration (VI) algorithms are developed, where the latter does not require an initially stabilizing policy. The convergence of these algorithms is established. The proposed method does not require a discounting factor, which is typically introduced in the cost function to trade off between the excitation noise bias and system stability. The method is therefore exact and converges to the actual LQR control solution obtained by solving the Riccati equation. Simulation results demonstrate the effectiveness of the scheme.
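The abstract describes a Q-learning value iteration that starts from no stabilizing policy and converges to the Riccati solution. As a rough illustration of that recursion, the sketch below implements the standard state-feedback variant (in the spirit of Bradtke et al. [2]) in Python; the paper's actual contribution, the output feedback formulation, is not reproduced here. The plant matrices A, B, the cost weights, and all numerical choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

np.random.seed(0)

# Illustrative plant (Euler-discretized double integrator, dt = 0.1).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.eye(1)          # input cost weight
n, m = 2, 1
p = n + m               # dimension of the joint vector z = [x; u]

def phi(z):
    """Quadratic basis: products z_i z_j for i <= j, off-diagonals doubled,
    so that phi(z) @ theta == z' H z for a symmetric H."""
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(p) for j in range(i, p)])

def theta_to_H(theta):
    """Rebuild the symmetric Q-function kernel H from the parameter vector."""
    H, k = np.zeros((p, p)), 0
    for i in range(p):
        for j in range(i, p):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# Exploratory data (x_k, u_k, x_{k+1}); the plant is used only as a
# black-box simulator to produce one-step transitions.
N = 60
X = np.random.randn(N, n)
U = np.random.randn(N, m)
Xn = X @ A.T + U @ B.T
Phi = np.vstack([phi(np.concatenate([X[k], U[k]])) for k in range(N)])
stage = np.array([X[k] @ Qc @ X[k] + U[k] @ Rc @ U[k] for k in range(N)])

# Value iteration on H, starting from H_0 = 0 (no stabilizing policy needed).
H = np.zeros((p, p))
for _ in range(200):
    if np.linalg.matrix_rank(H[n:, n:]) == m:
        # V_j(x) = x' P_j x with P_j = H_xx - H_xu H_uu^{-1} H_ux
        P = H[:n, :n] - H[:n, n:] @ np.linalg.solve(H[n:, n:], H[n:, :n])
    else:
        P = np.zeros((n, n))    # first pass: value function starts at zero
    # Bellman target: stage cost plus next-state value, then least squares.
    y = stage + np.einsum('ki,ij,kj->k', Xn, P, Xn)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    H = theta_to_H(theta)

K = np.linalg.solve(H[n:, n:], H[n:, :n])   # learned gain, u = -K x

# Reference gain from iterating the Riccati difference equation directly.
P = np.zeros((n, n))
for _ in range(2000):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        Rc + B.T @ P @ B, B.T @ P @ A)
K_riccati = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
```

Note that the least-squares update itself never touches A or B; the dynamics appear only when generating the transition samples, which is what makes the scheme model-free in operation.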
Pages: 6
Related References
12 in total
  • [1] Anonymous. Algebraic Riccati Equations. 1995.
  • [2] Bradtke, S. J. Proceedings of the 1994 American Control Conference, Vols 1-3, 1994: 3475.
  • [4] Kiumarsi, B., Lewis, F. L., Naghibi-Sistani, M.-B., Karimpour, A. Optimal Tracking Control of Unknown Discrete-Time Linear Systems Using Input-Output Measured Data. IEEE Transactions on Cybernetics, 2015, 45(12): 2770-2779.
  • [5] Landelius, T. Thesis, 1997.
  • [6] Lewis, F. Optimal Control. 1995.
  • [7] Lewis, F. L., Vamvoudakis, K. G. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2011, 41(1): 14-25.
  • [8] Lewis, F. L., Vrabie, D., Vamvoudakis, K. G. Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers. IEEE Control Systems Magazine, 2012, 32(6): 76-105.
  • [9] Modares, H., Lewis, F. L., Jiang, Z.-P. Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-policy Reinforcement Learning. IEEE Transactions on Cybernetics, 2016, 46(11): 2401-2410.
  • [10] Postoyan, R., Busoniu, L., Nesic, D., Daafouz, J. Stability Analysis of Discrete-Time Infinite-Horizon Optimal Control With Discounted Cost. IEEE Transactions on Automatic Control, 2017, 62(6): 2736-2749.