Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem

Cited by: 69
Authors
Rizvi, Syed Ali Asad [1 ]
Lin, Zongli [1 ]
Affiliations
[1] Univ Virginia, Charles L Brown Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
Keywords
Approximate dynamic programming (ADP); linear quadratic regulation (LQR); output feedback; Q-learning; reinforcement learning (RL)
DOI
10.1109/TNNLS.2018.2870075
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most existing RL and ADP methods rely on full-state feedback, a requirement that is often difficult to satisfy in practical applications; output feedback methods are therefore more desirable, as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) problem for discrete-time systems. The proposed scheme is completely online and requires no knowledge of the system dynamics. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this Q-function representation, output feedback LQR controllers are designed, and two iterative output feedback Q-learning algorithms are presented, based on the policy iteration and the value iteration methods. The scheme has the advantage that it does not incur any excitation noise bias, which circumvents the need for discounted cost functions and, in turn, ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study illustrates the proposed scheme.
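
Illustrative sketch (not from the paper): the abstract describes an iterative Q-learning scheme that estimates the LQR Q-function from measured data and improves the policy without knowledge of the system matrices. The minimal Python example below demonstrates the underlying policy-iteration idea on a state-feedback variant; the paper's central contribution, reconstructing the Q-function from input-output data instead of the state, is omitted, and the system matrices, initial gain, and function names are assumptions made purely for illustration.

import numpy as np

def svec(z):
    # Quadratic basis for a symmetric kernel H: z' H z = theta . svec(z),
    # with theta the stacked upper triangle of H.
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unsvec(theta, n):
    # Rebuild the symmetric matrix H from its parameter vector theta.
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def q_learning_pi(A, B, Q, R, K, iters=10, samples=400, noise=0.5, seed=0):
    # Policy iteration on the LQR Q-function kernel using only measured
    # (state, input, cost, next state) data; A and B are used to simulate
    # the plant, never inside the learning update.
    rng = np.random.default_rng(seed)
    n, m = B.shape
    for _ in range(iters):
        Phi, y = [], []
        x = rng.standard_normal(n)
        for _ in range(samples):
            u = -K @ x + noise * rng.standard_normal(m)     # exploratory input
            x_next = A @ x + B @ u
            cost = x @ Q @ x + u @ R @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])  # on-policy successor
            Phi.append(svec(z) - svec(z_next))              # Bellman-equation regressor
            y.append(cost)
            x = x_next if np.linalg.norm(x_next) < 1e3 else rng.standard_normal(n)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        H = unsvec(theta, n + m)
        Hux, Huu = H[n:, :n], H[n:, n:]
        K = np.linalg.solve(Huu, Hux)                       # policy improvement
    return K, H

if __name__ == "__main__":
    # Example system and initial stabilizing gain (assumed for illustration).
    A = np.array([[1.0, 0.1], [0.0, 0.9]])
    B = np.array([[0.0], [0.1]])
    Q, R = np.eye(2), np.eye(1)
    K0 = np.array([[1.0, 2.0]])
    K_learned, _ = q_learning_pi(A, B, Q, R, K0)
    # Riccati recursion for the nominal model-based LQR gain, used only as a check.
    P = np.eye(2)
    for _ in range(500):
        Kp = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ Kp)
    print("learned K :", K_learned)
    print("Riccati K*:", np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))

In the paper's output feedback setting, the state vector would be replaced by a vector of past inputs and outputs; the least-squares policy evaluation and the policy-improvement step retain the same structure.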
Pages: 1523-1536
Number of pages: 14