H∞ Optimal Control of Unknown Linear Discrete-time Systems: An Off-policy Reinforcement Learning Approach

Cited by: 0

Authors:

Kiumarsi, Bahare ^{[1]}

Modares, Hamidreza ^{[1]}

Lewis, Frank L. ^{[1]}

Jiang, Zhong-Ping ^{[2]}

Affiliations:

[1] Univ Texas Arlington, UTARI, Ft Worth, TX 76118 USA

[2] NYU, Control & Networks Lab, Dept Elect & Comp Engn, Polytech Sch Engn, Brooklyn, NY 11201 USA

Source:

PROCEEDINGS OF THE 2015 7TH IEEE INTERNATIONAL CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS (CIS) AND ROBOTICS, AUTOMATION AND MECHATRONICS (RAM) | 2015

Keywords:

H-infinity control; reinforcement learning; off-policy; game algebraic Riccati equation; FEEDBACK;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper proposes a model-free H∞ control design for linear discrete-time systems using reinforcement learning (RL). A novel off-policy RL algorithm is used to solve the game algebraic Riccati equation (GARE) online using data measured along the system trajectories. The proposed RL algorithm has the following advantages over existing model-free RL methods for solving the H∞ control problem: 1) It is data efficient and fast, since a stream of experiences obtained from executing a fixed behavioral policy is reused to sequentially update many value functions corresponding to different learning policies. 2) The disturbance input does not need to be adjusted in a specific manner. 3) There is no bias as a result of adding a probing noise to the control input to maintain the persistence of excitation condition. A simulation example is used to verify the effectiveness of the proposed control scheme.


Pages: 41-46

Page count: 6

References: 16 in total

[1] Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control [J].

Al-Tamimi, Asma ;

Lewis, Frank L. ;

Abu-Khalaf, Murad .

AUTOMATICA, 2007, 43 (03) :473-481

[2]

[Anonymous], 1996, Neuro-dynamic programming

[3]

Basar T., 1999, SIAMS CLASSIC APPL M, V23

[4]

Basar T., 1995, OPTIMAL CONTROL RELA

[5] H-infinity control via measurement feedback for affine nonlinear systems [J].

Isidori, A .

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 1994, 4 (04) :553-574

[6] Global L2-gain design for a class of nonlinear systems [J].

Isidori, A ;

Lin, W .

SYSTEMS & CONTROL LETTERS, 1998, 34 (05) :295-302

[7] Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics [J].

Jiang, Yu ;

Jiang, Zhong-Ping .

AUTOMATICA, 2012, 48 (10) :2699-2704

[8] Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics [J].

Kiumarsi, Bahare ;

Lewis, Frank L. ;

Modares, Hamidreza ;

Karimpour, Ali ;

Naghibi-Sistani, Mohammad-Bagher .

AUTOMATICA, 2014, 50 (04) :1167-1175

[9]

Lewis F. L., 2012, OPTIMAL CONTROL

[10] Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics [J].

Li, Hongliang ;

Liu, Derong ;

Wang, Ding .

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2014, 11 (03) :706-714
