Adaptive Suboptimal Output-Feedback Control for Linear Systems Using Integral Reinforcement Learning

Cited by: 77
Authors
Zhu, Lemei M. [1 ]
Modares, Hamidreza [2 ]
Peen, Gan Oon [3 ]
Lewis, Frank L. [2 ,4 ]
Yue, Baozeng [5 ]
Affiliations
[1] North China Inst Sci & Technol, Dept Basic, Beijing 101601, Hebei, Peoples R China
[2] Univ Texas, Arlington Res Inst, Ft Worth, TX 76118 USA
[3] Singapore Inst Mfg Technol, Singapore 638075, Singapore
[4] Northeastern Univ, State Key Lab Synthet Proc Automat, Shenyang 110036, Peoples R China
[5] Beijing Inst Technol, Sch Aerosp Engn, Dept Mech, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Integral reinforcement learning (IRL); linear continuous-time (CT) systems; optimal control; output feedback; DISCRETE-TIME-SYSTEMS; POLICY ITERATION; TRACKING CONTROL; ALGORITHM;
DOI
10.1109/TCST.2014.2322778
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812;
Abstract
Reinforcement learning (RL) techniques have been successfully used to find optimal state-feedback controllers for continuous-time (CT) systems. In most real-world control applications, however, measuring the full system state is impractical, so output-feedback controllers are preferred. This paper develops an online learning algorithm based on the integral RL (IRL) technique to find a suboptimal output-feedback controller for partially unknown CT linear systems. At each iteration, the proposed algorithm solves an IRL Bellman equation online, in real time, to evaluate an output-feedback policy, and then updates the output-feedback gain using the information provided by the evaluated policy. Knowledge of the system drift dynamics is not required. An adaptive observer supplies full-state estimates for the IRL Bellman equation during learning; once learning is complete, the observer is no longer needed. The convergence of the proposed algorithm to a suboptimal output-feedback solution and the performance of the method are verified through simulation on two real-world applications, namely, an X-Y table and the F-16 aircraft.
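A minimal sketch of the underlying IRL policy-evaluation step may help make the iteration described above concrete. The equations below follow the standard state-feedback IRL formulation for linear quadratic problems (as in Vrabie et al., 2009), with illustrative symbols P_i, K_i, and reinforcement interval T; the paper's actual output-feedback gain update and its coupling with the adaptive observer differ in detail.
\[
x(t)^{\top} P_i\, x(t) \;=\; \int_{t}^{t+T} \!\big( x^{\top} Q\, x + u_i^{\top} R\, u_i \big)\, d\tau \;+\; x(t+T)^{\top} P_i\, x(t+T), \qquad u_i = -K_i x,
\]
\[
K_{i+1} \;=\; R^{-1} B^{\top} P_i .
\]
Because the integral is evaluated along measured trajectories, the drift matrix A never enters the Bellman equation, which is why only partial knowledge of the dynamics is needed. In the output-feedback setting sketched in the abstract, the state x in the Bellman equation is replaced by the adaptive observer's estimate during learning, and the control is restricted to the output-feedback structure u = -F y.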
Pages: 264-273
Number of pages: 10