Adaptive Suboptimal Output-Feedback Control for Linear Systems Using Integral Reinforcement Learning

Cited by: 77
Authors
Zhu, Lemei M. [1 ]
Modares, Hamidreza [2 ]
Peen, Gan Oon [3 ]
Lewis, Frank L. [2 ,4 ]
Yue, Baozeng [5 ]
Affiliations
[1] North China Inst Sci & Technol, Dept Basic, Beijing 101601, Hebei, Peoples R China
[2] Univ Texas, Arlington Res Inst, Ft Worth, TX 76118 USA
[3] Singapore Inst Mfg Technol, Singapore 638075, Singapore
[4] Northeastern Univ, State Key Lab Synthet Proc Automat, Shenyang 110036, Peoples R China
[5] Beijing Inst Technol, Sch Aerosp Engn, Dept Mech, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Integral reinforcement learning (IRL); linear continuous-time (CT) systems; optimal control; output feedback; DISCRETE-TIME-SYSTEMS; POLICY ITERATION; TRACKING CONTROL; ALGORITHM;
DOI
10.1109/TCST.2014.2322778
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning (RL) techniques have been successfully used to find optimal state-feedback controllers for continuous-time (CT) systems. However, in most real-world control applications, it is not practical to measure the system states and it is desirable to design output-feedback controllers. This paper develops an online learning algorithm based on the integral RL (IRL) technique to find a suboptimal output-feedback controller for partially unknown CT linear systems. The proposed IRL-based algorithm solves an IRL Bellman equation in each iteration online in real time to evaluate an output-feedback policy and updates the output-feedback gain using the information given by the evaluated policy. The knowledge of the system drift dynamics is not required by the proposed method. An adaptive observer is used to provide the knowledge of the full states for the IRL Bellman equation during learning. However, the observer is not needed after the learning process is finished. The convergence of the proposed algorithm to a suboptimal output-feedback solution and the performance of the proposed method are verified through simulation on two real-world applications, namely, the X-Y table and the F-16 aircraft.
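To make the idea behind the abstract concrete, the following Python snippet (not from the paper) sketches plain IRL policy iteration for a continuous-time LTI system with full state feedback: it evaluates a policy from the IRL Bellman equation x(t)'Px(t) = ∫_t^{t+T}(x'Qx + u'Ru)dτ + x(t+T)'Px(t+T) using measured data, then improves the gain as K = R⁻¹B'P. The paper's actual contribution replaces the state-feedback gain with an output-feedback gain and supplies state estimates through an adaptive observer during learning; the matrices A, B, Q, R, the initial gain, and all tuning values below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy import integrate, linalg

# Minimal IRL policy-iteration sketch for a CT LTI system (state-feedback LQR case).
# This is NOT the paper's output-feedback/observer-based algorithm; it only
# illustrates the IRL Bellman equation the method builds on. All numbers are
# illustrative assumptions.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # drift dynamics: used only by the simulated plant
B = np.array([[0.0], [1.0]])               # input dynamics: needed for policy improvement
Q = np.eye(2)
R = np.array([[1.0]])
T = 0.05                                    # IRL integration interval
K = np.array([[0.0, 0.0]])                  # initial stabilizing gain (A is Hurwitz here)

def simulate_interval(x0, K, T):
    """Integrate the closed loop and the running cost over one interval of length T."""
    def f(t, z):
        x = z[:2]
        u = -K @ x
        cost = x @ Q @ x + u @ R @ u
        return np.concatenate([A @ x + B @ u, [cost]])
    sol = integrate.solve_ivp(f, [0.0, T], np.concatenate([x0, [0.0]]), rtol=1e-8)
    z = sol.y[:, -1]
    return z[:2], z[2]                       # x(t+T) and the integrated cost

def quad_basis(x):
    """Quadratic basis [x1^2, x1*x2, x2^2] so that V(x) = theta' * quad_basis(x)."""
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

rng = np.random.default_rng(0)
for it in range(10):                         # policy iteration loop
    Phi, target = [], []
    for traj in range(10):                   # several short trajectories for excitation
        x = rng.standard_normal(2)
        for k in range(10):
            x_next, c = simulate_interval(x, K, T)
            Phi.append(quad_basis(x) - quad_basis(x_next))  # V(x(t)) - V(x(t+T))
            target.append(c)                                 # = integral cost (IRL Bellman eq.)
            x = x_next
    theta = np.linalg.lstsq(np.array(Phi), np.array(target), rcond=None)[0]
    P = np.array([[theta[0], theta[1] / 2], [theta[1] / 2, theta[2]]])
    K = np.linalg.solve(R, B.T @ P)          # policy improvement: K = R^{-1} B' P

P_are = linalg.solve_continuous_are(A, B, Q, R)
print("IRL gain:", K)
print("ARE gain:", np.linalg.solve(R, B.T @ P_are))
```

Under these assumptions, the learned gain should approach the LQR gain computed from the algebraic Riccati equation, even though the learner never uses A directly, mirroring the "partially unknown drift dynamics" setting; the paper's method additionally restricts the policy to measured outputs and converges to a suboptimal output-feedback solution.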
Pages: 264-273
Number of pages: 10