Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control

Cited by: 180

Authors:

Luo, Biao ^{[1]}

Liu, Derong ^{[2]}

Wu, Huai-Ning ^{[3]}

Wang, Ding ^{[1]}

Lewis, Frank L. ^{[4,5]}

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

[2] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

[3] Beihang Univ, Sci & Technol Aircraft Control Lab, Beijing 100191, Peoples R China

[4] Univ Texas Arlington, Res Inst, Ft Worth, TX 76118 USA

[5] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2017 / Vol. 47 / No. 10

Funding:

National Natural Science Foundation of China;

Keywords:

Adaptive control; adaptive dynamic programming (ADP); data-based; off-policy learning; optimal control; policy gradient; DISCRETE-TIME-SYSTEMS; H-INFINITY CONTROL; AFFINE NONLINEAR-SYSTEMS; OPTIMAL TRACKING CONTROL; HORIZON OPTIMAL-CONTROL; ZERO-SUM GAMES; LINEAR-SYSTEMS; CONTROL DESIGN; ITERATION; ALGORITHM;

DOI:

10.1109/TCYB.2016.2623859

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal controller. By using offline and online data rather than the mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.


Pages: 3341-3354

Page count: 14

