Infinite Horizon Self-Learning Optimal Control of Nonaffine Discrete-Time Nonlinear Systems

Cited by: 127
Authors
Wei, Qinglai [1 ]
Liu, Derong [1 ]
Yang, Xiong [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; generalized policy iteration; neural networks (NNs); neurodynamic programming; nonlinear systems; optimal control; reinforcement learning; OPTIMAL TRACKING CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; ADAPTIVE OPTIMAL-CONTROL; ZERO-SUM GAMES; UNKNOWN DYNAMICS; CONTROL SCHEME; APPROXIMATION ERRORS; POLICY ITERATION; LINEAR-SYSTEMS; CRITIC DESIGNS;
DOI
10.1109/TNNLS.2015.2401334
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration unifies the policy iteration and value iteration algorithms of ADP within a single framework. The developed algorithm permits an arbitrary positive semidefinite function to initialize the iteration and employs two iteration indices, one for policy improvement and one for policy evaluation. For the first time, the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed. Neural networks are used to implement the developed algorithm, and numerical examples are presented to illustrate its performance.
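The two-index structure described in the abstract can be illustrated with a minimal tabular sketch: an outer index for policy improvement and an inner index for policy evaluation, with value iteration (one evaluation sweep) and policy iteration (evaluation to convergence) as the two extremes. This is not the paper's neural-network implementation; the system, utility, and grids below are hypothetical, chosen only to make the iteration structure concrete.

```python
import numpy as np

# Hypothetical scalar nonaffine system x_{k+1} = f(x, u) with quadratic utility.
def f(x, u):
    return 0.8 * np.sin(x) + 0.5 * u**3   # nonaffine in the control u

def U(x, u):
    return x**2 + u**2                     # stage cost (utility function)

X = np.linspace(-1.0, 1.0, 41)             # discretized state grid
A = np.linspace(-1.0, 1.0, 21)             # discretized action grid

def nearest(x):
    # index of the grid point closest to x (states are clipped to the grid)
    return np.abs(X - np.clip(x, X[0], X[-1])).argmin()

V = np.zeros(len(X))                       # arbitrary positive semidefinite init
policy = np.zeros(len(X))                  # initial control law u = 0

N_improve, N_eval = 30, 5                  # the two iteration indices (i and j)
for i in range(N_improve):
    # Policy evaluation: apply the Bellman operator under the current policy
    # N_eval times. N_eval = 1 recovers value iteration; N_eval -> infinity
    # (evaluation to convergence) recovers policy iteration.
    for j in range(N_eval):
        V = np.array([U(x, policy[s]) + V[nearest(f(x, policy[s]))]
                      for s, x in enumerate(X)])
    # Policy improvement: greedy minimization over the action grid.
    policy = np.array([A[np.argmin([U(x, u) + V[nearest(f(x, u))] for u in A])]
                       for x in X])

# Near the origin (an equilibrium of this toy system) the control stays small.
print(policy[nearest(0.0)])
```

The sketch only demonstrates the interleaving of the two iteration loops; the paper replaces the tabular value and policy with neural-network approximators and proves convergence, admissibility, and optimality for the general scheme.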
Pages: 866-879 (14 pages)