Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis

Cited by: 145
Authors
Wei, Qinglai [1 ]
Lewis, Frank L. [2 ,3 ]
Liu, Derong [4 ]
Song, Ruizhuo [4 ]
Lin, Hanquan [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[2] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
[3] Northeastern Univ, Shenyang 110036, Liaoning, Peoples R China
[4] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2018, Vol. 48, No. 6
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; local iteration; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; OPTIMAL TRACKING CONTROL; NONLINEAR-SYSTEMS; POLICY ITERATION; CONTROL SCHEME; FEEDBACK-CONTROL; LEARNING CONTROL; DESIGN; ALGORITHM; GAMES;
DOI
10.1109/TSMC.2016.2623766
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
In this paper, convergence properties are established for the newly developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The local iterative ADP algorithm permits an arbitrary positive semidefinite function as the initial value function. By employing a state-dependent learning rate function, for the first time, the iterative value function and iterative control law can be updated in a subset of the state space instead of the whole state space, which effectively reduces the computational burden. A new analysis method is developed to prove that the iterative value functions converge to the optimum under some mild constraints. Monotonicity of the local value iteration ADP algorithm is also presented, showing that under certain conditions on the initial value function and the learning rate function, the iterative value function converges monotonically to the optimum. Finally, three simulation examples and comparisons are given to illustrate the performance of the developed algorithm.
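To make the subset-update idea concrete, below is a minimal numerical sketch of a local value iteration scheme on a discretized one-dimensional state space. The dynamics f, utility U, learning-rate profile gamma, grid sizes, and iteration count are illustrative assumptions, not taken from the paper: where gamma(x) = 0 the value function is left unchanged, so only part of the state space is updated per iteration, while gamma(x) = 1 everywhere recovers ordinary value iteration.

```python
import numpy as np

# A minimal sketch of local value iteration on a discretized 1-D state
# space. The dynamics f, utility U, learning-rate profile gamma, and the
# grid sizes below are illustrative assumptions, not from the paper.

x_grid = np.linspace(-1.0, 1.0, 101)      # discretized state space
u_grid = np.linspace(-1.0, 1.0, 51)       # candidate control values

def f(x, u):
    # Hypothetical nonlinear dynamics x_{k+1} = f(x_k, u_k).
    return 0.8 * np.sin(x) + 0.5 * u

def U(x, u):
    # Positive definite utility function.
    return x**2 + u**2

def gamma(x):
    # State-dependent learning rate in [0, 1]. It is zero outside
    # |x| <= 0.5, so only that subset of the state space is updated
    # in each iteration.
    return np.where(np.abs(x) <= 0.5, 1.0, 0.0)

# Arbitrary positive semidefinite initialization (here, identically zero).
V = np.zeros_like(x_grid)

X, Uc = np.meshgrid(x_grid, u_grid, indexing="ij")
x_next = f(X, Uc)                         # successor state for each (x, u)

for i in range(200):
    # One-step lookahead cost; np.interp evaluates the current V at the
    # successor states (clamping at the grid boundary).
    Q = U(X, Uc) + np.interp(x_next, x_grid, V)
    V_target = Q.min(axis=1)              # min over u: value-iteration target

    # Local update as a pointwise convex combination:
    # V_{i+1}(x) = (1 - gamma(x)) * V_i(x)
    #              + gamma(x) * min_u [ U(x, u) + V_i(f(x, u)) ].
    V = (1.0 - gamma(x_grid)) * V + gamma(x_grid) * V_target
```

In this form, the learning rate blends the previous value function with the value-iteration target at each state, which mirrors the subset-update mechanism the abstract describes; the paper's contribution is the analysis of when such iterations converge, and converge monotonically, to the optimum.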
Pages: 875-891
Page count: 17