A boundedness result for the direct heuristic dynamic programming

Cited by: 112
Authors
Liu, Feng [2]
Sun, Jian [2]
Si, Jennie [1]
Guo, Wentao [2]
Mei, Shengwei [2]
Affiliations
[1] Arizona State Univ, Dept Elect Engn, Tempe, AZ 85287 USA
[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Approximate dynamic programming (ADP); Direct heuristic dynamic programming (direct HDP); Lyapunov stability; Uniformly ultimately boundedness (UUB); LEARNING CONTROL; POWER-SYSTEM; REINFORCEMENT; NEUROCONTROL; STABILITY;
DOI
10.1016/j.neunet.2012.02.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Approximate/adaptive dynamic programming (ADP) has been studied extensively in recent years for its potential scalability to solve large state and control space problems, including those involving continuous states and continuous controls. The applicability of ADP algorithms, especially the adaptive critic designs, has been demonstrated in several case studies. Direct heuristic dynamic programming (direct HDP) is one of the ADP algorithms inspired by the adaptive critic designs. It has been shown applicable to industrial scale, realistic and complex control problems. In this paper, we provide a uniformly ultimately boundedness (UUB) result for the direct HDP learning controller under mild and intuitive conditions. By using a Lyapunov approach, we show that the estimation errors of the learning parameters or the weights in the action and critic networks remain UUB. This result provides a useful controller convergence guarantee for the first time for the direct HDP design. (C) 2012 Elsevier Ltd. All rights reserved.
Pages: 229-235
Page count: 7
Related papers
27 in total
[1]   Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach [J].
Abu-Khalaf, M ;
Lewis, FL .
AUTOMATICA, 2005, 41 (05) :779-791
[2]   Policy iterations on the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation [J].
Abu-Khalaf, Murad ;
Lewis, Frank L. ;
Huang, Jie .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2006, 51 (12) :1989-1995
[3]   Neurodynamic programming and zero-sum games for constrained control systems [J].
Abu-Khalaf, Murad ;
Lewis, Frank L. ;
Huang, Jie .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2008, 19 (07) :1243-1252
[4]  
[Anonymous], 1992, HDB INTELLIGENT CONT
[5]  
[Anonymous], 1979, Introduction to dynamic systems: theory, models, and applications
[6]   Issues on stability of ADP feedback controllers for dynamical systems [J].
Balakrishnan, S. N. ;
Ding, Jie ;
Lewis, Frank L. .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04) :913-917
[7]   NEURONLIKE ADAPTIVE ELEMENTS THAT CAN SOLVE DIFFICULT LEARNING CONTROL-PROBLEMS [J].
BARTO, AG ;
SUTTON, RS ;
ANDERSON, CW .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1983, 13 (05) :834-846
[8]  
Bellman R.E., 1962, Applied Dynamic Programming
[9]  
BRADTKE SJ, 1994, PROCEEDINGS OF THE 1994 AMERICAN CONTROL CONFERENCE, VOLS 1-3, P3475
[10]   Helicopter flight-control reconfiguration for main rotor actuator failures [J].
Enns, R ;
Si, J .
JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2003, 26 (04) :572-584