Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors

Cited by: 82
Authors
Yang, Yongliang [1 ,2 ]
Modares, Hamidreza [3 ]
Vamvoudakis, Kyriakos G. [4 ]
He, Wei [1 ,2 ]
Xu, Cheng-Zhong [5 ]
Wunsch, Donald C. [6 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Inst Artificial Intelligence, Beijing 100083, Peoples R China
[3] Michigan State Univ, Mech Engn Dept, E Lansing, MI 48824 USA
[4] Georgia Tech, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA
[5] Univ Macau, Fac Sci & Technol, State Key Lab Internet Things Smart City, Macau, Peoples R China
[6] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65401 USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Costs; Mathematical model; Stability analysis; Approximation error; Approximation algorithms; Dynamic programming; Iterative algorithms; Hamilton-Jacobi-Bellman (HJB) equation; Hamiltonian-driven framework; inexact adaptive dynamic programming (ADP); optimal control; H-infinity control; continuous-time systems; nonlinear systems; tracking control; robust control; architecture; state
DOI: 10.1109/TCYB.2021.3108034
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Discipline classification code: 0812
Abstract
In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the continuous-time, infinite-horizon optimal control problem for nonlinear systems. First, a novel function, the "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that accounts for the approximation errors incurred during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient that guarantees closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
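For orientation only, the following is a minimal sketch of the standard continuous-time objects these terms usually denote; the dynamics \dot{x} = f(x) + g(x)u, the running cost r(x, u), and the symbols H_{\min}, \hat{V}_i, and \varepsilon_i are illustrative assumptions, since the record gives only the abstract and the paper's exact definitions may differ:
\[
H(x, u, \nabla V) = \nabla V(x)^{\top}\bigl(f(x) + g(x)u\bigr) + r(x, u),
\qquad
H_{\min}(x, \nabla V) = \min_{u} H(x, u, \nabla V).
\]
In this notation the HJB equation reads H_{\min}(x, \nabla V^{*}) = 0, and one PI step with inexact policy evaluation is
\[
H\bigl(x, u_{i}, \nabla \hat{V}_{i}\bigr) = \varepsilon_{i},
\qquad
u_{i+1} = \arg\min_{u} H\bigl(x, u, \nabla \hat{V}_{i}\bigr),
\]
where \varepsilon_{i} is the policy-evaluation approximation error that the paper's sufficient condition on the iterative value gradient \nabla \hat{V}_{i} is meant to bound.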
Pages: 13762-13773
Page count: 12