Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors

Cited by: 82
Authors
Yang, Yongliang [1 ,2 ]
Modares, Hamidreza [3 ]
Vamvoudakis, Kyriakos G. [4 ]
He, Wei [1 ,2 ]
Xu, Cheng-Zhong [5 ]
Wunsch, Donald C. [6 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Inst Artificial Intelligence, Beijing 100083, Peoples R China
[3] Michigan State Univ, Mech Engn Dept, E Lansing, MI 48824 USA
[4] Georgia Tech, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA
[5] Univ Macau, Fac Sci & Technol, State Key Lab Internet Things Smart City, Macau, Peoples R China
[6] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65401 USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Costs; Mathematical model; Stability analysis; Approximation error; Approximation algorithms; Dynamic programming; Iterative algorithms; Hamilton-Jacobi-Bellman (HJB) equation; Hamiltonian-driven framework; inexact adaptive dynamic programming (ADP); optimal control; H-infinity control; continuous-time systems; nonlinear systems; tracking control; robust control; architecture; state
DOI: 10.1109/TCYB.2021.3108034
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Discipline classification code: 0812
Abstract
In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the continuous-time, infinite-horizon optimal control problem for nonlinear systems. First, a novel function, the "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that accounts for the approximation errors incurred during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient that guarantees closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
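For orientation only, the following is a minimal sketch of the standard continuous-time objects these terms usually denote; the dynamics \dot{x} = f(x) + g(x)u, the running cost r(x, u), and the symbols H_{\min}, \hat{V}_i, and \varepsilon_i are illustrative assumptions, since the record gives only the abstract and the paper's exact definitions may differ:
\[
H(x, u, \nabla V) = \nabla V(x)^{\top}\bigl(f(x) + g(x)u\bigr) + r(x, u),
\qquad
H_{\min}(x, \nabla V) = \min_{u} H(x, u, \nabla V).
\]
In this notation the HJB equation reads H_{\min}(x, \nabla V^{*}) = 0, and one PI step with inexact policy evaluation is
\[
H\bigl(x, u_{i}, \nabla \hat{V}_{i}\bigr) = \varepsilon_{i},
\qquad
u_{i+1} = \arg\min_{u} H\bigl(x, u, \nabla \hat{V}_{i}\bigr),
\]
where \varepsilon_{i} is the policy-evaluation approximation error that the paper's sufficient condition on the iterative value gradient \nabla \hat{V}_{i} is meant to bound.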
Pages: 13762-13773
Page count: 12