Hamiltonian-Driven Adaptive Dynamic Programming With Efficient Experience Replay

Cited by: 129
Authors
Yang, Yongliang [1 ]
Pan, Yongping [2 ]
Xu, Cheng-Zhong [3 ]
Wunsch, Donald C. [4 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Key Lab Knowledge Automat Ind Proc, Minist Educ, Beijing 100083, Peoples R China
[2] Sun Yat Sen Univ, Sch Adv Mfg, Shenzhen 518107, Peoples R China
[3] Univ Macau, Dept Comp & Informat Sci, State Key Lab Internet Things Smart City, Macau, Peoples R China
[4] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409 USA
Keywords
Mathematical models; Optimal control; Optimization; Convergence; Iterative algorithms; Dynamic programming; Learning systems; Hamilton-Jacobi-Bellman (HJB) equation; Hamiltonian-driven adaptive dynamic programming (ADP); pseudo-Hamiltonian; quasi-Hamiltonian; relaxed excitation condition; LINEAR-SYSTEMS;
DOI
10.1109/TNNLS.2022.3213566
CLC classification code
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article presents a novel, efficient experience-replay-based adaptive dynamic programming (ADP) method for the optimal control of a class of nonlinear dynamical systems within the Hamiltonian-driven framework. The quasi-Hamiltonian is introduced for the policy evaluation problem with an admissible policy. Using the quasi-Hamiltonian, a composite critic learning mechanism is developed that combines instantaneous data with historical data. In addition, the pseudo-Hamiltonian is defined to handle the performance optimization problem. Based on the pseudo-Hamiltonian, the conventional Hamilton-Jacobi-Bellman (HJB) equation can be represented in a filtered form that can be implemented online. Theoretical analysis establishes the convergence of the adaptive critic design and the stability of the closed-loop system, with parameter convergence achieved under a weakened excitation condition. Simulation studies verify the efficacy of the presented design scheme.
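To illustrate the composite critic learning mechanism described in the abstract, the sketch below shows a replay-augmented critic update under simple assumptions: the value function is approximated as V_hat(x) = W^T phi(x) for a fixed admissible policy u, the quasi-Hamiltonian residual W^T dphi(x)(f(x)+g(x)u) + r(x,u) is evaluated on the current sample and on a buffer of stored samples, and a normalized gradient step combines them. The names f, g, dphi, reward, and the buffer layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a composite (experience-replay) critic update in the
# spirit of Hamiltonian-driven ADP.  Assumed interfaces (not from the paper):
#   f(x)         -> drift term, shape (n,)
#   g(x)         -> input matrix, shape (n, m)
#   dphi(x)      -> Jacobian of the critic features phi(x), shape (p, n)
#   reward(x, u) -> running cost Q(x) + u^T R u, scalar
#   W            -> critic weight vector, shape (p,)

def regressor_and_residual(W, x, u, f, g, dphi, reward):
    """Regressor sigma = dphi(x) (f(x) + g(x) u) and quasi-Hamiltonian
    residual e = W^T sigma + r(x, u); e = 0 means the policy-evaluation
    (Lyapunov-like) equation holds at this sample."""
    sigma = dphi(x) @ (f(x) + g(x) @ u)
    e = float(W @ sigma + reward(x, u))
    return sigma, e

def composite_critic_step(W, current, buffer, f, g, dphi, reward, lr=1e-2):
    """One normalized-gradient step on 0.5 * e^2, summed over the current
    (instantaneous) sample and every stored (replayed) sample."""
    grad = np.zeros_like(W)
    for x, u in [current, *buffer]:
        sigma, e = regressor_and_residual(W, x, u, f, g, dphi, reward)
        grad += e * sigma / (1.0 + sigma @ sigma)  # normalization bounds the step size
    return W - lr * grad
```

The replayed terms are what relax the excitation requirement: as long as the stored regressors jointly span the parameter space, the combined gradient remains informative even when the instantaneous regressor is not persistently exciting.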
Pages: 3278-3290
Page count: 13
Related references
49 records in total
[1]   Experience Replay for Real-Time Reinforcement Learning Control [J].
Adam, Sander ;
Busoniu, Lucian ;
Babuska, Robert .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (02) :201-212
[2]  
Liberzon D., 2012, Calculus of Variations and Optimal Control Theory: A Concise Introduction
[3]  
Bryson A.E., Ho Y.-C., 1969, Applied Optimal Control
[4]  
Busoniu L., 2017, Reinforcement Learning and Dynamic Programming Using Function Approximators
[5]   Concurrent learning adaptive control of linear systems with exponentially convergent bounds [J].
Chowdhary, Girish ;
Yucelen, Tansel ;
Muehlegg, Maximillian ;
Johnson, Eric N. .
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2013, 27 (04) :280-301
[6]   Concurrent Learning for Convergence in Adaptive Control without Persistency of Excitation [J].
Chowdhary, Girish ;
Johnson, Eric .
49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, :3674-3679
[7]   Composite Experience Replay-Based Deep Reinforcement Learning With Application in Wind Farm Control [J].
Dong, Hongyang ;
Zhao, Xiaowei .
IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2022, 30 (03) :1281-1295
[8]   Optimal Tracking Control for Uncertain Nonlinear Systems With Prescribed Performance via Critic-Only ADP [J].
Dong, Hongyang ;
Zhao, Xiaowei ;
Luo, Biao .
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (01) :561-573
[9]   Learning-based adaptive optimal output regulation of linear and nonlinear systems: an overview [J].
Gao, Weinan ;
Jiang, Zhong-Ping .
CONTROL THEORY AND TECHNOLOGY, 2022, 20 (01) :1-19
[10]   Adaptive Dynamic Programming and Adaptive Optimal Output Regulation of Linear Systems [J].
Gao, Weinan ;
Jiang, Zhong-Ping .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, 61 (12) :4164-4169