Hamiltonian-Driven Adaptive Dynamic Programming With Efficient Experience Replay

Cited by: 129
Authors
Yang, Yongliang [1 ]
Pan, Yongping [2 ]
Xu, Cheng-Zhong [3 ]
Wunsch, Donald C. [4 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Key Lab Knowledge Automat Ind Proc, Minist Educ, Beijing 100083, Peoples R China
[2] Sun Yat Sen Univ, Sch Adv Mfg, Shenzhen 518107, Peoples R China
[3] Univ Macau, Dept Comp & Informat Sci, State Key Lab Internet Things Smart City, Macau, Peoples R China
[4] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409 USA
Keywords
Mathematical models; Optimal control; Optimization; Convergence; Iterative algorithms; Dynamic programming; Learning systems; Hamilton-Jacobi-Bellman (HJB) equation; Hamiltonian-driven adaptive dynamic programming (ADP); pseudo-Hamiltonian; quasi-Hamiltonian; relaxed excitation condition; LINEAR-SYSTEMS;
DOI
10.1109/TNNLS.2022.3213566
CLC classification code
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article presents a novel, efficient experience-replay-based adaptive dynamic programming (ADP) method for the optimal control of a class of nonlinear dynamical systems within the Hamiltonian-driven framework. The quasi-Hamiltonian is introduced for the policy evaluation problem with an admissible policy. Using the quasi-Hamiltonian, a composite critic learning mechanism is developed that combines instantaneous data with historical data. In addition, the pseudo-Hamiltonian is defined to handle the performance optimization problem. Based on the pseudo-Hamiltonian, the conventional Hamilton-Jacobi-Bellman (HJB) equation can be represented in a filtered form that can be implemented online. Theoretical analysis establishes the convergence of the adaptive critic design and the stability of the closed-loop system, with parameter convergence achieved under a weakened excitation condition. Simulation studies verify the efficacy of the presented design scheme.
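To illustrate the composite critic learning mechanism described in the abstract, the sketch below shows a replay-augmented critic update under simple assumptions: the value function is approximated as V_hat(x) = W^T phi(x) for a fixed admissible policy u, the quasi-Hamiltonian residual W^T dphi(x)(f(x)+g(x)u) + r(x,u) is evaluated on the current sample and on a buffer of stored samples, and a normalized gradient step combines them. The names f, g, dphi, reward, and the buffer layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a composite (experience-replay) critic update in the
# spirit of Hamiltonian-driven ADP.  Assumed interfaces (not from the paper):
#   f(x)         -> drift term, shape (n,)
#   g(x)         -> input matrix, shape (n, m)
#   dphi(x)      -> Jacobian of the critic features phi(x), shape (p, n)
#   reward(x, u) -> running cost Q(x) + u^T R u, scalar
#   W            -> critic weight vector, shape (p,)

def regressor_and_residual(W, x, u, f, g, dphi, reward):
    """Regressor sigma = dphi(x) (f(x) + g(x) u) and quasi-Hamiltonian
    residual e = W^T sigma + r(x, u); e = 0 means the policy-evaluation
    (Lyapunov-like) equation holds at this sample."""
    sigma = dphi(x) @ (f(x) + g(x) @ u)
    e = float(W @ sigma + reward(x, u))
    return sigma, e

def composite_critic_step(W, current, buffer, f, g, dphi, reward, lr=1e-2):
    """One normalized-gradient step on 0.5 * e^2, summed over the current
    (instantaneous) sample and every stored (replayed) sample."""
    grad = np.zeros_like(W)
    for x, u in [current, *buffer]:
        sigma, e = regressor_and_residual(W, x, u, f, g, dphi, reward)
        grad += e * sigma / (1.0 + sigma @ sigma)  # normalization bounds the step size
    return W - lr * grad
```

The replayed terms are what relax the excitation requirement: as long as the stored regressors jointly span the parameter space, the combined gradient remains informative even when the instantaneous regressor is not persistently exciting.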
Pages: 3278-3290
Page count: 13
Related references
49 records in total
[1]   Experience Replay for Real-Time Reinforcement Learning Control [J].
Adam, Sander ;
Busoniu, Lucian ;
Babuska, Robert .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (02) :201-212
[2]  
Liberzon D., 2012, Calculus of Variations and Optimal Control Theory: A Concise Introduction
[3]  
Bryson A.E., Ho Y.-C., 1969, Applied Optimal Control
[4]  
Busoniu L., 2017, Reinforcement Learning and Dynamic Programming Using Function Approximators
[5]   Concurrent learning adaptive control of linear systems with exponentially convergent bounds [J].
Chowdhary, Girish ;
Yucelen, Tansel ;
Muehlegg, Maximillian ;
Johnson, Eric N. .
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2013, 27 (04) :280-301
[6]   Concurrent Learning for Convergence in Adaptive Control without Persistency of Excitation [J].
Chowdhary, Girish ;
Johnson, Eric .
49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, :3674-3679
[7]   Composite Experience Replay-Based Deep Reinforcement Learning With Application in Wind Farm Control [J].
Dong, Hongyang ;
Zhao, Xiaowei .
IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2022, 30 (03) :1281-1295
[8]   Optimal Tracking Control for Uncertain Nonlinear Systems With Prescribed Performance via Critic-Only ADP [J].
Dong, Hongyang ;
Zhao, Xiaowei ;
Luo, Biao .
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (01) :561-573
[9]   Learning-based adaptive optimal output regulation of linear and nonlinear systems: an overview [J].
Gao, Weinan ;
Jiang, Zhong-Ping .
CONTROL THEORY AND TECHNOLOGY, 2022, 20 (01) :1-19
[10]   Adaptive Dynamic Programming and Adaptive Optimal Output Regulation of Linear Systems [J].
Gao, Weinan ;
Jiang, Zhong-Ping .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, 61 (12) :4164-4169