Novel Discounted Adaptive Critic Control Designs With Accelerated Learning Formulation

Cited by: 17
Authors
Ha, Mingming [1 ,2 ]
Wang, Ding [3 ]
Liu, Derong [4 ,5 ]
Affiliations
[1] Ant Grp, MYbank, Beijing 100020, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Automation & Elect Engn, Beijing 100083, Peoples R China
[3] Beijing Univ Technol, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
[4] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[5] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Iterative methods; Convergence; Power system stability; Optimal control; Stability criteria; Cost function; Closed loop systems; Adaptive critic designs; adaptive dynamic programming (ADP); discrete-time nonlinear systems; fast convergence rate; reinforcement learning; value iteration (VI); STABILITY ANALYSIS; VALUE-ITERATION; SUBJECT;
DOI
10.1109/TCYB.2022.3233593
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Inspired by the successive relaxation method, a novel discounted iterative adaptive dynamic programming framework is developed, in which the iterative value function sequence possesses an adjustable convergence rate. The convergence properties of the value function sequence and the stability of the closed-loop system under the new discounted value iteration (VI) are investigated. Based on the properties of the given VI scheme, an accelerated learning algorithm with a convergence guarantee is presented. Moreover, the implementations of the new VI scheme and its accelerated learning design, which involve value function approximation and policy improvement, are elaborated. A nonlinear fourth-order ball-and-beam balancing plant is used to verify the performance of the developed approaches. Compared with traditional VI, the proposed discounted iterative adaptive critic designs greatly accelerate the convergence of the value function while simultaneously reducing the computational cost.
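The core idea described in the abstract can be illustrated in a simplified tabular setting. The sketch below (not the paper's algorithm or its ball-and-beam example; the toy system, costs, and relaxation factor are illustrative inventions) blends each discounted Bellman backup with the previous iterate, V_{k+1} = (1 - eta) V_k + eta * min_u [U(x,u) + gamma * V_k(F(x,u))], where eta = 1 recovers the traditional VI update and a larger eta can speed up convergence. Note that the admissible range of the relaxation factor is scheme-dependent; the value used here is chosen ad hoc for this toy problem.

```python
import numpy as np

gamma = 0.9   # discount factor
n = 5         # states 0..4; action 0 steps forward (state 4 self-loops), action 1 steps back
F = np.array([[min(x + 1, n - 1), max(x - 1, 0)] for x in range(n)])  # F[x, u] -> next state
U = np.column_stack([np.ones(n), 10.0 * np.ones(n)])                  # forward is clearly cheaper

def relaxed_vi(eta, tol=1e-8, max_iter=10_000):
    """Discounted VI with a relaxation factor eta; eta = 1 is standard VI."""
    V = np.zeros(n)
    for k in range(1, max_iter + 1):
        Q = U + gamma * V[F]                           # Q[x, u] = U(x, u) + gamma * V(F(x, u))
        V_new = (1 - eta) * V + eta * Q.min(axis=1)    # relaxed Bellman backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k
        V = V_new
    return V, max_iter

V_std, k_std = relaxed_vi(eta=1.0)                 # traditional VI baseline
V_fast, k_fast = relaxed_vi(eta=2 / (2 - gamma))   # over-relaxed variant
print(k_std, k_fast)                               # the relaxed sweep needs fewer iterations here
print(np.allclose(V_std, V_fast, atol=1e-6))       # both reach the same fixed point
```

On this toy chain, every state pays a unit cost forever under the optimal (forward) policy, so the fixed point is V(x) = 1/(1 - gamma) = 10 for all x; the relaxed iteration contracts at rate |1 - eta(1 - gamma)| on the dominant mode, which is smaller than gamma for the eta used above.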
Pages
3003-3016 (14 pages)
Related References
51 in total
[1]   Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof [J].
Al-Tamimi, Asma ;
Lewis, Frank .
2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, :38-+
[2]   Value and Policy Iterations in Optimal Control and Adaptive Dynamic Programming [J].
Bertsekas, Dimitri P. .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (03) :500-509
[3]  
Burden R. L., 2010, Numerical Analysis, 9th ed.
[4]   Stability and almost disturbance decoupling analysis of nonlinear system subject to feedback linearization and feedforward neural network controller [J].
Chien, Ting-Li ;
Chen, Chung-Cheng ;
Huang, Yi-Chieh ;
Lin, Wen-Jiun .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2008, 19 (07) :1220-1230
[5]   Functional Nonlinear Model Predictive Control Based on Adaptive Dynamic Programming [J].
Dong, Lu ;
Yan, Jun ;
Yuan, Xin ;
He, Haibo ;
Sun, Changyin .
IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (12) :4206-4218
[6]   Discounted Iterative Adaptive Critic Designs With Novel Stability Analysis for Tracking Control [J].
Ha, Mingming ;
Wang, Ding ;
Liu, Derong .
IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (07) :1262-1272
[7]   A Novel Value Iteration Scheme With Adjustable Convergence Rate [J].
Ha, Mingming ;
Wang, Ding ;
Liu, Derong .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) :7430-7442
[8]   Data-Based Nonaffine Optimal Tracking Control Using Iterative DHP Approach [J].
Ha, Mingming ;
Wang, Ding ;
Liu, Derong .
IFAC PAPERSONLINE, 2020, 53 (02) :4246-4251
[9]   Generalized value iteration for discounted optimal control with stability analysis [J].
Ha, Mingming ;
Wang, Ding ;
Liu, Derong .
SYSTEMS &amp; CONTROL LETTERS, 2021, 147
[10]  
Ha MM, 2020, CHIN CONTR CONF, P1951, DOI 10.23919/CCC50068.2020.9188706