A Novel Value Iteration Scheme With Adjustable Convergence Rate

Cited by: 36
Authors
Ha, Mingming [1]
Wang, Ding [2,3]
Liu, Derong [4]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Convergence; Iterative algorithms; Stability criteria; Heuristic algorithms; Approximation algorithms; Optimal control; Numerical stability; Adaptive dynamic programming (ADP); admissible control policy; convergence rate; discrete-time nonlinear systems; reinforcement learning (RL); value iteration; OPTIMAL ADAPTIVE-CONTROL; STABILITY ANALYSIS; DESIGN; INPUT;
DOI
10.1109/TNNLS.2022.3143527
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In this article, a novel value iteration (VI) scheme is developed, together with discussions of its convergence and stability. A relaxation factor is introduced to adjust the convergence rate of the value function sequence, and the convergence conditions with respect to this factor are given. The stability of the closed-loop system under the control policies generated by the present VI algorithm is investigated. Moreover, an integrated VI approach is developed that combines the advantages of the present and traditional value iterations to accelerate and guarantee convergence. In addition, a relaxation function is designed so that the developed VI scheme adaptively attains a fast convergence rate. Finally, the theoretical results and the effectiveness of the present algorithm are validated by numerical examples.
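A note on the update described above: one common way to realize a relaxation-factor VI is the over-relaxation form V_{k+1}(x) = (1 - omega) V_k(x) + omega * min_u [U(x, u) + gamma * V_k(F(x, u))], where omega = 1 recovers traditional value iteration. The Python sketch below illustrates this form on a small, randomly generated tabular problem. It is only a minimal sketch under that assumption; the relaxation factor omega, the discount factor, and all data and function names are hypothetical stand-ins, and the paper's exact formulation, relaxation function, and convergence conditions are given in the article itself.

import numpy as np

# Illustrative relaxed value iteration on a small random tabular problem.
# All quantities here (omega, gamma, costs, transitions) are hypothetical
# stand-ins used only to show the over-relaxation-style update.

n_states, n_actions = 4, 2
rng = np.random.default_rng(0)

P = rng.random((n_actions, n_states, n_states))   # P[a, x, x'] transition probabilities
P /= P.sum(axis=2, keepdims=True)                  # normalize rows to be stochastic
U = rng.random((n_states, n_actions))              # stage cost U(x, u)
gamma = 0.95                                       # discount factor

def bellman_operator(V):
    # (T V)(x) = min_u [ U(x, u) + gamma * sum_x' P(x' | x, u) V(x') ]
    Q = U + gamma * np.einsum('axy,y->xa', P, V)
    return Q.min(axis=1)

def relaxed_vi(omega, tol=1e-10, max_iter=50_000):
    # Relaxed update: V_{k+1} = (1 - omega) * V_k + omega * (T V_k).
    # omega = 1 recovers traditional value iteration; other choices change
    # the rate, subject to convergence conditions such as those in the paper.
    V = np.zeros(n_states)
    for k in range(1, max_iter + 1):
        V_next = (1.0 - omega) * V + omega * bellman_operator(V)
        if np.max(np.abs(V_next - V)) < tol:
            return V_next, k
        V = V_next
    return V, max_iter

V1, iters1 = relaxed_vi(omega=1.0)   # traditional VI baseline
V2, iters2 = relaxed_vi(omega=1.2)   # relaxed VI with a different rate
print(f"omega=1.0: {iters1} iterations, omega=1.2: {iters2} iterations, "
      f"max value difference {np.max(np.abs(V1 - V2)):.2e}")

In this sketch, omega simply rescales the fixed-point iteration; whether a given omega accelerates or even preserves convergence depends on conditions like those established in the article, which also designs a relaxation function that adapts the factor across iterations.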
Pages: 7430-7442
Number of pages: 13