Offline and Online Adaptive Critic Control Designs With Stability Guarantee Through Value Iteration

Cited by: 54
Authors
Ha, Mingming [1 ]
Wang, Ding [2 ,3 ]
Liu, Derong [4 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[2] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing 100124, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Stability criteria; Asymptotic stability; Numerical stability; Power system stability; Heuristic algorithms; Cost function; Trajectory; Adaptive dynamic programming; asymptotic stability; online adaptive critic control; policy iteration (PI); reinforcement learning (RL); value iteration (VI); FEEDBACK-CONTROL; ROBUST-CONTROL; SYSTEMS; ALGORITHM; GAME;
DOI
10.1109/TCYB.2021.3107801
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This article is concerned with the stability of the closed-loop system under the various control policies generated by value iteration (VI). Several stability properties, including admissibility criteria and the attraction domain, are investigated. An offline integrated VI scheme with a stability guarantee is developed by combining the advantages of VI and policy iteration, which makes it convenient to obtain admissible control policies. In addition, based on the concept of the attraction domain, an online adaptive dynamic programming (ADP) algorithm using immature control policies is developed, and it is ensured that the state trajectory under this online algorithm converges to the origin. In particular, for linear systems, the online ADP algorithm with a general scheme possesses a stronger stability property: the theoretical results reveal that stability of the linear system can be guaranteed even if the control policy sequence contains finitely many unstable elements. Numerical results verify the effectiveness of the presented algorithms.
Pages: 13262-13274
Page count: 13
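For context on the value-iteration recursion discussed in the abstract, the standard VI update in adaptive dynamic programming is V_{k+1}(x) = min_u { U(x, u) + V_k(F(x, u)) }, typically initialized from V_0 = 0. The sketch below illustrates this recursion for a linear system with quadratic utility, where the update reduces to a Riccati-type difference iteration; it is a minimal illustration only, and all system matrices and numerical values are assumptions for demonstration, not taken from the paper.

    import numpy as np

    # Illustrative VI recursion V_{k+1}(x) = min_u { x'Qx + u'Ru + V_k(Ax + Bu) }
    # for a linear system with quadratic value functions V_k(x) = x'P_k x,
    # which reduces to the Riccati difference iteration below.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed system matrix
    B = np.array([[0.0], [0.1]])             # assumed input matrix
    Q = np.eye(2)                            # state weighting
    R = np.eye(1)                            # control weighting

    P = np.zeros((2, 2))                     # VI commonly starts from V_0 = 0
    for k in range(500):
        # Greedy feedback gain with respect to the current value function: u = -K x
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # VI value update (Riccati difference recursion)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < 1e-10:
            P = P_next
            break
        P = P_next

    print("Converged value matrix P:\n", P)
    print("Final feedback gain K:\n", np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))

In this illustrative setting, the iterates P_k increase monotonically toward the solution of the algebraic Riccati equation, and intermediate gains K need not be stabilizing, which is the kind of issue the paper's stability analysis of VI-generated policies addresses.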