Accelerated Value Iteration for Nonlinear Zero-Sum Games with Convergence Guarantee

Cited: 0
Authors
Wang, Yuan [1 ,2 ,3 ,4 ]
Zhao, Mingming [1 ,2 ,3 ,4 ]
Liu, Nan [1 ,2 ,3 ,4 ]
Wang, Ding [1 ]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
[4] Beijing Univ Technol, Beijing Lab Smart Environm Protect, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Adaptive dynamic programming; convergence rate; value iteration; zero-sum games; STABILITY ANALYSIS; TRACKING; SYSTEMS; DESIGNS;
DOI
10.1142/S2737480724500031
CLC Number
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
In this paper, an accelerated value iteration (VI) algorithm is established to solve the zero-sum game problem with a convergence guarantee. First, inspired by successive over-relaxation theory, the convergence rate of the iterative value function sequence is significantly accelerated via a relaxation factor. Second, the convergence and monotonicity of the value function sequence are analyzed under different ranges of the relaxation factor. Third, two practical approaches, namely the integrated scheme and the relaxation function, are introduced into the accelerated VI algorithm to guarantee the convergence of the iterative value function sequence for zero-sum games. The integrated scheme consists of an accelerated stage and a convergence stage, and the relaxation function adjusts the value of the relaxation factor online. Finally, the excellent performance of the accelerated VI algorithm is verified through two examples with practical physical backgrounds, including an autopilot controller.
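The core idea described in the abstract, applying a successive over-relaxation step to the value iteration update, can be sketched for a toy discrete zero-sum game. Everything below (state/action counts, rewards, transitions, the discount factor gamma, and the relaxation factor omega) is an illustrative assumption, not data from the paper; the update V_new = (1 - omega) * V + omega * T(V) reduces to classical VI at omega = 1.

```python
import numpy as np

# Illustrative over-relaxed value iteration for a small discrete zero-sum game.
# Problem data below are assumptions for the sketch, not taken from the paper.
n_s, n_u, n_w = 4, 2, 2                           # states, control actions, disturbance actions
rng = np.random.default_rng(0)
R = rng.standard_normal((n_s, n_u, n_w))          # stage reward r(s, u, w)
P = rng.integers(0, n_s, size=(n_s, n_u, n_w))    # deterministic next state s' = P[s, u, w]
gamma = 0.5                                       # discount factor (contraction modulus)

def bellman(V):
    """Zero-sum Bellman operator: controller maximizes, disturbance minimizes."""
    Q = R + gamma * V[P]                          # Q(s, u, w) via advanced indexing
    return Q.min(axis=2).max(axis=1)              # max_u min_w Q(s, u, w)

def relaxed_vi(omega, tol=1e-10, max_iter=2000):
    """VI with relaxation factor omega; omega = 1 recovers classical VI."""
    V = np.zeros(n_s)
    for k in range(max_iter):
        V_new = (1.0 - omega) * V + omega * bellman(V)   # over-relaxed update
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k + 1
        V = V_new
    return V, max_iter

V_std, it_std = relaxed_vi(omega=1.0)   # classical VI
V_acc, it_acc = relaxed_vi(omega=1.2)   # accelerated VI (omega kept below 2 / (1 + gamma))
print(it_std, it_acc, np.max(np.abs(V_std - V_acc)))
```

Both runs reach the same fixed point, since V = (1 - omega) V + omega T(V) holds exactly when T(V) = V; keeping omega below 2 / (1 + gamma) preserves the contraction, which is the kind of range condition the convergence analysis in the paper concerns.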
Pages: 28