On-Policy and Off-Policy Value Iteration Algorithms for Stochastic Zero-Sum Dynamic Games

Cited by: 0

Authors:

GUO Liangyuan ^{[1]}

WANG BingChang ^{[1]}

ZHANG JiFeng ^{[2,3]}

Affiliations:

[1] School of Control Science and Engineering, Shandong University

[2] School of Automation and Electrical Engineering, Zhongyuan University of Technology

[3] Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Source:

Journal of Systems Science & Complexity | 2025, Vol. 38, No. 1

Keywords:

DOI:

Not available

Chinese Library Classification number:

O225 [Game theory];

Subject classification code:

070105 ; 1201 ;

Abstract:

This paper considers value iteration algorithms for stochastic zero-sum linear quadratic games with unknown dynamics. On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games without requiring knowledge of the system dynamics. By analyzing the value function iterations, the convergence of the model-based algorithm is shown. The equivalence of several types of value iteration algorithms is established. The effectiveness of the model-free algorithms is demonstrated by a numerical example.

Citation

Pages: 421-435

Page count: 15
