On-Policy and Off-Policy Value Iteration Algorithms for Stochastic Zero-Sum Dynamic Games

Cited by: 0
Authors
GUO Liangyuan [1]
WANG BingChang [1]
ZHANG JiFeng [2, 3]
Affiliations
[1] School of Control Science and Engineering, Shandong University
[2] School of Automation and Electrical Engineering, Zhongyuan University of Technology
[3] Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Keywords
DOI
Not available
Chinese Library Classification number
O225 [Game theory]
Subject classification code
070105; 1201
Abstract
This paper considers value iteration algorithms for stochastic zero-sum linear quadratic games with unknown dynamics. On-policy and off-policy learning algorithms are developed to solve the stochastic zero-sum games without requiring knowledge of the system dynamics. By analyzing the value function iterations, the convergence of the model-based algorithm is shown. The equivalence of several types of value iteration algorithms is established. The effectiveness of the model-free algorithms is demonstrated by a numerical example.
Pages: 421-435
Number of pages: 15