Value set iteration for two-person zero-sum Markov games

Cited by: 1

Authors:

Chang, Hyeong Soo [1]

Affiliation:

[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea

Source:

AUTOMATICA | 2017 / Vol. 76

Keywords:

Two-person zero-sum Markov game; Value iteration; Policy iteration; Stochastic game

DOI:

10.1016/j.automatica.2016.10.010

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We present a novel exact algorithm called "value set iteration" (VSI) for solving two-person zero-sum Markov games (MGs) as a generalization of value iteration (VI) and as a general framework of combining multiple solution methods. We introduce a novel operator in the value function space and iteratively apply the operator with any sequence of the set of policies, extending Chang's VSI for MDPs into the MG setting. We show that VSI for MGs converges to the equilibrium value function with at least linear convergence rate and establish that VSI can potentially improve the convergence speed in terms of the number of iterations by proper setting of the sequence of the set of policies. (C) 2016 Elsevier Ltd. All rights reserved.
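For orientation, the classical value iteration (VI) that the abstract says VSI generalizes can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's VSI operator, which the abstract does not define. The function names, the restriction to two actions per player (so each stage matrix game has a closed-form value), the discount factor, and the toy two-state game are all hypothetical choices made for this sketch.

```python
def matrix_game_value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if maximin >= minimax:                # pure saddle point exists
        return maximin
    # No saddle point: standard mixed-strategy value of a 2x2 game.
    return (a * d - b * c) / (a + d - b - c)


def value_iteration(rewards, transitions, gamma, tol=1e-10, max_iters=10000):
    """Classical VI for a discounted zero-sum Markov game (two actions each).

    rewards[x][a][b]        : stage payoff to the maximizer in state x
    transitions[x][a][b][y] : probability of moving from x to y
    """
    n = len(rewards)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = []
        for x in range(n):
            # Stage game: immediate payoff plus discounted expected continuation.
            q = [[rewards[x][a][b]
                  + gamma * sum(p * v[y] for y, p in enumerate(transitions[x][a][b]))
                  for b in range(2)] for a in range(2)]
            new_v.append(matrix_game_value_2x2(q))
        diff = max(abs(u - w) for u, w in zip(new_v, v))
        v = new_v
        if diff < tol:                    # sup-norm stopping rule
            break
    return v


# Hypothetical toy game: state 0 is matching pennies and always moves to
# state 1; state 1 is absorbing and pays 1 regardless of the actions.
rewards = [[[1.0, -1.0], [-1.0, 1.0]],
           [[1.0, 1.0], [1.0, 1.0]]]
to_state_1 = [0.0, 1.0]
transitions = [[[to_state_1, to_state_1], [to_state_1, to_state_1]],
               [[to_state_1, to_state_1], [to_state_1, to_state_1]]]
values = value_iteration(rewards, transitions, gamma=0.9)
print(values)
```

In this toy game the absorbing state is worth 1 / (1 - 0.9) = 10, and state 0 is worth the matching-pennies value 0 plus 0.9 × 10 = 9, so the iterates approach those numbers at the linear rate set by the discount factor.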

Pages: 61-64

Page count: 4

Related papers

43 in total

[31] Generic uniqueness of the bias vector of finite zero-sum stochastic games with perfect information [J].

Akian, Marianne; Gaubert, Stephane; Hochart, Antoine.

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2018, 457 (02): 1038-1064

[32] Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems [J].

Modares, Hamidreza; Lewis, Frank L.; Sistani, Mohammad-Bagher Naghibi.

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2014, 28 (3-5): 232-254

[33] Event-triggered adaptive fuzzy optimal control of modular robot manipulators using zero-sum differential game through value iteration [J].

Dong, Bo; Feng, Zhian; Cui, Yiming; Zhu, Xinye; An, Tianjiao.

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2023, 37 (09): 2364-2379

[34] Adaptive critic designs for discrete-time zero-sum games with application to H∞ control [J].

Al-Tamimi, Asma; Abu-Khalaf, Murad; Lewis, Frank L.

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2007, 37 (01): 240-247

[35] Adaptive critic control with multi-step policy evaluation for nonlinear zero-sum games [J].

Li, Xin; Wang, Ding; Wang, Jiangyu; Qiao, Junfei.

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (01): 551-566

[36] Zero-sum continuous-time Markov pure jump game over a fixed duration [J].

Guo, Xin; Zhang, Yi.

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2017, 452 (02): 1194-1208

[37] Differential dynamic programming for finite-horizon zero-sum differential games of nonlinear systems [J].

Zhang, Bin; Jia, Yingmin; Zhang, Yuqi.

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2023, 33 (18): 11062-11084

[38] Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics [J].

Li, Hongliang; Liu, Derong; Wang, Ding.

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2014, 11 (03): 706-714

[39] Zero-sum risk-sensitive stochastic games with unbounded payoff functions and varying discount factors [J].

Guo, Xin; Chen, Jian; Li, Zechao.

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2023, 519 (02)

[40] Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems [J].

Luo, Biao; Yang, Yin; Liu, Derong.

IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (07): 3630-3640