Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration

被引：0

作者：

Akian, Marianne ^{[1
,2
]}

Gaubert, Stephane ^{[1
,2
]}

Qu, Zheng ^{[3
]}

Saadi, Omar ^{[1
,2
]}

机构：

[1] Ecole Polytech, INRIA, F-91128 Palaiseau, France

[2] Ecole Polytech, CMAP, Route Saclay, F-91128 Palaiseau, France

[3] Univ Hong Kong, Dept Math, Room 419,Run Run Shaw Bldg,Pokfulam Rd, Hong Kong, Peoples R China

来源：

2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC) | 2019年

关键词：

D O I：

暂无

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue).

引用

页码：5963 / 5970

页数：8

共 50 条

[21] Information Structures and Values in Zero-Sum Stochastic Games
Nayyar, Ashutosh
Gupta, Abhishek
2017 AMERICAN CONTROL CONFERENCE (ACC), 2017, : 3658 - 3663
[22] On-policy and Off-policy Value Iteration Algorithms for Stochastic Zero-Sum Games
Guo, Liangyuan
Wang, Bing-Chang
Sun, Bo
2024 14TH ASIAN CONTROL CONFERENCE, ASCC 2024, 2024, : 1773 - 1777
[23] A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information
T.E.S. Raghavan
Zamir Syed
Mathematical Programming, 2003, 95 : 513 - 532
[24] A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information
Raghavan, TES
Syed, Z
MATHEMATICAL PROGRAMMING, 2003, 95 (03) : 513 - 532
[25] A policy iteration algorithm for zero-sum stochastic games with mean payoff
Cochet-Terrasson, Jean
Gaubert, Stephane
COMPTES RENDUS MATHEMATIQUE, 2006, 343 (05) : 377 - 382
[26] Perfect information two-person zero-sum markov games with imprecise transition probabilities
Hyeong Soo Chang
Mathematical Methods of Operations Research, 2006, 64 : 335 - 351
[27] Perfect information two-person zero-sum markov games with imprecise transition probabilities
Chang, Hyeong Soo
MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2006, 64 (02) : 335 - 351
[28] Decomposition Techniques for Markov Zero-sum Games with Nested Information
Zheng, Jiefu
Castanon, David A.
2013 IEEE 52ND ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2013, : 574 - 581
[29] Accelerated Value Iteration for Nonlinear Zero-Sum Games with Convergence Guarantee
Yuan Wang
Mingming Zhao
Nan Liu
Ding Wang
Guidance,Navigation and Control, 2024, (01) : 125 - 152
[30] Accelerated Value Iteration for Nonlinear Zero-Sum Games with Convergence Guarantee
Wang, Yuan
Zhao, Mingming
Liu, Nan
Wang, Ding
GUIDANCE NAVIGATION AND CONTROL, 2024, 04 (01)

← 1 2 3 4 5 →