Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration
被引:0
|
作者:
Akian, Marianne
论文数: 0引用数: 0
h-index: 0
机构:
Ecole Polytech, INRIA, F-91128 Palaiseau, France
Ecole Polytech, CMAP, Route Saclay, F-91128 Palaiseau, FranceEcole Polytech, INRIA, F-91128 Palaiseau, France
Akian, Marianne
[1
,2
]
Gaubert, Stephane
论文数: 0引用数: 0
h-index: 0
机构:
Ecole Polytech, INRIA, F-91128 Palaiseau, France
Ecole Polytech, CMAP, Route Saclay, F-91128 Palaiseau, FranceEcole Polytech, INRIA, F-91128 Palaiseau, France
Gaubert, Stephane
[1
,2
]
Qu, Zheng
论文数: 0引用数: 0
h-index: 0
机构:
Univ Hong Kong, Dept Math, Room 419,Run Run Shaw Bldg,Pokfulam Rd, Hong Kong, Peoples R ChinaEcole Polytech, INRIA, F-91128 Palaiseau, France
Qu, Zheng
[3
]
Saadi, Omar
论文数: 0引用数: 0
h-index: 0
机构:
Ecole Polytech, INRIA, F-91128 Palaiseau, France
Ecole Polytech, CMAP, Route Saclay, F-91128 Palaiseau, FranceEcole Polytech, INRIA, F-91128 Palaiseau, France
Saadi, Omar
[1
,2
]
机构:
[1] Ecole Polytech, INRIA, F-91128 Palaiseau, France
[2] Ecole Polytech, CMAP, Route Saclay, F-91128 Palaiseau, France
[3] Univ Hong Kong, Dept Math, Room 419,Run Run Shaw Bldg,Pokfulam Rd, Hong Kong, Peoples R China
来源:
2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC)
|
2019年
关键词:
D O I:
暂无
中图分类号:
TP [自动化技术、计算机技术];
学科分类号:
0812 ;
摘要:
Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization of the eigenvalue).