Generalized Second-Order Value Iteration in Markov Decision Processes

被引:5
作者
Kamanchi, Chandramouli [1 ]
Diddigi, Raghuram Bharadwaj [1 ]
Bhatnagar, Shalabh [1 ]
机构
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bengaluru 560012, India
关键词
Mathematical model; Convergence; Newton method; Standards; Approximation algorithms; Markov processes; Entropy; Markov decision process; value iteration; successive relaxation;
D O I
10.1109/TAC.2021.3112851
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov decision process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and, therefore, it may take a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique that can be applied to solve a fixed point equation. It has been shown in the literature that under a special structure of the MDP, successive overrelaxation technique computes the optimal value function faster than standard value iteration. In this article, we propose a second-order value iteration procedure that is obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme. We prove the global convergence of our algorithm to the optimal solution asymptotically and show the second-order convergence. Through experiments, we demonstrate the effectiveness of our proposed approach.
引用
收藏
页码:4241 / 4247
页数:7
相关论文
共 19 条
[1]  
[Anonymous], 2019, GITHUB PYTHON MDPTOO
[2]   DYNAMIC PROGRAMMING [J].
BELLMAN, R .
SCIENCE, 1966, 153 (3731) :34-&
[3]  
Bertsekas D.P., 1996, NEURO DYNAMIC PROGRA, V3
[4]  
Borkar V.S., 2008, STOCHASTIC APPROXIMA
[5]  
Dai B, 2018, PR MACH LEARN RES, V80
[6]  
Devraj A. M., 2017, Advances in Neural Information Processing Systems (NeurIPS), P2235
[7]  
Furmston T, 2016, J MACH LEARN RES, V17
[8]  
Goyal V, 2021, Arxiv, DOI arXiv:1905.09963
[9]  
Haarnoja T, 2017, PR MACH LEARN RES, V70
[10]  
Haarnoja T, 2018, PR MACH LEARN RES, V80