Incremental value iteration for time-aggregated Markov-decision processes

Cited by: 23
Authors
Sun, Tao [1]
Zhao, Qianchuan
Luh, Peter B.
Institutions
[1] Tsing Hua Univ, Ctr Intelligent & Networked Syst CFINS, Dept Automat, Beijing 100084, Peoples R China
[2] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Keywords
fractional cost; Markov-decision processes (MDPs); policy iteration; time aggregation; value iteration;
DOI
10.1109/TAC.2007.908359
CLC classification number
TP [automation technology, computer technology];
Subject classification code
0812 ;
Abstract
A value iteration algorithm for time-aggregated Markov-decision processes (MDPs) is developed to solve problems with large state spaces. The algorithm is based on a novel approach that solves a time-aggregated MDP by incrementally solving a set of standard MDPs. The algorithm therefore converges under the same assumption as standard value iteration, which is much weaker than the assumption required by the existing time-aggregated value iteration algorithm. The algorithms developed in this paper also apply to MDPs with fractional costs.
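The incremental algorithm described above inherits the convergence assumption of standard value iteration, which it uses as a building block. For reference, here is a minimal sketch of standard (cost-minimizing) value iteration on a hypothetical two-state, two-action MDP; the transition matrix, costs, and discount factor are illustrative inventions, not taken from the paper:

```python
import numpy as np

# Hypothetical toy MDP (illustrative only, not from the paper).
# P[a, s, s'] = probability of moving s -> s' under action a.
# R[a, s]     = expected one-step cost of taking action a in state s.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
R = np.array([[1.0, 3.0],
              [2.0, 0.5]])
gamma = 0.9  # discount factor; contraction guarantees convergence for gamma < 1


def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Standard value iteration: repeat V <- min_a (R_a + gamma * P_a V)."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)        # Q[a, s], batched over actions
        V_new = Q.min(axis=0)          # minimize expected cost over actions
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, Q.argmin(axis=0)     # value function and greedy policy


V, policy = value_iteration(P, R, gamma)
print("V =", V, "policy =", policy)
```

The incremental scheme in the paper replaces a single global iteration over the aggregated model with a sequence of such standard solves, which is why only the standard convergence assumption is needed.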
Pages: 2177-2182
Page count: 6
Related papers
5 records in total
[1]   A time aggregation approach to Markov decision processes [J].
Cao, XR ;
Ren, ZY ;
Bhatnagar, S ;
Fu, M ;
Marcus, S .
AUTOMATICA, 2002, 38 (06) :929-943
[2]   Joint replacement in an operational planning phase [J].
Dekker, R ;
Wildeman, RE ;
vanEgmond, R .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1996, 91 (01) :74-88
[3]
Puterman M. L., 2008, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Statistics
[4]   Markov decision processes with fractional costs [J].
Ren, ZY ;
Krogh, BH .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2005, 50 (05) :646-650
[5]
Sutton R. S., 1998, Reinforcement Learning: An Introduction