Computing semi-stationary optimal policies for multichain semi-Markov decision processes

被引:1
作者
Mondal, Prasenjit [1 ]
机构
[1] Govt Gen Degree Coll, Dept Math, Ranibandh 722135, Bankura, India
关键词
Semi-Markov decision processes; Limiting ratio average reward; Multichain structure; Pure optimal semi-stationary policies; Linear programming;
D O I
10.1007/s10479-017-2686-x
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
We consider semi-Markov decision processes with finite state and action spaces and a general multichain structure. A form of limiting ratio average (undiscounted) reward is the criterion for comparing different policies. The main result is that the value vector and a pure optimal semi-stationary policy (i.e., a policy which depends only on the initial state and the current state) for such an SMDP can be computed directly from an optimal solution of a finite set (whose cardinality equals the number of states) of linear programming (LP) problems. To be more precise, we prove that the single LP associated with a fixed initial state provides the value and an optimal pure stationary policy of the corresponding SMDP. The relation between the set of feasible solutions of each LP and the set of stationary policies is also analyzed. Examples are worked out to describe the algorithm.
引用
收藏
页码:843 / 865
页数:23
相关论文
共 32 条
[1]  
Baykal-Gursoy M, 2010, WILEY ENCY OPERATION, DOI [10.1002/9780470400531.eorms0757., DOI 10.1002/9780470400531.E0RMS0757]
[2]   A MARKOVIAN DECISION PROCESS [J].
BELLMAN, R .
JOURNAL OF MATHEMATICS AND MECHANICS, 1957, 6 (05) :679-684
[3]  
Bewley T., 1978, Mathematics of Operations Research, V3, P104, DOI 10.1287/moor.3.2.104
[4]   DISCRETE DYNAMIC-PROGRAMMING [J].
BLACKWELL, D .
ANNALS OF MATHEMATICAL STATISTICS, 1962, 33 (02) :719-&
[5]   Optimization for condition-based maintenance with semi-Markov decision process [J].
Chen, DY ;
Trivedi, KS .
RELIABILITY ENGINEERING & SYSTEM SAFETY, 2005, 90 (01) :25-29
[6]   MULTICHAIN MARKOV RENEWAL PROGRAMS [J].
DENARDO, EV ;
FOX, BL .
SIAM JOURNAL ON APPLIED MATHEMATICS, 1968, 16 (03) :468-&
[7]   ON SEQUENTIAL DECISIONS AND MARKOV-CHAINS [J].
DERMAN, C .
MANAGEMENT SCIENCE, 1962, 9 (01) :16-24
[8]   ON SEQUENTIAL CONTROL PROCESSES [J].
DERMAN, C .
ANNALS OF MATHEMATICAL STATISTICS, 1964, 35 (01) :341-&
[9]  
Doob JL, 1953, STOCHASTIC PROCESSES, P52369
[10]   NOTE ON SIMULTANEOUS RECURRENCE CONDITIONS ON A SET OF DENUMERABLE STOCHASTIC MATRICES [J].
FEDERGRUEN, A ;
HORDIJK, A ;
TIJMS, HC .
JOURNAL OF APPLIED PROBABILITY, 1978, 15 (04) :842-847