Markov decision processes associated with two threshold probability criteria

Cited by: 0
Authors
Masahiko SAKAGUCHI
Yoshio OHTSUBO
Affiliation
[1] Department of Mathematics, Faculty of Science, Kochi University
Keywords
Markov decision process; Minimizing risk model; Threshold probability; Policy space iteration
DOI
Not available
Chinese Library Classification number
O211.62 [Markov processes];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
This paper deals with Markov decision processes with a target set for nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is the probability that the total reward is not greater than a given initial threshold value, and the second is the probability that the total reward is less than it. Our first (resp. second) optimization problem is to minimize the first (resp. second) threshold probability. In these problems the threshold value can be interpreted as a permissible level of the total reward for reaching a goal (the target set); that is, we would like to reach this set with the total reward above that level, if possible. For both problems, we show that 1) the optimal threshold probability is the unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) a value iteration and a policy space iteration can be given. In addition, we prove that the first (resp. second) optimal threshold probability is a monotone increasing and right (resp. left) continuous function of the initial threshold value, and we propose a method to obtain an optimal policy and the optimal threshold probability in the first problem by using those of the second problem.
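The two criteria can be illustrated with a minimal value-iteration sketch on a toy model. Everything here is an assumption for illustration and is not taken from the paper: the one-state model `TRANSITIONS`, the function name `threshold_value_iteration`, integer-valued rewards, and the recursion itself, which treats the remaining threshold as part of the state and uses F(s, t) = min over actions of the expected F(s', t - r), with F equal to the indicator of 0 <= t (first criterion) or 0 < t (second criterion) on the target set.

```python
TARGET = "G"

# Hypothetical toy model: TRANSITIONS[state][action] is a list of
# (probability, next_state, reward) with integer rewards <= 0.
TRANSITIONS = {
    "s": {
        "safe": [(1.0, "G", -2)],
        "risky": [(0.5, "G", -1), (0.5, "s", -1)],
    },
}


def threshold_value_iteration(t0, strict, tol=1e-12, max_sweeps=10_000):
    """Approximate min P(total reward <= t) (strict=False) or
    min P(total reward < t) (strict=True) for thresholds t0..0.

    Returns (values, policy), both keyed by (state, threshold).
    """
    # Total reward is <= 0, so for any threshold >= 1 both events are certain.
    cap = 1
    t0 = min(t0, cap)

    def terminal(t):
        # On the target set the remaining total reward is 0.
        return 1.0 if (0 < t if strict else 0 <= t) else 0.0

    values = {}
    for t in range(t0, cap + 1):
        values[(TARGET, t)] = terminal(t)
        for s in TRANSITIONS:
            values[(s, t)] = 1.0 if t >= cap else 0.0

    policy = {}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in TRANSITIONS:
            for t in range(t0, cap):
                best_a, best_v = None, float("inf")
                for a, outcomes in TRANSITIONS[s].items():
                    # Receiving reward r moves the remaining threshold to t - r.
                    v = sum(p * values[(s2, min(t - r, cap))]
                            for p, s2, r in outcomes)
                    if v < best_v:
                        best_a, best_v = a, v
                delta = max(delta, abs(best_v - values[(s, t)]))
                values[(s, t)] = best_v
                policy[(s, t)] = best_a
        if delta < tol:
            break
    return values, policy


first_values, first_policy = threshold_value_iteration(-3, strict=False)
second_values, second_policy = threshold_value_iteration(-3, strict=True)
```

On this toy model with threshold -2, the sketch gives 0.5 for the first criterion (choosing `risky`) and 0 for the second (choosing `safe`), which shows how the two criteria can lead to different optimal actions at the same threshold.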
Pages: 548-557
Number of pages: 10