Markov decision processes associated with two threshold probability criteria

Cited by: 0

Authors:

Masahiko SAKAGUCHI

Yoshio OHTSUBO

Affiliation:

[1] Department of Mathematics, Faculty of Science, Kochi University

Source:

Journal of Control Theory and Applications | 2013, Vol. 11, No. 04

Keywords:

Markov decision process; Minimizing risk model; Threshold probability; Policy space iteration;

DOI:

Not available

Chinese Library Classification (CLC) number:

O211.62 [Markov processes];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

This paper deals with Markov decision processes with a target set for nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is the probability that the total reward is not greater than a given initial threshold value, and the second is the probability that the total reward is less than it. Our first (resp. second) optimization problem is to minimize the first (resp. second) threshold probability. In these problems the threshold value is interpreted as a permissible level of the total reward for reaching a goal (the target set); that is, we would like to reach this set with a total reward above that level, if possible. For both problems, we show that 1) the optimal threshold probability is a unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) a value iteration and a policy space iteration are given. In addition, we prove that the first (resp. second) optimal threshold probability is a monotone increasing and right (resp. left) continuous function of the initial threshold value, and we propose a method to obtain an optimal policy and the optimal threshold probability in the first problem by using those of the second problem.
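To make the two criteria concrete, the following is a minimal sketch of a threshold-probability recursion on a toy example. It is not the paper's value iteration or policy space iteration: it is a plain backward recursion on the pair (state, remaining threshold), and it rests on assumptions that are not in the abstract, namely integer rewards of at most -1 outside the target set, so that the recursion terminates. The MDP, the state and action names, and the function `min_prob` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy MDP (not from the paper). "G" is the target set.
# TRANSITIONS[state][action] = (reward, [(next_state, probability), ...])
# Assumption: rewards are integers <= -1 outside the target, so the
# remaining threshold rises by at least 1 per step and the recursion ends.
TRANSITIONS = {
    "A": {
        "safe": (-2, [("G", 1.0)]),
        "risky": (-1, [("G", 0.5), ("A", 0.5)]),
    },
}
TARGET = {"G"}


@lru_cache(maxsize=None)
def min_prob(state, threshold, strict=False):
    """Minimal probability that the remaining total reward is <= threshold
    (first criterion) or < threshold (second criterion, strict=True)."""
    # The remaining total reward never exceeds 0, so the event is certain
    # once the threshold is above 0, or equal to 0 in the non-strict case.
    if threshold > 0 or (threshold == 0 and not strict):
        return 1.0
    # Inside the target the remaining reward is exactly 0, which is not
    # below (or at) any threshold that reaches this line.
    if state in TARGET:
        return 0.0
    # Otherwise pick the action minimizing the expected probability,
    # shifting the threshold by the reward just received.
    return min(
        sum(p * min_prob(nxt, threshold - reward, strict) for nxt, p in succ)
        for reward, succ in TRANSITIONS[state].values()
    )


first = min_prob("A", -2)          # minimal P(total reward <= -2)
second = min_prob("A", -2, True)   # minimal P(total reward <  -2)
```

In this toy example the first criterion at threshold -2 favors the hypothetical "risky" action, while the second criterion favors "safe", which illustrates how the two problems can differ at the same threshold value.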

Pages: 548-557

Page count: 10

11 references in total

[1] Two Characterizations of Optimality in Dynamic Programming [J].

Karatzas, Ioannis ;

Sudderth, William D. .

APPLIED MATHEMATICS AND OPTIMIZATION, 2010, 61 (03) :421-434

[2] Stochastic shortest path problems with associative accumulative criteria [J].

Ohtsubo, Yoshio .

APPLIED MATHEMATICS AND COMPUTATION, 2007 (1)

[3] Optimal Routing for Maximizing the Travel Time Reliability [J].

Yueyue Fan ;

Yu Nie .

Networks and Spatial Economics, 2006, 6 :333-344

[4] Arriving on time [J].

Fan, YY ;

Kalaba, RE ;

Moore, JE .

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2005, 127 (03) :497-513

[5] Equivalence classes for optimizing risk models in Markov decision processes [J].

Ohtsubo, Y ;

Toyonaga, K .

MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2004, 60 (02) :239-250

[6] Optimal threshold probability in undiscounted Markov decision processes with a target set [J].

Ohtsubo, Yoshio .

APPLIED MATHEMATICS AND COMPUTATION, 2003 (2)

[7] Minimizing risk models in stochastic shortest path problems [J].

Ohtsubo, Y .

MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2003, 57 (01) :79-88

[8] Optimal policy for minimizing risk models in Markov decision processes [J].

Ohtsubo, Y ;

Toyonaga, K .

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2002, 271 (01) :66-81

[9] Minimizing risk models in Markov decision processes with policies depending on target values [J].

Wu, CB ;

Lin, YL .

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1999, 231 (01) :47-67

[10] NEGATIVE DYNAMIC PROGRAMMING [J].

STRAUCH, RE .

ANNALS OF MATHEMATICAL STATISTICS, 1966, 37 (04) :871-&
