Constrained Markov decision processes with first passage criteria

Cited by: 10

Authors:

Huang, Yonghui [1]

Wei, Qingda [1]

Guo, Xianping [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Math & Computat Sci, Guangzhou 510275, Guangdong, Peoples R China

Source:

ANNALS OF OPERATIONS RESEARCH | 2013 / Vol. 206 / Issue 01

Keywords:

Markov decision processes; Target set; First passage time; Expected first passage reward/cost; Constrained optimal policy; BOREL SPACES; TIME; MODELS; RATES;

DOI:

10.1007/s10479-012-1292-1

Chinese Library Classification (CLC):

C93 [Management science]; O22 [Operations research]

Discipline classification codes:

070105; 12; 1201; 1202; 120202

Abstract:

This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained during a first passage time to some target set, and a constraint is imposed on the associated expected cost over this first passage time. The state space is denumerable, and the rewards/costs are possibly unbounded. In addition, the discount factor is state-action dependent and is allowed to be equal to one. We develop suitable conditions for the existence of a constrained optimal policy, which are generalizations of those for constrained MDPs with the standard discount criteria. Moreover, it is revealed that the constrained optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our results, which exhibits some advantage of our optimality conditions.

Citation

Pages: 197-219

Page count: 23

