The expected total cost criterion for Markov decision processes under constraints

Cited by: 0
Authors
Affiliations
[1] Team CQFD, INRIA Bordeaux Sud-Ouest, 33405 Talence cedex
[2] Université Bordeaux, IMB, INRIA Bordeaux Sud-Ouest
[3] Department of Mathematical Sciences, University of Liverpool, Liverpool
Source
Dufour, F. (dufour@math.u-bordeaux1.fr) | Applied Probability Trust, vol. 45, 2013
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Constraints; Expected total cost criterion; Linear programming; Markov decision process; Occupation measure;
DOI
10.1239/aap/1377868541
Abstract
In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. We analyze this problem using the linear programming approach. Under some technical hypotheses, we show that if the associated linear program has an optimal solution, then there exists a randomized stationary policy that is optimal for the MDP, and the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies is a sufficient set for solving this MDP. In contrast with the classical results in the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results. © Applied Probability Trust 2013.
Pages: 837-859
Page count: 22
Related papers
15 items in total
  • [1] Altman E., Constrained Markov Decision Processes, (1999)
  • [2] Bauerle N., Rieder U., Markov Decision Processes with Applications to Finance, (2011)
  • [3] Bertsekas D.P., Shreve S.E., Stochastic Optimal Control, Math. Sci. Eng, 139, (1978)
  • [4] Borkar V.S., Topics in Controlled Markov Chains, Pitman Res. Notes Math. Ser, 240, (1991)
  • [5] Borkar V.S., Handbook of Markov Decision Processes, Internat. Ser. Operat. Res. Manag. Sci, 40, pp. 347-375, (2002)
  • [6] Dufour F., Piunovskiy A.B., Multiobjective stopping problem for discrete-time Markov processes: Convex analytic approach, J. Appl. Prob, 47, pp. 947-966, (2010)
  • [7] Dufour F., Horiguchi M., Piunovskiy A.B., The expected total cost criterion for Markov decision processes under constraints: A convex analytic approach, Adv. Appl. Prob, 44, pp. 774-793, (2012)
  • [8] Filar J., Vrieze K., Competitive Markov Decision Processes, (1997)
  • [9] Hernandez-Lerma O., Lasserre J.B., Discrete-Time Markov Control Processes, Appl. Math, 30, (1996)
  • [10] Hernandez-Lerma O., Lasserre J.B., Further Topics on Discrete-Time Markov Control Processes, Appl. Math, 42, (1999)