EXTREME OCCUPATION MEASURES IN MARKOV DECISION PROCESSES WITH AN ABSORBING STATE

Times cited: 1
Authors
Piunovskiy, Alexey [1]
Zhang, Yi [2]
Affiliations
[1] Univ Liverpool, Dept Math Sci, Liverpool L69 7ZL, England
[2] Univ Birmingham, Sch Math, Birmingham B15 2TT, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Markov decision process; total cost; occupation measure; mathematical programming; extreme point; mixture; GRADUAL-IMPULSIVE CONTROL; TOTAL-COST CRITERION; PROGRAMMING APPROACH; POLICIES; CONSTRAINTS; OPTIMALITY
DOI
10.1137/23M1572398
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we consider a Markov decision process (MDP) with a Borel state space $X \cup \{\Delta\}$, where $\Delta$ is an absorbing state (cemetery), and a Borel action space $A$. We consider the space of finite occupation measures restricted to $X \times A$ and its extreme points. Some strategies may have infinite occupation measures. Nevertheless, we prove that every finite extreme occupation measure is generated by a deterministic stationary strategy. Then, for this MDP, we consider a constrained problem with total undiscounted criteria and $J$ constraints, where the cost functions are nonnegative. By assumption, strategies inducing infinite occupation measures are not optimal. Our second main result is that, under mild conditions, the solution to this constrained MDP is given by a mixture of no more than $J+1$ occupation measures generated by deterministic stationary strategies.
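To make the two results concrete, here is a minimal sketch on a hypothetical finite toy model (two states, two actions each, one constraint so $J = 1$); the model, the costs, and all function names are illustrative assumptions, not taken from the paper, which works with Borel spaces and does not proceed by enumeration. The sketch computes the occupation measure of each deterministic stationary strategy by solving $\eta(y) = \nu(y) + \sum_x \eta(x)\, p(y \mid x, \varphi(x))$ on the reachable states, reports it as infinite when that system is singular (a reachable state is never absorbed), and then brute-forces the best mixture of at most $J+1 = 2$ such occupation measures. "Mixture" is taken here to mean a convex combination of occupation measures, under which total costs combine linearly.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy model: states 0 and 1; the cemetery Delta is implicit
# (any probability mass missing from a transition row goes to Delta).
STATES = [0, 1]
ACTIONS = {0: ["a", "b"], 1: ["a", "b"]}
P = {  # P[(x, a)] maps next state y in X to p(y | x, a)
    (0, "a"): {},          # absorb in Delta immediately
    (0, "b"): {1: F(1)},   # move to state 1
    (1, "a"): {},          # absorb in Delta
    (1, "b"): {1: F(1)},   # stay in state 1 forever (never absorbed)
}
# Nonnegative costs: c_0 is the objective, c_1 the single constraint (J = 1).
C0 = {(0, "a"): F(1), (0, "b"): F(1), (1, "a"): F(3), (1, "b"): F(1)}
C1 = {(0, "a"): F(2), (0, "b"): F(0), (1, "a"): F(0), (1, "b"): F(0)}
NU = {0: F(1)}  # initial distribution: start in state 0
D1 = F(1)       # constraint: total c_1 cost at most 1


def solve(matrix, rhs):
    """Gauss-Jordan elimination over Fractions; returns None if singular."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def occupation_measure(phi):
    """Occupation measure eta(x, a) on X x A of the deterministic stationary
    strategy phi, or None if it is infinite."""
    reach, stack = set(NU), list(NU)  # states reachable from NU under phi
    while stack:
        x = stack.pop()
        for y in P[(x, phi[x])]:
            if y not in reach:
                reach.add(y)
                stack.append(y)
    xs = sorted(reach)
    # Row y: eta(y) - sum_x p(y | x, phi(x)) eta(x) = nu(y).
    matrix = [[(F(1) if x == y else F(0)) - P[(x, phi[x])].get(y, F(0))
               for x in xs] for y in xs]
    eta_x = solve(matrix, [NU.get(y, F(0)) for y in xs])
    if eta_x is None:
        return None  # some reachable state is never absorbed
    return {(x, phi[x]): v for x, v in zip(xs, eta_x)}


def cost(c, eta):
    return sum((c[k] * v for k, v in eta.items()), F(0))


# Enumerate deterministic stationary strategies phi: X -> A with finite eta.
points = []  # (phi, total c_0 cost, total c_1 cost)
for acts in product(*(ACTIONS[x] for x in STATES)):
    phi = dict(zip(STATES, acts))
    eta = occupation_measure(phi)
    if eta is not None:
        points.append((phi, cost(C0, eta), cost(C1, eta)))

# Best mixture w * eta_1 + (1 - w) * eta_2 of at most J + 1 = 2 of them.
# Feasible w form an interval and the objective is linear in w, so checking
# the interval's possible endpoints (0, 1, constraint crossing) suffices.
best = None  # (objective value, [(weight, phi), (weight, phi)])
for (p1, f1, g1), (p2, f2, g2) in product(points, repeat=2):
    candidates = [F(0), F(1)]
    if g1 != g2:
        candidates.append((D1 - g2) / (g1 - g2))
    for w in candidates:
        if 0 <= w <= 1 and w * g1 + (1 - w) * g2 <= D1:
            value = w * f1 + (1 - w) * f2
            if best is None or value < best[0]:
                best = (value, [(w, p1), (1 - w, p2)])

value, mix = best
print("optimal total c_0 cost:", value)
for w, phi in mix:
    print(f"  weight {w} on phi = {phi}")
```

In this toy instance the strategy that always picks `"b"` never reaches the cemetery, so its occupation measure is infinite and it is discarded, in line with the assumption that such strategies are not optimal. No single deterministic stationary strategy is both feasible and cheapest: the best pure feasible cost is 4, while mixing two of them with weights 1/2 each attains 5/2, which illustrates (but does not prove) the "at most $J+1$" statement.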
Pages: 65-90
Number of pages: 26