SUFFICIENCY OF DETERMINISTIC POLICIES FOR ATOMLESS DISCOUNTED AND UNIFORMLY ABSORBING MDPs WITH MULTIPLE CRITERIA

Cited by: 11
Authors
Feinberg, Eugene A. [1 ]
Piunovskiy, Alexey [2 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Liverpool, Dept Math Sci, Liverpool L69 7ZL, Merseyside, England
Funding
National Science Foundation (USA);
Keywords
atomless; discounted; Markov decision process; deterministic policy; convex; compact; MARKOV DECISION-PROCESSES; WALD-WOLFOWITZ THEOREM; COMPACTNESS; SPACE;
DOI
10.1137/18M1194924
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper studies Markov decision processes (MDPs) with atomless initial state distributions and atomless transition probabilities. Such MDPs are called atomless. The initial state distribution is considered to be fixed. We show that for discounted MDPs with bounded one-step reward vector-functions, for each policy there exists a deterministic (that is, nonrandomized and stationary) policy with the same performance vector. This fact is proved in the paper for a more general class of uniformly absorbing MDPs with expected total rewards, and then it is extended under certain assumptions to MDPs with unbounded rewards. For problems with multiple criteria and constraints, the results imply that for the atomless MDPs studied in this paper it is sufficient to consider only deterministic policies, whereas without the atomless assumption it is well known that randomized policies can outperform deterministic ones. We also provide an example of an MDP demonstrating that if a vector measure is defined on a standard Borel space, then Lyapunov's convexity theorem is a special case of the described results.
Pages: 163-191
Page count: 29