Constrained discounted dynamic programming

Cited by: 61
Authors
Feinberg, E. A. [1]
Shwartz, A. [1]
Affiliation
[1] Technion - Israel Institute of Technology, Department of Electrical Engineering, IL-32000 Haifa, Israel
Keywords
dynamic programming; discounted Markov decision processes; additional constraints
DOI
10.1287/moor.21.4.922
Chinese Library Classification (CLC)
C93 [Management Science]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, subject to several inequality constraints on similar criteria defined by other reward functions. Suppose a feasible policy exists for a problem with M constraints. We prove two results on the existence and structure of optimal policies. First, we show that there exists a randomized stationary optimal policy which requires at most M actions more than a nonrandomized stationary one; this result was previously known for several particular cases. Second, we prove that there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, (ii) randomized Markov before this step, with at most M actions added by randomization in total, and (iii) such that at most M actions are added by nonstationarity in total. We also establish Pareto optimality of policies from the two classes described above for multi-criteria problems, and we describe an algorithm to compute optimal policies with properties (i)-(iii) for constrained problems. Policies satisfying (i)-(iii) have the pleasing property that the amount of randomization they require over any trajectory is bounded by the number of constraints; in contrast, a randomized stationary policy may require an infinite number of randomizations over time.
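A standard way to compute constrained optimal policies of this kind is a linear program over discounted occupation measures. The sketch below is an illustration of that LP, not the paper's algorithm: it specializes to a small finite state/action MDP with a single constraint (M = 1) and recovers a randomized stationary policy from the optimal occupation measure. All numerical data (transition kernel P, rewards r0 and r1, constraint level c1, discount factor beta, initial distribution alpha) are made-up assumptions, and scipy.optimize.linprog is used as the LP solver.

# Minimal sketch of the occupation-measure LP for a constrained discounted MDP,
# specialized to a finite state/action space (the paper treats countable states
# and compact action sets). Every number below is an illustrative assumption,
# not data from the paper.
import numpy as np
from scipy.optimize import linprog

S, A = 3, 2                       # hypothetical small state/action space
beta = 0.9                        # discount factor
alpha = np.full(S, 1.0 / S)       # initial state distribution

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] = transition probability
r0 = rng.uniform(size=(S, A))                # reward criterion to maximize
r1 = rng.uniform(size=(S, A))                # constrained reward criterion
c1 = 2.0                                     # required level for criterion 1 (M = 1)

n = S * A  # decision variables: occupation measure rho(s, a) >= 0, flattened row-wise

# Balance equations:
#   sum_a rho(s, a) - beta * sum_{s', a'} P(s | s', a') * rho(s', a') = alpha(s)
A_eq = np.zeros((S, n))
for s in range(S):
    for sp in range(S):
        for a in range(A):
            A_eq[s, sp * A + a] -= beta * P[sp, a, s]
    for a in range(A):
        A_eq[s, s * A + a] += 1.0
b_eq = alpha

# Constraint sum_{s,a} rho(s, a) * r1(s, a) >= c1, written as "<=" for linprog.
A_ub = -r1.reshape(1, n)
b_ub = np.array([-c1])

# linprog minimizes, so negate r0; the default bounds already enforce rho >= 0.
res = linprog(-r0.reshape(n), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")

if res.success:
    rho = res.x.reshape(S, A)
    pi = rho / rho.sum(axis=1, keepdims=True)   # randomized stationary policy pi(a | s)
    print("optimal discounted r0-value:", float(r0.reshape(n) @ res.x))
    print("policy:\n", np.round(pi, 3))
else:
    print("infeasible or solver failure:", res.message)

If the solver returns a basic (vertex) solution, it has at most S + M positive variables rho(s, a); since every state carries positive occupation mass under a full-support initial distribution, at most M states receive more than one positive action, which matches the abstract's bound of at most M actions added by randomization over a nonrandomized stationary policy.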
Pages: 922-945
Number of pages: 24