Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes

Cited by: 3
Authors
Adelman, Daniel [1 ]
Mancini, Angelo J. [1 ]
Affiliations
[1] Univ Chicago, Booth Sch Business, Chicago, IL 60637 USA
Keywords
semi-Markov decision processes; open-loop policies; optimal policies; semi-regenerative decision processes; optimization; existence; algorithm
DOI
10.1287/moor.2015.0775
Chinese Library Classification
C93 [Management]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
Quasi-open-loop policies consist of sequences of Markovian decision rules that are insensitive to one component of the state space. Given a semi-Markov decision process (SMDP), we distinguish between exogenous and endogenous state components as follows: (i) the decision-maker's actions do not impact the evolution of an exogenous state component, and (ii) between consecutive decision epochs, the exogenous and endogenous state components are conditionally independent given the decision-maker's latest action. For simplicity, we consider an SMDP with one exogenous and one endogenous state component. When transition times between epochs are conditionally independent of the exogenous state given the most recent action, and the exogenous component is a multiplicative compound Poisson process, we provide an almost-everywhere condition on the reward function sufficient for the optimality of a quasi-open-loop policy. After adjusting the discount factor to account for the statistical properties of the exogenous state process, obtaining this policy amounts to solving a reduced SMDP in which the exogenous state is static. Depending on the relationship between the structure of the exogenous state process and the shape of the reward function, we can replace the almost-everywhere condition with one that applies only in expectation. Quasi-open-loop optimality holds even if the times between decision epochs depend on the Poisson process underlying the exogenous state component, and/or the Poisson process is replaced with a generic counting process.
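The reduction the abstract describes — fold the statistics of the exogenous process into the discount factor, then solve a smaller problem over the endogenous state alone — can be sketched as follows. This is a rough, hypothetical illustration under a discrete-time simplification: the state space, rewards, the jump statistics, and the specific adjustment `alpha * mean_jump` are assumptions for illustration only, not the paper's construction.

```python
import numpy as np

# Hypothetical sketch: value iteration on a "reduced" decision process in
# which the exogenous state is held static. A multiplicative compound
# Poisson exogenous component is summarized (as an assumption, not the
# paper's derivation) by the expected multiplicative jump size, which is
# folded into the discount factor.

n_states, n_actions = 4, 3            # endogenous states / actions (illustrative)
rng = np.random.default_rng(0)
# P[s, a] is a transition distribution over next endogenous states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.random((n_states, n_actions))  # rewards on the endogenous component

alpha = 0.95        # base discount factor
mean_jump = 0.98    # assumed mean multiplicative jump of the exogenous process
alpha_adj = alpha * mean_jump  # adjusted discount (illustrative adjustment)

# Standard value iteration on the reduced problem.
V = np.zeros(n_states)
for _ in range(500):
    Q = r + alpha_adj * (P @ V)       # P @ V has shape (n_states, n_actions)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# The resulting policy depends only on the endogenous state -- it is
# quasi-open-loop with respect to the (ignored) exogenous component.
policy = Q.argmax(axis=1)
```

Because `alpha_adj < 1`, the Bellman update is a contraction and the iteration converges; the point of the sketch is only that, once the discount factor is adjusted, the exogenous state drops out of the computation entirely.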
Pages: 1222-1247
Page count: 26