Computation of weighted sums of rewards for concurrent MDPs

Cited by: 20
Authors
Buchholz, Peter [1]
Scheftelowitsch, Dimitri [1]
Affiliations
[1] TU Dortmund, Informat 4, Dortmund, Germany
Keywords
Markov decision processes; Optimization; Multi-objective optimization; Non-linear programming; MARKOV DECISION-PROCESSES; OPTIMALITY; SCENARIOS; ALGORITHM
DOI
10.1007/s00186-018-0653-1
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
We consider sets of Markov decision processes (MDPs) with shared state and action spaces and assume that the individual MDPs in such a set represent different scenarios for a system's operation. In this setting, we solve the problem of finding a single policy that performs well under each of these scenarios by considering the weighted sum of value vectors for each of the scenarios. Several solution approaches as well as the general complexity of the problem are discussed and algorithms that are based on these solution approaches are presented. Finally, we compare the derived algorithms on a set of benchmark problems.
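The objective described in the abstract, a weighted sum of per-scenario value vectors under one shared policy, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the discounted infinite-horizon criterion, the discount factor `gamma`, the data layout `P[a][s][t]` / `R[a][s]`, the function names, and the brute-force search over deterministic stationary policies (which grows exponentially with the number of states and is not one of the paper's algorithms).

```python
from itertools import product


def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-10, max_iter=100_000):
    """Value vector of a deterministic stationary policy in one MDP.

    Assumed layout (hypothetical): P[a][s][t] is the probability of moving
    from state s to state t under action a, and R[a][s] is the reward for
    taking action a in state s. Uses fixed-point iteration on
    v = r_pi + gamma * P_pi v.
    """
    n = len(policy)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            R[policy[s]][s]
            + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def weighted_value(scenarios, weights, policy, gamma=0.9):
    """Weighted sum of the value vectors of `policy` over all scenarios."""
    total = [0.0] * len(policy)
    for (P, R), w in zip(scenarios, weights):
        v = evaluate_policy(P, R, policy, gamma)
        for s, v_s in enumerate(v):
            total[s] += w * v_s
    return total


def best_policy_by_enumeration(scenarios, weights, n_states, n_actions,
                               initial, gamma=0.9):
    """Brute-force search over deterministic stationary policies.

    `initial` is an assumed initial state distribution used to turn the
    weighted value vector into a single score. Illustration only.
    """
    best, best_score = None, float("-inf")
    for policy in product(range(n_actions), repeat=n_states):
        wv = weighted_value(scenarios, weights, policy, gamma)
        score = sum(p * x for p, x in zip(initial, wv))
        if score > best_score:
            best, best_score = policy, score
    return best, best_score


# Toy instance: one state, two actions, two scenarios that disagree
# about which action is rewarding.
P = [[[1.0]], [[1.0]]]
scenario_a = (P, [[1.0], [0.0]])  # action 0 pays 1 per step
scenario_b = (P, [[0.0], [3.0]])  # action 1 pays 3 per step
print(best_policy_by_enumeration([scenario_a, scenario_b], [0.5, 0.5],
                                 n_states=1, n_actions=2,
                                 initial=[1.0], gamma=0.5))
```

In this toy instance no single action is best in both scenarios, so the chosen policy depends on the weights: with equal weights the search selects action 1, while weights of 0.9 and 0.1 favor action 0.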
Pages: 1-42
Page count: 42