Computation of weighted sums of rewards for concurrent MDPs

Cited by: 20

Authors:

Buchholz, Peter [1]

Scheftelowitsch, Dimitri [1]

Affiliations:

[1] TU Dortmund, Informat 4, Dortmund, Germany

Source:

MATHEMATICAL METHODS OF OPERATIONS RESEARCH | 2019, Vol. 89, No. 1

Keywords:

Markov decision processes; Optimization; Multi-objective optimization; Non-linear programming; MARKOV DECISION-PROCESSES; OPTIMALITY; SCENARIOS; ALGORITHM;

DOI:

10.1007/s00186-018-0653-1

Chinese Library Classification (CLC):

C93 [Management Science]; O22 [Operations Research];

Subject classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

We consider sets of Markov decision processes (MDPs) with shared state and action spaces and assume that the individual MDPs in such a set represent different scenarios for a system's operation. In this setting, we solve the problem of finding a single policy that performs well under each of these scenarios by considering the weighted sum of value vectors for each of the scenarios. Several solution approaches as well as the general complexity of the problem are discussed and algorithms that are based on these solution approaches are presented. Finally, we compare the derived algorithms on a set of benchmark problems.
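The objective described in the abstract, a weighted sum of per-scenario value vectors for one shared policy, can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a discounted reward criterion with factor `gamma`, restricts the search to deterministic stationary policies, and scores a policy by summing the weighted value over all states; the names `evaluate_policy`, `weighted_value`, and `best_pure_policy` are hypothetical. The brute-force search is only feasible for very small models.

```python
from itertools import product


def evaluate_policy(P, r, policy, gamma=0.9, tol=1e-10):
    """Discounted value vector of a deterministic stationary policy in one MDP.

    P[a][s][t] is the probability of moving s -> t under action a,
    r[a][s] is the immediate reward, policy[s] is the action chosen in s.
    (Discounting is an assumption made for this illustration.)
    """
    n = len(policy)
    v = [0.0] * n
    while True:
        # One sweep of iterative policy evaluation.
        new_v = [
            r[policy[s]][s]
            + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def weighted_value(mdps, weights, policy, gamma=0.9):
    """Weighted sum of the per-scenario value vectors for one shared policy."""
    total = [0.0] * len(policy)
    for (P, r), w in zip(mdps, weights):
        v = evaluate_policy(P, r, policy, gamma)
        total = [x + w * y for x, y in zip(total, v)]
    return total


def best_pure_policy(mdps, weights, n_states, n_actions, gamma=0.9):
    """Exhaustive search over deterministic stationary policies (tiny models only)."""
    return max(
        product(range(n_actions), repeat=n_states),
        key=lambda pi: sum(weighted_value(mdps, weights, pi, gamma)),
    )


# Two scenarios over one state and two actions; they disagree on which action pays.
P = [[[1.0]], [[1.0]]]                 # both actions keep the system in state 0
scenario_a = (P, [[1.0], [0.0]])       # action 0 rewarded
scenario_b = (P, [[0.0], [3.0]])       # action 1 rewarded
policy = best_pure_policy([scenario_a, scenario_b], [0.5, 0.5], 1, 2)
print(policy, weighted_value([scenario_a, scenario_b], [0.5, 0.5], policy))
```

Changing the weights shifts which scenario dominates the choice of policy, which is the trade-off the weighted-sum formulation is meant to expose.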

Pages: 1-42

Page count: 42
