Trading performance for stability in Markov decision processes

被引：14

作者：

Brazdil, Tomas ^{[1
]}

Chatterjee, Krishnendu ^{[2
]}

Forejt, Vojtech ^{[3
]}

Kucera, Antonin ^{[1
]}

机构：

[1] Masaryk Univ, Fac Informat, CS-60177 Brno, Czech Republic

[2] IST Austria, Klosterneuburg, Austria

[3] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England

来源：

JOURNAL OF COMPUTER AND SYSTEM SCIENCES | 2017年 / 84卷

基金：

奥地利科学基金会; 英国工程与自然科学研究理事会;

关键词：

Markov decision processes; Mean payoff; Stability; Stochastic systems; Controller synthesis; MEAN-VARIANCE TRADEOFFS; ALGORITHMS;

D O I：

10.1016/j.jcss.2016.09.009

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statistical variance of the mean payoff is sometimes insufficient, and propose an alternative definition. We show that a strategy ensuring both the expected mean payoff and the variance below given bounds requires randomization and memory, under both the above definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints. (C) 2016 The Authors. Published by Elsevier Inc.

引用

页码：144 / 170

页数：27

共 28 条

[1] Altman E., 1999, STOCH MODEL
[2] [Anonymous], 1998, Cambridge Series in Statistical and Probabilistic Mathematics
[3] [Anonymous], 2008, Log. Methods Comput. Sci., DOI [DOI 10.2168/LMCS-4(4:8)2008, 10.2168/LMCS-4(4:8)2008]
[4] MARKOV DECISION PROCESSES WITH MULTIPLE LONG-RUN AVERAGE OBJECTIVES
Brazdil, Tomas
Brozek, Vaclav
Chatterjee, Krishnendu
Forejt, Vojtech
Kucera, Antonin
[J]. LOGICAL METHODS IN COMPUTER SCIENCE, 2014, 10 (01)
[5] Trading Performance for Stability in Markov Decision Processes
Brazdil, Tomas
Chatterjee, Krishnendu
Forejt, Vojtech
Kucera, Antonin
[J]. 2013 28TH ANNUAL IEEE/ACM SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE (LICS), 2013, : 331 - 340
[6] Canny J., 1988, Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, P460, DOI 10.1145/62212.62257
[7] Chatterjee K, 2006, LECT NOTES COMPUT SC, V3884, P325
[8] Chatterjee K., 2012, O N2 TIME ALGORITHM, P1386
[9] Efficient and Dynamic Algorithms for Alternating Buchi Games and Maximal End-Component Decomposition
Chatterjee, Krishnendu
Henzinger, Monika
[J]. JOURNAL OF THE ACM, 2014, 61 (03)
[10] Symbolic algorithms for qualitative analysis of Markov decision processes with Buchi objectives
Chatterjee, Krishnendu
Henzinger, Monika
Joglekar, Manas
Shah, Nisarg
[J]. FORMAL METHODS IN SYSTEM DESIGN, 2013, 42 (03) : 301 - 327

← 1 2 3 →