Trading performance for stability in Markov decision processes

被引:14
作者
Brazdil, Tomas [1 ]
Chatterjee, Krishnendu [2 ]
Forejt, Vojtech [3 ]
Kucera, Antonin [1 ]
机构
[1] Masaryk Univ, Fac Informat, CS-60177 Brno, Czech Republic
[2] IST Austria, Klosterneuburg, Austria
[3] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England
基金
奥地利科学基金会; 英国工程与自然科学研究理事会;
关键词
Markov decision processes; Mean payoff; Stability; Stochastic systems; Controller synthesis; MEAN-VARIANCE TRADEOFFS; ALGORITHMS;
D O I
10.1016/j.jcss.2016.09.009
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statistical variance of the mean payoff is sometimes insufficient, and propose an alternative definition. We show that a strategy ensuring both the expected mean payoff and the variance below given bounds requires randomization and memory, under both the above definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints. (C) 2016 The Authors. Published by Elsevier Inc.
引用
收藏
页码:144 / 170
页数:27
相关论文
共 28 条
  • [1] Altman E., 1999, STOCH MODEL
  • [2] [Anonymous], 1998, Cambridge Series in Statistical and Probabilistic Mathematics
  • [3] [Anonymous], 2008, Log. Methods Comput. Sci., DOI [DOI 10.2168/LMCS-4(4:8)2008, 10.2168/LMCS-4(4:8)2008]
  • [4] MARKOV DECISION PROCESSES WITH MULTIPLE LONG-RUN AVERAGE OBJECTIVES
    Brazdil, Tomas
    Brozek, Vaclav
    Chatterjee, Krishnendu
    Forejt, Vojtech
    Kucera, Antonin
    [J]. LOGICAL METHODS IN COMPUTER SCIENCE, 2014, 10 (01)
  • [5] Trading Performance for Stability in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Forejt, Vojtech
    Kucera, Antonin
    [J]. 2013 28TH ANNUAL IEEE/ACM SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE (LICS), 2013, : 331 - 340
  • [6] Canny J., 1988, Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, P460, DOI 10.1145/62212.62257
  • [7] Chatterjee K, 2006, LECT NOTES COMPUT SC, V3884, P325
  • [8] Chatterjee K., 2012, O N2 TIME ALGORITHM, P1386
  • [9] Efficient and Dynamic Algorithms for Alternating Buchi Games and Maximal End-Component Decomposition
    Chatterjee, Krishnendu
    Henzinger, Monika
    [J]. JOURNAL OF THE ACM, 2014, 61 (03)
  • [10] Symbolic algorithms for qualitative analysis of Markov decision processes with Buchi objectives
    Chatterjee, Krishnendu
    Henzinger, Monika
    Joglekar, Manas
    Shah, Nisarg
    [J]. FORMAL METHODS IN SYSTEM DESIGN, 2013, 42 (03) : 301 - 327