Quantile Markov Decision Processes

Cited by: 6
Authors
Li, Xiaocheng [1]
Zhong, Huaiyang [1]
Brandeau, Margaret L. [1]
Affiliations
[1] Stanford Univ, Dept Management Sci & Engn, Stanford, CA 94305 USA
Keywords
Markov decision process; dynamic programming; quantile; risk measure; medical decision making; VALUE-AT-RISK; TIME; APPROXIMATIONS; REGRESSION; THERAPY
DOI
10.1287/opre.2021.2123
Chinese Library Classification
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, in which patients aim to balance the potential benefits and risks of the treatment.
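
As an illustration of why a quantile objective can select a different decision than an expectation objective, here is a minimal sketch. It is not the paper's dynamic programming algorithm: it assumes a hypothetical two-action problem in which the decision maker commits to one action for the whole horizon, and the reward distributions, the names `ACTIONS`, `quantile`, `cumulative_reward_dist`, and `best_action`, and the chosen quantile levels are all invented for the example. The quantile used is the lower quantile, inf{x : P(X <= x) >= tau}.

```python
from collections import defaultdict
from itertools import product


def quantile(dist, tau):
    """Lower tau-quantile, inf{x : P(X <= x) >= tau}, of a discrete
    distribution given as {value: probability}."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum >= tau - 1e-12:  # small tolerance for floating-point sums
            return x
    return max(dist)


def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())


def cumulative_reward_dist(step_dist, horizon):
    """Exact distribution of the sum of `horizon` i.i.d. per-step rewards."""
    total = {0: 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, ps), (r, pr) in product(total.items(), step_dist.items()):
            nxt[s + r] += ps * pr
        total = dict(nxt)
    return total


# Hypothetical per-step reward distributions (assumed for illustration only).
ACTIONS = {
    "safe": {1: 1.0},            # always pays 1
    "risky": {0: 0.7, 10: 0.3},  # pays 10 with probability 0.3, else 0
}


def best_action(criterion, horizon=2):
    """Action whose cumulative-reward distribution maximizes `criterion`."""
    return max(
        ACTIONS,
        key=lambda a: criterion(cumulative_reward_dist(ACTIONS[a], horizon)),
    )


by_mean = best_action(mean)
by_q40 = best_action(lambda d: quantile(d, 0.4))
by_q90 = best_action(lambda d: quantile(d, 0.9))
print(by_mean, by_q40, by_q90)
```

In this toy setup the expectation criterion and the 0.9-quantile criterion both favor the risky action, while the 0.4-quantile criterion favors the safe one, because the risky action yields a cumulative reward of 0 with probability 0.49 over two steps.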
Pages: 1428-1447
Number of pages: 21