Quantile Markov Decision Processes

Times cited: 6

Authors:

Li, Xiaocheng [1]

Zhong, Huaiyang [1]

Brandeau, Margaret L. [1]

Affiliation:

[1] Stanford Univ, Dept Management Sci & Engn, Stanford, CA 94305 USA

Source:

OPERATIONS RESEARCH | 2021 / Vol. 70 / Issue 3

Keywords:

Markov decision process; dynamic programming; quantile; risk measure; medical decision making; VALUE-AT-RISK; TIME; APPROXIMATIONS; REGRESSION; THERAPY

DOI:

10.1287/opre.2021.2123

Chinese Library Classification (CLC) number:

C93 [Management science]

Subject classification codes:

12; 1201; 1202; 120202

Abstract:

The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, in which patients aim to balance the potential benefits and risks of the treatment.
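The contrast drawn in the abstract, optimizing a quantile of cumulative reward rather than its expectation, can be illustrated with a small Python sketch. Everything in it is a hypothetical assumption for illustration: the one-state toy MDP (`TRANSITIONS`), the "safe" and "risky" actions, the horizon of 2, and τ = 0.1 are not from the paper. The sketch only evaluates fixed policies exactly (mean, lower τ-quantile, and lower-tail CVaR of the total reward); it is not the authors' QMDP dynamic programming algorithm, which searches over policies.

```python
from collections import defaultdict

# Hypothetical toy MDP (illustrative assumption, not from the paper):
# TRANSITIONS[state][action] -> list of (probability, next_state, reward).
TRANSITIONS = {
    0: {
        "safe": [(1.0, 0, 1.0)],                       # reward 1 for sure
        "risky": [(0.6, 0, 4.0), (0.4, 0, -2.0)],      # higher mean, fat lower tail
    }
}


def reward_distribution(transitions, policy, start, horizon):
    """Exact distribution of cumulative reward under a fixed policy.

    policy(t, state) -> action. Returns {total_reward: probability}.
    """
    # Joint distribution over (current state, reward accumulated so far).
    dist = {(start, 0.0): 1.0}
    for t in range(horizon):
        nxt = defaultdict(float)
        for (s, acc), p in dist.items():
            for prob, s_next, r in transitions[s][policy(t, s)]:
                nxt[(s_next, acc + r)] += p * prob
        dist = nxt
    # Marginalize out the final state.
    totals = defaultdict(float)
    for (_, acc), p in dist.items():
        totals[acc] += p
    return dict(totals)


def mean(dist):
    """Expected cumulative reward (the classical MDP criterion)."""
    return sum(x * p for x, p in dist.items())


def quantile(dist, tau):
    """Lower tau-quantile: smallest x with P(R <= x) >= tau."""
    cdf = 0.0
    for x in sorted(dist):
        cdf += dist[x]
        if cdf >= tau - 1e-12:  # small tolerance for float accumulation
            return x
    return max(dist)


def cvar(dist, alpha):
    """Lower-tail CVaR for rewards: mean over the worst alpha fraction of outcomes."""
    remaining, total = alpha, 0.0
    for x in sorted(dist):
        w = min(dist[x], remaining)  # probability mass taken from this outcome
        total += w * x
        remaining -= w
        if remaining <= 1e-12:
            break
    return total / alpha


safe = reward_distribution(TRANSITIONS, lambda t, s: "safe", start=0, horizon=2)
risky = reward_distribution(TRANSITIONS, lambda t, s: "risky", start=0, horizon=2)
tau = 0.1
for name, d in [("safe", safe), ("risky", risky)]:
    print(name, "mean:", round(mean(d), 6),
          f"{tau}-quantile:", quantile(d, tau),
          f"CVaR_{tau}:", round(cvar(d, tau), 6))
```

Under these assumptions the risky policy has the higher expected total (3.2 versus 2) but the lower 0.1-quantile (-4 versus 2), so the expectation and quantile criteria rank the two policies in opposite order. Optimizing a quantile over all policies is harder than this fixed-policy evaluation, since the best action may need to depend on more than the current state.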

Pages: 1428-1447

Page count: 21

Number of references: 47
