Policy-based branch-and-bound for infinite-horizon Multi-model Markov decision processes

Cited by: 5

Authors:

Ahluwalia, Vinayak S. [1]

Steimle, Lauren N. [2]

Denton, Brian T. [3]

Affiliations:

[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA

[2] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA

[3] Univ Michigan, Dept Ind & Operat Engn, Ann Arbor, MI 48109 USA

Source:

COMPUTERS & OPERATIONS RESEARCH | 2021 / Volume 126 / Issue 126

Funding:

National Science Foundation (USA);

Keywords:

Markov decision processes; Parameter uncertainty; Branch-and-bound;

DOI:

10.1016/j.cor.2020.105108

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Markov decision processes (MDPs) are models for sequential decision-making that inform decisions in many fields, including healthcare, manufacturing, and others. However, the optimal policy for an MDP may be sensitive to the reward and transition parameters, which are often uncertain because they are typically estimated from data or rely on expert opinion. To address parameter uncertainty in MDPs, it has been proposed that multiple models of the parameters be incorporated into the solution process, but solving these problems can be computationally challenging. In this article, we propose a policy-based branch-and-bound approach that leverages the structure of these problems, and we numerically compare several important algorithmic designs. We demonstrate that our approach outperforms existing methods on test cases from the literature, including randomly generated MDPs, a machine maintenance MDP, and an MDP for medical decision-making. (C) 2020 Elsevier Ltd. All rights reserved.
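To make the idea of incorporating multiple models concrete, the following is a minimal sketch of a brute-force baseline, not the branch-and-bound method the article proposes. It assumes the objective is a weighted sum of a single stationary deterministic policy's discounted values across the models; that objective, the function names, and the toy numbers are all illustrative assumptions and are not taken from the abstract. Enumerating every policy is exponential in the number of states, which illustrates why a search method that avoids full enumeration is of interest.

```python
from itertools import product


def evaluate_policy(P, R, policy, gamma, iters=2000):
    """Iterative policy evaluation for one model (hypothetical helper).

    P[s][a][t] is the probability of moving from state s to t under action a,
    R[s][a] is the immediate reward, policy[s] is the action taken in state s.
    """
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v


def weighted_value(models, weights, policy, gamma, init):
    """Weighted sum over models of the policy's expected discounted value."""
    total = 0.0
    for (P, R), w in zip(models, weights):
        v = evaluate_policy(P, R, policy, gamma)
        total += w * sum(mu * vs for mu, vs in zip(init, v))
    return total


def best_policy_by_enumeration(models, weights, gamma, init):
    """Try every deterministic stationary policy; feasible only for tiny cases."""
    n_states = len(models[0][0])
    n_actions = len(models[0][0][0])
    best = None
    for policy in product(range(n_actions), repeat=n_states):
        val = weighted_value(models, weights, policy, gamma, init)
        if best is None or val > best[1]:
            best = (policy, val)
    return best


# Toy data (made up for illustration): 2 states, 2 actions, 2 candidate models
# that disagree about the transition probabilities and rewards.
P1 = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
R1 = [[1.0, 0.0], [0.0, 2.0]]
P2 = [[[0.6, 0.4], [0.7, 0.3]], [[0.3, 0.7], [0.8, 0.2]]]
R2 = [[1.0, 0.5], [0.0, 1.0]]
MODELS = [(P1, R1), (P2, R2)]
WEIGHTS = [0.5, 0.5]   # assumed model weights
GAMMA = 0.9            # assumed discount factor
INIT = [1.0, 0.0]      # assumed initial state distribution

best_policy, best_value = best_policy_by_enumeration(MODELS, WEIGHTS, GAMMA, INIT)
print(best_policy, round(best_value, 4))
```

The single policy returned here must perform well on average across both models, which is different from solving each model separately and can yield a different policy than either model alone.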

Pages: 13
