Policy-based branch-and-bound for infinite-horizon Multi-model Markov decision processes

Cited by: 5

Authors:

Ahluwalia, Vinayak S. [1]

Steimle, Lauren N. [2]

Denton, Brian T. [3]

Affiliations:

[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA

[2] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA

[3] Univ Michigan, Dept Ind & Operat Engn, Ann Arbor, MI 48109 USA

Source:

COMPUTERS & OPERATIONS RESEARCH | 2021 / Vol. 126 / No. 126

Funding:

U.S. National Science Foundation;

Keywords:

Markov decision processes; Parameter uncertainty; Branch-and-bound;

DOI:

10.1016/j.cor.2020.105108

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Markov decision processes (MDPs) are models for sequential decision-making that inform decisions in many fields, including healthcare and manufacturing. However, the optimal policy for an MDP may be sensitive to the reward and transition parameters, which are often uncertain because they are typically estimated from data or rely on expert opinion. To address parameter uncertainty in MDPs, it has been proposed that multiple models of the parameters be incorporated into the solution process, but solving these problems can be computationally challenging. In this article, we propose a policy-based branch-and-bound approach that leverages the structure of these problems, and we numerically compare several important algorithmic designs. We demonstrate that our approach outperforms existing methods on test cases from the literature, including randomly generated MDPs, a machine maintenance MDP, and an MDP for medical decision-making. (C) 2020 Elsevier Ltd. All rights reserved.


Pages: 13

