On the Convex Formulations of Robust Markov Decision Processes

被引:0
|
作者
Grand-Clement, Julien [1 ]
Petrik, Marek [2 ]
机构
[1] Ecole Hautes Etud Commerciales HEC Paris, Informat Syst & Operat Management Dept, Jouy En Josas, France
[2] Univ New Hampshire, Dept Comp Sci, Durham, NH 03824 USA
关键词
Markov decision processes; robust optimization; conic optimization; POLICY-ITERATION; PAYOFF GAMES; RISK; COMPLEXITY; PROGRAMS;
D O I
10.1287/moor.2022.0284
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
Robust Markov decision processes (MDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases, RMDPs can be reformulated as conic programs based on exponential cones, quadratic cones, and nonnegative orthants. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.
引用
收藏
页数:27
相关论文
共 50 条
  • [1] Robust Markov Decision Processes
    Wiesemann, Wolfram
    Kuhn, Daniel
    Rustem, Berc
    MATHEMATICS OF OPERATIONS RESEARCH, 2013, 38 (01) : 153 - 183
  • [2] A NOTE ON OPTIMIZATION FORMULATIONS OF MARKOV DECISION PROCESSES
    Ying, Lexing
    Zhu, Yuhua
    COMMUNICATIONS IN MATHEMATICAL SCIENCES, 2022, 20 (03) : 727 - 745
  • [3] Distributionally Robust Markov Decision Processes
    Xu, Huan
    Mannor, Shie
    MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (02) : 288 - 300
  • [4] Online Convex Optimization in Adversarial Markov Decision Processes
    Rosenberg, Aviv
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [5] A CONVEX ANALYTIC APPROACH TO MARKOV DECISION-PROCESSES
    BORKAR, VS
    PROBABILITY THEORY AND RELATED FIELDS, 1988, 78 (04) : 583 - 602
  • [6] Robust Anytime Learning of Markov Decision Processes
    Suilen, Marnix
    Simao, Thiago D.
    Parker, David
    Jansen, Nils
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [7] Distributionally Robust Counterpart in Markov Decision Processes
    Yu, Pengqian
    Xu, Huan
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, 61 (09) : 2538 - 2543
  • [8] Robust Markov Decision Processes: Beyond Rectangularity
    Goyal, Vineet
    Grand-Clement, Julien
    MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (01) : 203 - 226
  • [9] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353
  • [10] LINEAR-PROGRAMMING FORMULATIONS OF MARKOV DECISION-PROCESSES
    NAZARETH, JL
    KULKARNI, RB
    OPERATIONS RESEARCH LETTERS, 1986, 5 (01) : 13 - 16