DUAL ASCENT AND PRIMAL-DUAL ALGORITHMS FOR INFINITE-HORIZON NONSTATIONARY MARKOV DECISION PROCESSES

Cited by: 0
Authors
Ghate, Archis [1]
Affiliations
[1] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA
Keywords
dynamic programming; Bellman's equations; value convergence; time-varying systems; forecast horizons; simplex method; convex production; policy iteration; approximations; optimality; cost
DOI
10.1137/22M149185X
CLC classification
O29 [Applied Mathematics]
Subject classification
070104
Abstract
Infinite-horizon nonstationary Markov decision processes (MDPs) extend their stationary counterparts by allowing temporal variation in immediate costs and transition probabilities. Bellman's characterization of optimality, and the equivalent primal-dual linear programming formulations, for these MDPs involve countably infinitely many variables and equations. Simple policy iteration, which can also be viewed as a primal simplex algorithm, is the state of the art for solving these MDPs. It produces a sequence of policies whose costs-to-go converge monotonically from above to the optimal values. It suffers from two limitations: each cost-improving policy update is computationally expensive, and no optimality gap is available. We propose two dual-based approaches to address these concerns. The first, called dual ascent, maintains approximate costs-to-go (dual variables) and the corresponding nonnegative errors in Bellman's equations. The dual variables are increased iteratively so that the errors vanish asymptotically; this guarantees that the dual variables converge monotonically from below to the optimal values. Dual ascent has two limitations of its own: it does not maintain a sequence of policies (primal variables), and hence provides neither a decision-making strategy at termination nor an upper bound on the optimal costs-to-go. The second approach, termed the primal-dual method, addresses these limitations. It maintains a primal policy, dual approximations of its costs-to-go, and the corresponding nonnegative Bellman's errors, and it inherits monotonic dual value convergence. The key is a so-called rebalancing step, which yields a stopping criterion based on the duality gap and also delivers primal value convergence. Computational experiments demonstrate the benefits of primal-dual over dual ascent and show that primal-dual is orders of magnitude faster than simple policy iteration.
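The countably infinite system of Bellman's equations referenced in the abstract takes the following standard form for discounted nonstationary MDPs; the notation (finite state space S, action sets A(s), period-t costs c_t(s, a), transition probabilities p_t(s' | s, a), and discount factor alpha in (0, 1)) is an assumption here, since the abstract does not fix it.

```latex
% One equation per period t and state s: countably infinitely many
% equations in countably infinitely many unknowns v_t^*(s).
v_t^*(s) \;=\; \min_{a \in A(s)} \Big\{ c_t(s,a)
  \;+\; \alpha \sum_{s' \in S} p_t(s' \mid s, a)\, v_{t+1}^*(s') \Big\},
\qquad t = 1, 2, \ldots, \quad s \in S.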
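As a concrete illustration of the dual-ascent idea (dual variables increased monotonically while nonnegative Bellman errors vanish), the following is a minimal Python sketch on a finite-horizon truncation with nonnegative costs. The function name dual_ascent_sketch, the array layout, and the truncation itself are illustrative assumptions; the paper's algorithm operates on the infinite-horizon system directly and differs in its details.

```python
import numpy as np

def dual_ascent_sketch(costs, probs, alpha, sweeps=1000, tol=1e-8):
    """Illustrative dual ascent on a finite truncation of a nonstationary MDP.

    costs[t]: (S, A) array of period-t immediate costs (assumed nonnegative).
    probs[t]: (S, A, S) array of period-t transition probabilities.
    alpha:    discount factor in (0, 1).

    Starting from v = 0, each Bellman backup can only increase v, so the
    dual variables v[t][s] converge monotonically from below to the optimal
    costs-to-go of the truncated problem, and the nonnegative Bellman
    errors vanish in the limit.
    """
    T, S = len(costs), costs[0].shape[0]
    v = np.zeros((T + 1, S))  # v[T] = 0: values beyond the truncation horizon

    for _ in range(sweeps):
        max_err = 0.0
        for t in range(T):
            # Q-values: c_t(s, a) + alpha * sum_{s'} p_t(s' | s, a) * v_{t+1}(s')
            q = costs[t] + alpha * probs[t] @ v[t + 1]
            backup = q.min(axis=1)
            max_err = max(max_err, float((backup - v[t]).max()))  # error >= 0
            v[t] = backup  # monotone increase of the dual variables
        if max_err < tol:  # all Bellman errors have (approximately) vanished
            break
    return v[:T]
```

For instance, with costs = [np.random.rand(3, 2) for _ in range(50)] and probs = [np.random.dirichlet(np.ones(3), size=(3, 2)) for _ in range(50)], the call dual_ascent_sketch(costs, probs, alpha=0.9) returns lower-bound approximations of the costs-to-go for the first 50 periods.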
Pages: 1391-1415
Page count: 25
Related Papers
50 in total
  • [1] Primal-Dual Algorithms for Discounted Markov Decision Processes
    Cogill, Randy
    2015 EUROPEAN CONTROL CONFERENCE (ECC), 2015: 260-265
  • [2] A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes
    Ghate, Archis
    Smith, Robert L.
    OPERATIONS RESEARCH, 2013, 61 (02): 413-425
  • [3] An Online Primal-Dual Method for Discounted Markov Decision Processes
    Wang, Mengdi
    Chen, Yichen
    2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016: 4516-4521
  • [4] ACCELERATING PRIMAL-DUAL METHODS FOR REGULARIZED MARKOV DECISION PROCESSES
    Li, Haoya
    Yu, Hsiang-Fu
    Ying, Lexing
    Dhillon, Inderjit S.
    SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (01): 764-789
  • [5] ON PRIMAL-DUAL ALGORITHMS
    Bell, E. J.
    Jennings, C.
    COMMUNICATIONS OF THE ACM, 1966, 9 (09): 653
  • [6] Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
    Ding, Dongsheng
    Zhang, Kaiqing
    Basar, Tamer
    Jovanovic, Mihailo R.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020
  • [7] Stochastic Primal-Dual Method for Learning Mixture Policies in Markov Decision Processes
    Khuzani, Masoud Badiei
    Vasudevan, Varun
    Ren, Hongyi
    Xing, Lei
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 1293-1300
  • [8] Extreme point characterization of constrained nonstationary infinite-horizon Markov decision processes with finite state space
    Lee, Ilbin
    Epelman, Marina A.
    Romeijn, H. Edwin
    Smith, Robert L.
    OPERATIONS RESEARCH LETTERS, 2014, 42 (03): 238-245
  • [9] Convergence and optimality of policy gradient primal-dual method for constrained Markov decision processes
    Ding, Dongsheng
    Zhang, Kaiqing
    Basar, Tamer
    Jovanovic, Mihailo R.
    2022 AMERICAN CONTROL CONFERENCE (ACC), 2022: 2851-2856
  • [10] Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes
    Ying, Donghao
    Guo, Mengzi Amy
    Ding, Yuhao
    Lavaei, Javad
    Shen, Zuo-Jun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37, NO 9, 2023: 10963-10971