A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

Cited by: 0
Authors
Zheng, Haifeng [1]
Wang, Dan [1]
Affiliations
[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China
Source
AIMS MATHEMATICS | 2024 / Vol. 9 / Issue 12
Keywords
Markov decision processes; Deterministic system; value iteration; policy iteration; average cost criterion
DOI
10.3934/math.20241613
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, because the convergence criteria of both algorithms depend on the characteristics of the probability function governing state transitions. For VI, convergence requires verifying that the cost difference function stabilizes to a constant k, which ensures uniformity across iterations. In contrast, PI converges when the value function remains unchanged over successive iterations. Finally, a detailed example demonstrates the conditions under which the algorithms converge, underscoring the practicality of these methods in deterministic settings.
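The VI stopping rule described in the abstract can be illustrated on a small finite model. The following is a minimal sketch, not the authors' method: the paper works on Borel spaces, whereas this example assumes a hypothetical three-state, two-action deterministic system whose transitions (`next_state`) and costs (`cost`) are made up for illustration. It iterates the minimising dynamic-programming operator and stops once the difference between successive value functions is the same constant k in every state.

```python
# Toy finite deterministic MDP (all numbers are illustrative assumptions,
# not taken from the paper). next_state[s][a] is the deterministic successor
# of state s under action a; cost[s][a] is the one-step cost.
next_state = [[0, 1], [1, 2], [2, 0]]   # action 0 = stay, action 1 = move to (s + 1) % 3
cost = [[3.0, 2.0], [1.0, 4.0], [5.0, 0.5]]


def bellman_step(v):
    """Apply the minimising operator once; return (new values, greedy actions)."""
    new_v, greedy = [], []
    for s in range(len(v)):
        q = [cost[s][a] + v[next_state[s][a]] for a in range(len(cost[s]))]
        best = min(range(len(q)), key=q.__getitem__)
        new_v.append(q[best])
        greedy.append(best)
    return new_v, greedy


def value_iteration(tol=1e-9, max_iter=10_000):
    """Iterate until v_{n+1} - v_n is numerically the same constant k in every state."""
    v = [0.0] * len(cost)
    for n in range(1, max_iter + 1):
        new_v, greedy = bellman_step(v)
        diff = [b - a for a, b in zip(v, new_v)]
        if max(diff) - min(diff) < tol:   # span of the difference is ~0
            return diff[0], greedy, n
        v = new_v
    raise RuntimeError("difference did not stabilise; the toy model may be periodic")


k, policy, iterations = value_iteration()
print(k, policy, iterations)
```

In this toy model the constant k equals the cost of staying in state 1 forever, which is the cheapest long-run behaviour available. A purely cyclic deterministic system can make the difference oscillate instead of settling, which is why the sketch raises an error rather than returning a value when the span never shrinks.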
Pages: 33818-33842
Number of pages: 25