A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

Cited by: 0

Authors:

Zheng, Haifeng [1]

Wang, Dan [1]

Affiliations:

[1] Jinan Univ, Sch Econ, Guangzhou 510632, Guangdong, Peoples R China

Source:

AIMS MATHEMATICS | 2024 / Vol. 9 / Issue 12

Keywords:

Markov decision processes; Deterministic system; value iteration; policy iteration; average cost criterion

DOI:

10.3934/math.20241613

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms in Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, because the convergence criteria of these algorithms are closely tied to the characteristics of the probability function governing state transitions. For VI, convergence is contingent upon verifying that the cost difference function stabilizes to a constant k, ensuring uniformity across iterations. In contrast, PI achieves convergence when the value function maintains consistent values over successive iterations. Finally, a detailed example demonstrates the conditions under which convergence of the algorithms is achieved, underscoring the practicality of these methods in deterministic settings.
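The VI stopping rule described in the abstract can be illustrated with a minimal sketch. It assumes a small finite-state deterministic system rather than the Borel-space setting of the paper, and it is not the paper's own example: the three-state system, its costs, and the names `value_iteration`, `step`, and `cost` are all hypothetical. The loop stops once the difference between successive iterates is (nearly) the same constant k in every state, and that constant is then read as the average cost per step.

```python
# Toy deterministic system (hypothetical, for illustration only):
# three states on a ring; action 0 stays put, action 1 moves to the next state.
STAY_COST = [2.0, 1.0, 3.0]   # assumed cost of staying in each state
MOVE_COST = 1.5               # assumed cost of moving, same in every state


def step(x, a):
    """Deterministic transition: next state is a function of (state, action)."""
    return x if a == 0 else (x + 1) % 3


def cost(x, a):
    """One-stage cost of taking action a in state x."""
    return STAY_COST[x] if a == 0 else MOVE_COST


def value_iteration(n_states, actions, step, cost, tol=1e-9, max_iter=10_000):
    """Undiscounted value iteration with a span-based stopping rule.

    Stops when v_{n+1}(x) - v_n(x) is the same constant k for all x
    (up to tol) and returns that constant with a greedy policy.
    """
    v = [0.0] * n_states
    for _ in range(max_iter):
        v_next = [
            min(cost(x, a) + v[step(x, a)] for a in actions)
            for x in range(n_states)
        ]
        diff = [new - old for old, new in zip(v, v_next)]
        v = v_next
        if max(diff) - min(diff) < tol:
            # Greedy policy with respect to the latest iterate.
            policy = [
                min(actions, key=lambda a, x=x: cost(x, a) + v[step(x, a)])
                for x in range(n_states)
            ]
            return 0.5 * (max(diff) + min(diff)), policy
    raise RuntimeError("cost differences did not stabilize to a constant")


k, policy = value_iteration(3, (0, 1), step, cost)
print(k, policy)
```

In this toy system the cheapest long-run behaviour is to reach state 1 and stay there, so the differences settle to the staying cost of that state. The self-loop keeps the example aperiodic; in a purely cyclic system the differences may oscillate instead of settling, which is why the sketch raises an error after `max_iter` iterations.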

Citation

Pages: 33818-33842

Number of pages: 25
