Potential-based online policy iteration algorithms for Markov decision processes

Cited by: 26
Authors
Fang, HT [1]
Cao, XR
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Lab Syst & Control, Beijing 100080, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Markov decision process; potential; recursive optimization;
DOI
10.1109/TAC.2004.825647
Chinese Library Classification (CLC)
TP [Automation & Computer Technology];
Discipline code
0812;
Abstract
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
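The idea described in the abstract, estimating potentials from a single sample path and using them for policy improvement, can be illustrated with a minimal sketch. All numbers below (the toy 2-state, 2-action MDP, the window length, the path length) are made-up assumptions, not taken from the paper; the paper's two algorithms update potentials and policies online via stochastic approximation, whereas this sketch uses a simpler batch estimate per policy-iteration step.

```python
import numpy as np

# Hypothetical toy average-reward MDP (illustrative only, not from the paper):
# P[a, i, j] = transition probability i -> j under action a;
# f[i, a]    = one-step reward in state i under action a.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.6, 0.4]]])   # action 1
f = np.array([[1.0, 1.5],
              [0.0, 2.0]])

rng = np.random.default_rng(0)

def estimate_potentials(policy, T=100_000, W=50):
    """Estimate the average reward eta and the potentials g on one sample
    path: g(i) is approximated by the sum of (f - eta) over a truncated
    window of W steps after each visit to state i."""
    n = P.shape[1]
    path = np.empty(T, dtype=int)
    rew = np.empty(T)
    x = 0
    for t in range(T):                      # simulate one long sample path
        a = policy[x]
        path[t] = x
        rew[t] = f[x, a]
        x = rng.choice(n, p=P[a, x])
    eta = rew.mean()
    # Window sums of the relative rewards f - eta, one per start time t.
    csum = np.concatenate([[0.0], np.cumsum(rew - eta)])
    win = csum[W:T] - csum[:T - W]
    g_sum = np.zeros(n)
    np.add.at(g_sum, path[:T - W], win)     # accumulate per visited state
    g_cnt = np.bincount(path[:T - W], minlength=n)
    return eta, g_sum / np.maximum(g_cnt, 1)

def improve(g):
    """Policy improvement with the estimated potentials:
    pick the action maximizing q(i,a) = f(i,a) + sum_j P[a,i,j] * g(j)."""
    q = f + np.einsum('aij,j->ia', P, g)
    return q.argmax(axis=1)

policy = np.array([0, 0])
for _ in range(10):
    eta, g = estimate_potentials(policy)
    new_policy = improve(g)
    if np.array_equal(new_policy, policy):  # policy is stable: stop
        break
    policy = new_policy
print("policy:", policy.tolist(), "average reward ~", round(eta, 3))
```

On this toy chain the iteration stabilizes after a few improvement steps; the finite-termination and convergence-rate claims above are what the paper establishes for its online versions.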
Pages: 493-505
Page count: 13
Related papers
50 records total
  • [41] Average optimality for continuous-time Markov decision processes with a policy iteration approach
    Zhu, Quanxin
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2008, 339 (01) : 691 - 704
  • [42] Potential-Based Least-Squares Policy Iteration for a Parameterized Feedback Control System
    Cheng, Kang
    Zhang, Kanjian
    Fei, Shumin
    Wei, Haikun
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2016, 169 (02) : 692 - 704
  • [44] Topological Value Iteration Algorithm for Markov Decision Processes
    Dai, Peng
    Goldsmith, Judy
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1860 - 1865
  • [45] New prioritized value iteration for Markov decision processes
    de Guadalupe Garcia-Hernandez, Ma.
    Ruiz-Pinales, Jose
    Onaindia, Eva
    Gabriel Avina-Cervantes, J.
    Ledesma-Orozco, Sergio
    Alvarado-Mendez, Edgar
    Reyes-Ballesteros, Alberto
    ARTIFICIAL INTELLIGENCE REVIEW, 2012, 37 (02) : 157 - 167
  • [47] Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
    Zhu, Quanxin
    Yang, Xinsong
    Huang, Chuangxia
    ABSTRACT AND APPLIED ANALYSIS, 2009,
  • [48] ON THE CONVERGENCE OF POLICY ITERATION IN FINITE STATE UNDISCOUNTED MARKOV DECISION-PROCESSES - THE UNICHAIN CASE
    HORDIJK, A
    PUTERMAN, ML
    MATHEMATICS OF OPERATIONS RESEARCH, 1987, 12 (01) : 163 - 176
  • [49] An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
    Ma, Yao
    Zhao, Tingting
    Hatano, Kohei
    Sugiyama, Masashi
    NEURAL COMPUTATION, 2016, 28 (03) : 563 - 593
  • [50] A K-step look-ahead analysis of value iteration algorithms for Markov decision processes
    Herzberg, M
    Yechiali, U
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1996, 88 (03) : 622 - 636