Potential-based online policy iteration algorithms for Markov decision processes

Cited by: 26
Authors
Fang, HT [1]
Cao, XR
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Lab Syst & Control, Beijing 100080, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Markov decision process; potential; recursive optimization;
DOI
10.1109/TAC.2004.825647
Chinese Library Classification (CLC)
TP [Automation & Computer Technology];
Discipline code
0812;
Abstract
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
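The idea described in the abstract, estimating potentials from a single sample path and using them for policy improvement, can be illustrated with a minimal sketch. All numbers below (the toy 2-state, 2-action MDP, the window length, the path length) are made-up assumptions, not taken from the paper; the paper's two algorithms update potentials and policies online via stochastic approximation, whereas this sketch uses a simpler batch estimate per policy-iteration step.

```python
import numpy as np

# Hypothetical toy average-reward MDP (illustrative only, not from the paper):
# P[a, i, j] = transition probability i -> j under action a;
# f[i, a]    = one-step reward in state i under action a.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.6, 0.4]]])   # action 1
f = np.array([[1.0, 1.5],
              [0.0, 2.0]])

rng = np.random.default_rng(0)

def estimate_potentials(policy, T=100_000, W=50):
    """Estimate the average reward eta and the potentials g on one sample
    path: g(i) is approximated by the sum of (f - eta) over a truncated
    window of W steps after each visit to state i."""
    n = P.shape[1]
    path = np.empty(T, dtype=int)
    rew = np.empty(T)
    x = 0
    for t in range(T):                      # simulate one long sample path
        a = policy[x]
        path[t] = x
        rew[t] = f[x, a]
        x = rng.choice(n, p=P[a, x])
    eta = rew.mean()
    # Window sums of the relative rewards f - eta, one per start time t.
    csum = np.concatenate([[0.0], np.cumsum(rew - eta)])
    win = csum[W:T] - csum[:T - W]
    g_sum = np.zeros(n)
    np.add.at(g_sum, path[:T - W], win)     # accumulate per visited state
    g_cnt = np.bincount(path[:T - W], minlength=n)
    return eta, g_sum / np.maximum(g_cnt, 1)

def improve(g):
    """Policy improvement with the estimated potentials:
    pick the action maximizing q(i,a) = f(i,a) + sum_j P[a,i,j] * g(j)."""
    q = f + np.einsum('aij,j->ia', P, g)
    return q.argmax(axis=1)

policy = np.array([0, 0])
for _ in range(10):
    eta, g = estimate_potentials(policy)
    new_policy = improve(g)
    if np.array_equal(new_policy, policy):  # policy is stable: stop
        break
    policy = new_policy
print("policy:", policy.tolist(), "average reward ~", round(eta, 3))
```

On this toy chain the iteration stabilizes after a few improvement steps; the finite-termination and convergence-rate claims above are what the paper establishes for its online versions.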
Pages: 493-505
Page count: 13
Related papers
50 records total
  • [41] Average optimality for continuous-time Markov decision processes with a policy iteration approach
    Zhu, Quanxin
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2008, 339 (01) : 691 - 704
  • [42] Potential-Based Least-Squares Policy Iteration for a Parameterized Feedback Control System
    Cheng, Kang
    Zhang, Kanjian
    Fei, Shumin
    Wei, Haikun
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2016, 169 (02) : 692 - 704
  • [44] Topological Value Iteration Algorithm for Markov Decision Processes
    Dai, Peng
    Goldsmith, Judy
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1860 - 1865
  • [45] New prioritized value iteration for Markov decision processes
    de Guadalupe Garcia-Hernandez, Ma.
    Ruiz-Pinales, Jose
    Onaindia, Eva
    Gabriel Avina-Cervantes, J.
    Ledesma-Orozco, Sergio
    Alvarado-Mendez, Edgar
    Reyes-Ballesteros, Alberto
    ARTIFICIAL INTELLIGENCE REVIEW, 2012, 37 (02) : 157 - 167
  • [47] Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
    Zhu, Quanxin
    Yang, Xinsong
    Huang, Chuangxia
    ABSTRACT AND APPLIED ANALYSIS, 2009,
  • [48] ON THE CONVERGENCE OF POLICY ITERATION IN FINITE STATE UNDISCOUNTED MARKOV DECISION-PROCESSES - THE UNICHAIN CASE
    HORDIJK, A
    PUTERMAN, ML
    MATHEMATICS OF OPERATIONS RESEARCH, 1987, 12 (01) : 163 - 176
  • [49] An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
    Ma, Yao
    Zhao, Tingting
    Hatano, Kohei
    Sugiyama, Masashi
    NEURAL COMPUTATION, 2016, 28 (03) : 563 - 593
  • [50] A K-step look-ahead analysis of value iteration algorithms for Markov decision processes
    Herzberg, M
    Yechiali, U
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1996, 88 (03) : 622 - 636