Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Cited by: 24
Authors
Ngo Anh Vien [1, 2]
Yu, Hwanjo [1]
Chung, TaeChoong [2]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, Data Min Lab, Pohang, South Korea
[2] Kyung Hee Univ, Sch Elect & Informat, Dept Comp Engn, Artificial Intelligence Lab, Yongin 446701, Gyeonggi, South Korea
Keywords
Markov decision process; Reinforcement learning; Bayesian policy gradient; Monte-Carlo policy gradient; Policy gradient; Hessian matrix distribution
DOI
10.1016/j.ins.2011.01.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared with conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. © 2011 Elsevier Inc. All rights reserved.
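Illustrative note: the abstract says the Hessian is used as a learning rate schedule but does not give the update rule. The sketch below is not the paper's algorithm. It assumes a damped Newton-style ascent step on a toy concave objective, with exact gradients and Hessians standing in for the posterior means that Bayesian quadrature would supply. The function name newton_style_step, the damping parameter, and the toy matrix A are all hypothetical.

```python
import numpy as np

def newton_style_step(theta, grad, hess, damping=1e-3):
    """One ascent step that rescales the gradient by the negated inverse Hessian.

    Assumption (not from the paper): the objective is locally concave,
    so -hess plus a small damping term is positive definite.
    """
    hess = 0.5 * (hess + hess.T)  # symmetrize the Hessian estimate
    curvature = -hess + damping * np.eye(len(theta))
    return theta + np.linalg.solve(curvature, grad)

# Toy concave objective: J(theta) = -0.5 * (theta - target)^T A (theta - target)
A = np.array([[3.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
target = np.array([1.0, -2.0])

theta = np.zeros(2)
for _ in range(20):
    grad = -A @ (theta - target)  # exact gradient of J at theta
    hess = -A                     # exact (constant) Hessian of J
    theta = newton_style_step(theta, grad, hess)

print(theta)
```

On this quadratic the iterate moves almost all the way to target in a single step, because the curvature sets the step size in each direction; a fixed scalar learning rate would need tuning to the largest eigenvalue of A.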
Pages: 1671-1685
Number of pages: 15
Related papers
50 in total
  • [31] Active structural control framework using policy-gradient reinforcement learning
    Eshkevari, Soheila Sadeghi
    Eshkevari, Soheil Sadeghi
    Sen, Debarshi
    Pakzad, Shamim N.
    ENGINEERING STRUCTURES, 2022, 274
  • [32] Continuous Parameter Control in Genetic Algorithms using Policy Gradient Reinforcement Learning
    de Miguel Gomez, Alejandro
    Toosi, Farshad Ghassemi
    PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL INTELLIGENCE (IJCCI), 2021, : 115 - 122
  • [33] Control Randomisation Approach for Policy Gradient and Application to Reinforcement Learning in Optimal Switching
    Denkert, Robert
    Pham, Huyen
    Warin, Xavier
    APPLIED MATHEMATICS AND OPTIMIZATION, 2025, 91 (01):
  • [34] Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning
    Gurumurthy, Swaminathan
    Manchester, Zachary
    Kolter, J. Zico
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [35] QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning
    Rehman, Hafiz Muhammad Raza Ur
    On, Byung-Won
    Ningombam, Devarani Devi
    Yi, Sungwon
    Choi, Gyu Sang
    IEEE ACCESS, 2021, 9 : 129728 - 129741
  • [36] Reinforcement Learning for Mobile Robot Obstacle Avoidance with Deep Deterministic Policy Gradient
    Chen, Miao
    Li, Wenna
    Fei, Shihan
    Wei, Yufei
    Tu, Mingyang
    Li, Jiangbo
    INTELLIGENT ROBOTICS AND APPLICATIONS (ICIRA 2022), PT III, 2022, 13457 : 197 - 204
  • [37] Gradient Monitored Reinforcement Learning
    Abdul Hameed, Mohammed Sharafath
    Chadha, Gavneet Singh
    Schwung, Andreas
    Ding, Steven X.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4106 - 4119
  • [38] Learning Heuristics for the TSP by Policy Gradient
    Deudon, Michel
    Cournut, Pierre
    Lacoste, Alexandre
    Adulyasak, Yossiri
    Rousseau, Louis-Martin
    INTEGRATION OF CONSTRAINT PROGRAMMING, ARTIFICIAL INTELLIGENCE, AND OPERATIONS RESEARCH, CPAIOR 2018, 2018, 10848 : 170 - 181
  • [39] An Information-Theoretic Analysis of Bayesian Reinforcement Learning
    Gouverneur, Amaury
    Rodriguez-Galvez, Borja
    Oechtering, Tobias J.
    Skoglund, Mikael
    2022 58TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2022,
  • [40] Bayesian reinforcement learning for navigation planning in unknown environments
    Alali, Mohammad
    Imani, Mahdi
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 7