Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Cited: 24
Authors
Ngo Anh Vien [1,2]
Yu, Hwanjo [1]
Chung, TaeChoong [2]
Affiliations
[1] Pohang Univ Sci & Technol (POSTECH), Dept Comp Sci & Engn, Data Min Lab, Pohang, South Korea
[2] Kyung Hee Univ, Sch Elect & Informat, Dept Comp Engn, Artificial Intelligence Lab, Yongin 446701, Gyeonggi, South Korea
Keywords
Markov decision process; Reinforcement learning; Bayesian policy gradient; Monte-Carlo policy gradient; Policy gradient; Hessian matrix distribution
DOI
10.1016/j.ins.2011.01.001
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Bayesian policy gradient algorithms have recently been proposed to model the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods have been shown to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared with conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of variance and sample count. As with the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that, with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. (C) 2011 Elsevier Inc. All rights reserved.
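For readers unfamiliar with the Bayesian quadrature estimator the abstract refers to, the core idea can be sketched in a few lines. The snippet below is a generic, illustrative sketch only: it places a Gaussian process prior on a scalar integrand with an RBF kernel and integrates against a standard normal measure (both assumed here because they give closed-form kernel means), not the Fisher-kernel construction the paper builds on; the function name `bq_posterior` is hypothetical.

```python
# Minimal sketch of Bayesian quadrature (BQ): given a GP prior on f,
# the integral of f against a known measure is itself Gaussian, with
# posterior mean z^T K^{-1} f and a closed-form posterior variance.
import numpy as np

def rbf(x, y, ell=1.0):
    """RBF kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))."""
    return np.exp(-0.5 * ((x[:, None] - y[None, :]) / ell) ** 2)

def bq_posterior(xs, fs, ell=1.0, jitter=1e-8):
    """Posterior mean/variance of int f(x) N(x; 0, 1) dx under a GP prior on f."""
    K = rbf(xs, xs, ell) + jitter * np.eye(len(xs))
    # Kernel mean z_i = int k(x, x_i) N(x; 0, 1) dx (closed form for RBF).
    z = ell / np.sqrt(ell**2 + 1.0) * np.exp(-0.5 * xs**2 / (ell**2 + 1.0))
    w = np.linalg.solve(K, z)   # BQ weights z^T K^{-1}
    mean = w @ fs               # posterior mean of the integral
    # Prior variance int int k(x, x') N(x) N(x') dx dx' for the RBF kernel.
    var0 = ell / np.sqrt(ell**2 + 2.0)
    var = var0 - z @ np.linalg.solve(K, z)
    return mean, var

# Example: integrate f(x) = x^2 against N(0, 1); the true value is 1.
xs = np.linspace(-3.0, 3.0, 15)
mean, var = bq_posterior(xs, xs**2)
print(f"BQ estimate: {mean:.4f} +/- {np.sqrt(max(var, 0.0)):.4f}")
```

The same weights z^T K^{-1} reappear when the observations are vector- or matrix-valued, which is how a posterior over gradient and Hessian entries can be formed; per the abstract, the paper's specific kernel choice is what yields a symmetric Hessian posterior mean at the same computational cost as the gradient estimate.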
Pages: 1671-1685
Page count: 15
Related Papers (50 total)
  • [21] Policy Gradient Reinforcement Learning for I/O Reordering on Storage Servers
    Dheenadayalan, Kumar
    Srinivasaraghavan, Gopalakrishnan
    Muralidhara, V. N.
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 849 - 859
  • [22] A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential
    Zhang, Zhen
    Ong, Yew-Soon
    Wang, Dongqing
    Xue, Binqiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (02) : 1015 - 1027
  • [23] Performance Improvement of Linux CPU Scheduler Using Policy Gradient Reinforcement Learning for Android Smartphones
    Han, Junyeong
    Lee, Sungyoung
    IEEE ACCESS, 2020, 8 : 11031 - 11045
  • [24] Decentralized multi-task reinforcement learning policy gradient method with momentum over networks
    Shi Junru
    Wang Qiong
    Liu Muhua
    Ji Zhihang
    Zheng Ruijuan
    Wu Qingtao
    APPLIED INTELLIGENCE, 2023, 53 (09) : 10365 - 10379
  • [26] Efficient Bayesian Policy Reuse With a Scalable Observation Model in Deep Reinforcement Learning
    Liu, Jinmei
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14797 - 14809
  • [27] Gradient-Enhanced Bayesian Optimization via Acquisition Ensembles with Application to Reinforcement Learning
    Makrygiorgos, Georgios
    Paulson, Joel A.
    Mesbah, Ali
    IFAC PAPERSONLINE, 2023, 56 (02): 638 - 643
  • [28] Reinforced knowledge distillation: Multi-class imbalanced classifier based on policy gradient reinforcement learning
    Fan, Saite
    Zhang, Xinmin
    Song, Zhihuan
    NEUROCOMPUTING, 2021, 463 : 422 - 436
  • [29] REINFORCEMENT LEARNING OF SPEECH RECOGNITION SYSTEM BASED ON POLICY GRADIENT AND HYPOTHESIS SELECTION
    Kato, Taku
    Shinozaki, Takahiro
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5759 - 5763
  • [30] Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient
    Beck, Edgar
    Bockelmann, Carsten
    Dekorsy, Armin
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024, : 367 - 373