Variational Bayesian Parameter-Based Policy Exploration

Cited by: 1
Authors
Hosino, Tikara [1]
Affiliation
[1] Nihon Unisys Ltd, Technol Res & Innovat, Koto Ku, 1-1-1 Toyosu, Tokyo, Japan
Source
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2020
Keywords
Reinforcement Learning; Parameter-Based method; Bayesian Learning; Variational Approximation; Continuous Control; Exploration and Exploitation Trade-Off; GRADIENTS;
DOI
10.1109/ijcnn48605.2020.9207091
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning has shown success in many tasks that cannot provide explicit training samples and provide only rewards. However, because of a lack of robustness and the need for laborious hyperparameter tuning, reinforcement learning is not easily applied in new situations. One reason for this problem is that existing methods do not account for the uncertainty of rewards and policy parameters. In this paper, for parameter-based policy exploration, we use a Bayesian method to define an objective function that explicitly accounts for reward uncertainty. In addition, we provide an algorithm that uses a Bayesian method to optimize this function under the uncertainty of policy parameters in continuous state and action spaces. The results of numerical experiments show that the proposed method is more robust than the comparison method to estimation errors on finite samples, because our proposal balances reward acquisition and exploration.
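The parameter-based exploration the abstract refers to maintains a search distribution over policy parameters and samples whole parameter vectors, rather than perturbing individual actions. The following is a minimal, hypothetical sketch in that spirit (a PGPE-style likelihood-ratio update on a toy one-dimensional quadratic reward), not the paper's variational Bayesian algorithm; the reward function, learning rate, and sample count are illustrative assumptions.

```python
import random

random.seed(0)

def reward(theta):
    """Toy reward, maximized at theta = 2.0 (stands in for a policy rollout)."""
    return -(theta - 2.0) ** 2

# Gaussian search distribution over the policy parameter: exploration
# happens in parameter space, not action space.
mu, sigma = 0.0, 1.0
lr, n_samples = 0.1, 20

for _ in range(300):
    # Sample candidate parameters from the search distribution.
    thetas = [random.gauss(mu, sigma) for _ in range(n_samples)]
    rewards = [reward(t) for t in thetas]
    baseline = sum(rewards) / n_samples  # variance-reduction baseline
    # Likelihood-ratio gradients of expected reward w.r.t. mu and sigma.
    g_mu = sum((r - baseline) * (t - mu) / sigma ** 2
               for t, r in zip(thetas, rewards)) / n_samples
    g_sigma = sum((r - baseline) * ((t - mu) ** 2 - sigma ** 2) / sigma ** 3
                  for t, r in zip(thetas, rewards)) / n_samples
    mu += lr * g_mu
    sigma = max(1e-3, sigma + lr * g_sigma)  # keep the distribution proper

print(round(mu, 1))
```

The paper's contribution goes beyond this point estimate update: it places a Bayesian posterior over the search-distribution parameters themselves and optimizes a reward-uncertainty-aware objective, which this sketch does not attempt to reproduce.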
Pages: 7