Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors
Pentaliotis, Andreas [1 ]
Wiering, Marco [1 ]
Affiliations
[1] Univ Groningen, Bernoulli Inst, Dept Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning
DOI
10.5220/0010168000170028
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Q-learning is a reinforcement learning algorithm that suffers from overestimation bias because it learns the optimal action values using a target that maximizes over uncertain action-value estimates. Although this overestimation bias is generally considered harmful, a recent study suggests that it can be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from three experiments comparing the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The results show that Variation-resistant Q-learning can control and utilize estimation bias for better performance in the experimental tasks.
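For context, the abstract contrasts the standard Q-learning target with Double Q-learning. Below is a minimal tabular sketch of those two baseline update rules, assuming 2-D NumPy arrays indexed by state and action; this is illustrative code, not code from the paper, and the function names, step size alpha, and discount gamma are assumptions.

```python
import numpy as np

# Illustrative sketch (not from the paper): the tabular Q-learning target
# takes a max over noisy action-value estimates, which is the source of the
# overestimation bias the abstract refers to.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * np.max(Q[s_next])    # max over uncertain estimates
    Q[s, a] += alpha * (target - Q[s, a])

# Double Q-learning (van Hasselt, 2010) decouples action selection from
# action evaluation using two tables, which removes the overestimation but
# can underestimate instead; in practice the roles of QA and QB are swapped
# at random on each update.
def double_q_learning_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    a_star = int(np.argmax(QA[s_next]))       # select action with QA
    target = r + gamma * QB[s_next, a_star]   # evaluate it with QB
    QA[s, a] += alpha * (target - QA[s, a])
```

Variation-resistant Q-learning instead aims to control the estimation bias and exploit it when it is helpful; its actual update rule and convergence proof are given in the paper and are not reproduced in this sketch.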
Pages: 17-28
Number of pages: 12