Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors
Pentaliotis, Andreas [1 ]
Wiering, Marco [1 ]
Affiliations
[1] University of Groningen, Bernoulli Institute, Department of Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning
DOI
10.5220/0010168000170028
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Q-learning is a reinforcement learning algorithm that suffers from overestimation bias because it learns optimal action values using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it can be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from three experiments in which we compare the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The results show that Variation-resistant Q-learning can control and utilize estimation bias to achieve better performance on the experimental tasks.
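
To make the source of the bias concrete, below is a minimal sketch of the tabular Q-learning and Double Q-learning updates that the paper uses as comparison baselines. The update rule of Variation-resistant Q-learning itself is not given in the abstract, so it is not reproduced here; the function names and the toy value table are purely illustrative.

# Minimal sketch (not the paper's implementation): tabular Q-learning and
# Double Q-learning updates for a single transition (s, a, r, s').
# The max over noisy estimates in the Q-learning target is the source of
# the overestimation bias discussed in the abstract; Double Q-learning
# decouples action selection from action evaluation to reduce it.
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # The target maximizes over the same (uncertain) estimates being learned,
    # which biases the target upward when the estimates are noisy.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def double_q_learning_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Randomly pick which table to update; select the greedy action with
    # one table and evaluate it with the other.
    if np.random.rand() < 0.5:
        a_star = np.argmax(QA[s_next])
        target = r + gamma * QB[s_next, a_star]
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        a_star = np.argmax(QB[s_next])
        target = r + gamma * QA[s_next, a_star]
        QB[s, a] += alpha * (target - QB[s, a])

# Usage with a toy 4-state, 2-action value table:
Q = np.zeros((4, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
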
Pages: 17-28 (12 pages)