Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors:
Pentaliotis, Andreas [1]
Wiering, Marco [1]
Affiliations:
[1] Univ Groningen, Bernoulli Inst, Dept Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords:
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning
DOI: 10.5220/0010168000170028
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Q-learning is a reinforcement learning algorithm that has overestimation bias, because it learns the optimal action values by using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it could be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. Firstly, we present the tabular version of the algorithm and mathematically prove its convergence. Secondly, we combine the algorithm with function approximation. Finally, we present empirical results from three different experiments, in which we compared the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The empirical results show that Variation-resistant Q-learning can control and utilize estimation bias for better performance in the experimental tasks.
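The overestimation bias the abstract describes arises because Q-learning's target maximizes over noisy action-value estimates, while Double Q-learning decouples action selection from action evaluation. The following minimal sketch illustrates only that general effect; it is not the paper's Variation-resistant algorithm, and the Gaussian-noise setup and function names are illustrative assumptions.

```python
import random

def noisy_estimates(n_actions, true_value=0.0, noise=1.0, rng=None):
    """Simulate action-value estimates: the true value plus zero-mean noise."""
    rng = rng or random
    return [true_value + rng.gauss(0.0, noise) for _ in range(n_actions)]

def single_estimator_target(estimates):
    """Q-learning-style target: maximize over one set of noisy estimates."""
    return max(estimates)

def double_estimator_target(estimates_a, estimates_b):
    """Double Q-learning-style target: select the action with one estimator,
    evaluate it with an independent second estimator."""
    best = max(range(len(estimates_a)), key=lambda i: estimates_a[i])
    return estimates_b[best]

rng = random.Random(0)
trials = 10_000
q_bias = sum(single_estimator_target(noisy_estimates(10, rng=rng))
             for _ in range(trials)) / trials
dq_bias = sum(double_estimator_target(noisy_estimates(10, rng=rng),
                                      noisy_estimates(10, rng=rng))
              for _ in range(trials)) / trials

# All true action values are 0, so any nonzero mean target is pure estimation bias.
print(f"single-estimator mean target: {q_bias:+.3f}")   # clearly positive
print(f"double-estimator mean target: {dq_bias:+.3f}")  # close to zero
```

The single estimator's mean target is positive even though every true action value is zero, which is exactly the overestimation the paper sets out to control; the double estimator removes it at the cost of a tendency to underestimate in other settings.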
Pages: 17-28 (12 pages)