Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors:
Pentaliotis, Andreas [1]
Wiering, Marco [1]
Affiliations:
[1] Univ Groningen, Bernoulli Inst, Dept Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords:
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning
DOI: 10.5220/0010168000170028
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Q-learning is a reinforcement learning algorithm that has overestimation bias, because it learns the optimal action values by using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it could be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. Firstly, we present the tabular version of the algorithm and mathematically prove its convergence. Secondly, we combine the algorithm with function approximation. Finally, we present empirical results from three different experiments, in which we compared the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The empirical results show that Variation-resistant Q-learning can control and utilize estimation bias for better performance in the experimental tasks.
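The overestimation bias the abstract describes arises because Q-learning's target maximizes over noisy action-value estimates, while Double Q-learning decouples action selection from action evaluation. The following minimal sketch illustrates only that general effect; it is not the paper's Variation-resistant algorithm, and the Gaussian-noise setup and function names are illustrative assumptions.

```python
import random

def noisy_estimates(n_actions, true_value=0.0, noise=1.0, rng=None):
    """Simulate action-value estimates: the true value plus zero-mean noise."""
    rng = rng or random
    return [true_value + rng.gauss(0.0, noise) for _ in range(n_actions)]

def single_estimator_target(estimates):
    """Q-learning-style target: maximize over one set of noisy estimates."""
    return max(estimates)

def double_estimator_target(estimates_a, estimates_b):
    """Double Q-learning-style target: select the action with one estimator,
    evaluate it with an independent second estimator."""
    best = max(range(len(estimates_a)), key=lambda i: estimates_a[i])
    return estimates_b[best]

rng = random.Random(0)
trials = 10_000
q_bias = sum(single_estimator_target(noisy_estimates(10, rng=rng))
             for _ in range(trials)) / trials
dq_bias = sum(double_estimator_target(noisy_estimates(10, rng=rng),
                                      noisy_estimates(10, rng=rng))
              for _ in range(trials)) / trials

# All true action values are 0, so any nonzero mean target is pure estimation bias.
print(f"single-estimator mean target: {q_bias:+.3f}")   # clearly positive
print(f"double-estimator mean target: {dq_bias:+.3f}")  # close to zero
```

The single estimator's mean target is positive even though every true action value is zero, which is exactly the overestimation the paper sets out to control; the double estimator removes it at the cost of a tendency to underestimate in other settings.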
Pages: 17-28 (12 pages)