Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors
Pentaliotis, Andreas [1 ]
Wiering, Marco [1 ]
Affiliations
[1] University of Groningen, Bernoulli Institute, Department of Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning
DOI
10.5220/0010168000170028
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Q-learning is a reinforcement learning algorithm that suffers from overestimation bias because it learns optimal action values using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it can be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from three experiments in which we compare the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The results show that Variation-resistant Q-learning can control and utilize estimation bias to achieve better performance on the experimental tasks.
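
To make the source of the bias concrete, below is a minimal sketch of the tabular Q-learning and Double Q-learning updates that the paper uses as comparison baselines. The update rule of Variation-resistant Q-learning itself is not given in the abstract, so it is not reproduced here; the function names and the toy value table are purely illustrative.

# Minimal sketch (not the paper's implementation): tabular Q-learning and
# Double Q-learning updates for a single transition (s, a, r, s').
# The max over noisy estimates in the Q-learning target is the source of
# the overestimation bias discussed in the abstract; Double Q-learning
# decouples action selection from action evaluation to reduce it.
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # The target maximizes over the same (uncertain) estimates being learned,
    # which biases the target upward when the estimates are noisy.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def double_q_learning_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Randomly pick which table to update; select the greedy action with
    # one table and evaluate it with the other.
    if np.random.rand() < 0.5:
        a_star = np.argmax(QA[s_next])
        target = r + gamma * QB[s_next, a_star]
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        a_star = np.argmax(QB[s_next])
        target = r + gamma * QA[s_next, a_star]
        QB[s, a] += alpha * (target - QB[s, a])

# Usage with a toy 4-state, 2-action value table:
Q = np.zeros((4, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
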
Pages: 17-28 (12 pages)