Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors
Pentaliotis, Andreas [1 ]
Wiering, Marco [1 ]
Affiliations
[1] Univ Groningen, Bernoulli Inst, Dept Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning
DOI
10.5220/0010168000170028
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Q-learning is a reinforcement learning algorithm that suffers from overestimation bias because it learns the optimal action values using a target that maximizes over uncertain action-value estimates. Although this overestimation bias is generally considered harmful, a recent study suggests that it can be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from three experiments comparing the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The results show that Variation-resistant Q-learning can control and utilize estimation bias for better performance in the experimental tasks.
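For context, the abstract contrasts the standard Q-learning target with Double Q-learning. Below is a minimal tabular sketch of those two baseline update rules, assuming 2-D NumPy arrays indexed by state and action; this is illustrative code, not code from the paper, and the function names, step size alpha, and discount gamma are assumptions.

```python
import numpy as np

# Illustrative sketch (not from the paper): the tabular Q-learning target
# takes a max over noisy action-value estimates, which is the source of the
# overestimation bias the abstract refers to.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * np.max(Q[s_next])    # max over uncertain estimates
    Q[s, a] += alpha * (target - Q[s, a])

# Double Q-learning (van Hasselt, 2010) decouples action selection from
# action evaluation using two tables, which removes the overestimation but
# can underestimate instead; in practice the roles of QA and QB are swapped
# at random on each update.
def double_q_learning_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    a_star = int(np.argmax(QA[s_next]))       # select action with QA
    target = r + gamma * QB[s_next, a_star]   # evaluate it with QB
    QA[s, a] += alpha * (target - QA[s, a])
```

Variation-resistant Q-learning instead aims to control the estimation bias and exploit it when it is helpful; its actual update rule and convergence proof are given in the paper and are not reproduced in this sketch.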
Pages: 17-28
Number of pages: 12