Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors
Pentaliotis, Andreas [1 ]
Wiering, Marco [1 ]
Affiliations
[1] Univ Groningen, Bernoulli Inst, Dept Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning;
DOI
10.5220/0010168000170028
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Q-learning is a reinforcement learning algorithm that suffers from overestimation bias, because it learns the optimal action values by using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it could be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from three experiments comparing the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The empirical results show that Variation-resistant Q-learning can control and utilize estimation bias for better performance in the experimental tasks.
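The overestimation mechanism the abstract refers to can be illustrated with a small simulation (not taken from the paper; the setup and function names are illustrative). When all true action values are equal to zero but the estimates carry zero-mean noise, a Q-learning-style target that maximizes over the noisy estimates has a positive expected value, whereas a Double Q-learning-style target, which selects the maximizing action with one estimator and evaluates it with an independent second estimator, does not:

```python
import random

random.seed(0)

def single_estimator_target(n_actions, noise_std, trials=10000):
    """Q-learning-style target: max over one set of noisy estimates.
    True values are all 0, so any positive average is pure bias."""
    total = 0.0
    for _ in range(trials):
        estimates = [random.gauss(0.0, noise_std) for _ in range(n_actions)]
        total += max(estimates)
    return total / trials

def double_estimator_target(n_actions, noise_std, trials=10000):
    """Double Q-learning-style target: pick the argmax with estimator A,
    evaluate that action with an independent estimator B."""
    total = 0.0
    for _ in range(trials):
        a = [random.gauss(0.0, noise_std) for _ in range(n_actions)]
        b = [random.gauss(0.0, noise_std) for _ in range(n_actions)]
        best = max(range(n_actions), key=lambda i: a[i])
        total += b[best]
    return total / trials

single = single_estimator_target(10, 1.0)
double = double_estimator_target(10, 1.0)
print(f"single-estimator target: {single:.3f}")  # clearly positive: upward bias
print(f"double-estimator target: {double:.3f}")  # close to zero in this setup
```

The positive single-estimator average is exactly the overestimation bias that the paper's Variation-resistant Q-learning aims to control rather than simply eliminate.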
Pages: 17-28
Page count: 12
Related papers
(50 records)
  • [31] An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm
    Zhao, Fuqing
    Wang, Qiaoyun
    Wang, Ling
    KNOWLEDGE-BASED SYSTEMS, 2023, 265
  • [32] Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning
Wilson, Callum
Riccardi, Annalisa
OPTIMIZATION AND ENGINEERING, 2023, 24: 223 - 255
  • [33] Better value estimation in Q-learning-based multi-agent reinforcement learning
    Ding, Ling
    Du, Wei
    Zhang, Jian
    Guo, Lili
    Zhang, Chenglong
    Jin, Di
    Ding, Shifei
    SOFT COMPUTING, 2024, 28 (06) : 5625 - 5638
  • [35] Reinforcement distribution in a team of cooperative Q-learning agents
    Abbasi, Zahra
    Abbasi, Mohammad Ali
    PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 154 - +
  • [36] The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
    Zhang, Xuezhou
    Bharti, Shubham Kumar
    Ma, Yuzhe
    Singla, Adish
    Zhu, Xiaojin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10939 - 10947
  • [37] Controlling Sequential Hybrid Evolutionary Algorithm by Q-Learning
    Zhang, Haotian
    Sun, Jianyong
    Back, Thomas
    Zhang, Qingfu
    Xu, Zongben
    IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2023, 18 (01) : 84 - 103
  • [38] Bias-Corrected Q-Learning With Multistate Extension
    Lee, Donghun
    Powell, Warren B.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (10) : 4011 - 4023
  • [39] Random Graphs Estimation using Q-Learning
    Babahaji, Mina
    Blouin, Stephane
    Lucia, Walter
    Asadi, M. Mehdi
    Mahboubi, Hamid
    Aghdam, Amir G.
    2021 IEEE INTERNATIONAL CONFERENCE ON WIRELESS FOR SPACE AND EXTREME ENVIRONMENTS (WISEE), 2021,
  • [40] Adaptive Estimation Q-learning with Uncertainty and Familiarity
    Gong, Xiaoyu
    Lu, Shuai
    Yu, Jiayu
    Zhu, Sheng
    Li, Zongze
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 3750 - 3758