Variation-resistant Q-learning: Controlling and Utilizing Estimation Bias in Reinforcement Learning for Better Performance

Cited by: 3
Authors
Pentaliotis, Andreas [1 ]
Wiering, Marco [1 ]
Affiliations
[1] Univ Groningen, Bernoulli Inst, Dept Artificial Intelligence, Nijenborgh 9, Groningen, Netherlands
Keywords
Reinforcement Learning; Q-learning; Double Q-learning; Estimation Bias; Variation-resistant Q-learning;
DOI
10.5220/0010168000170028
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Q-learning is a reinforcement learning algorithm that suffers from overestimation bias, because it learns the optimal action values by using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it could be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from three experiments comparing the performance of Variation-resistant Q-learning, Q-learning, and Double Q-learning. The empirical results show that Variation-resistant Q-learning can control and utilize estimation bias for better performance in the experimental tasks.
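The overestimation mechanism the abstract refers to can be illustrated with a small simulation (not taken from the paper; the setup and function names are illustrative). When all true action values are equal to zero but the estimates carry zero-mean noise, a Q-learning-style target that maximizes over the noisy estimates has a positive expected value, whereas a Double Q-learning-style target, which selects the maximizing action with one estimator and evaluates it with an independent second estimator, does not:

```python
import random

random.seed(0)

def single_estimator_target(n_actions, noise_std, trials=10000):
    """Q-learning-style target: max over one set of noisy estimates.
    True values are all 0, so any positive average is pure bias."""
    total = 0.0
    for _ in range(trials):
        estimates = [random.gauss(0.0, noise_std) for _ in range(n_actions)]
        total += max(estimates)
    return total / trials

def double_estimator_target(n_actions, noise_std, trials=10000):
    """Double Q-learning-style target: pick the argmax with estimator A,
    evaluate that action with an independent estimator B."""
    total = 0.0
    for _ in range(trials):
        a = [random.gauss(0.0, noise_std) for _ in range(n_actions)]
        b = [random.gauss(0.0, noise_std) for _ in range(n_actions)]
        best = max(range(n_actions), key=lambda i: a[i])
        total += b[best]
    return total / trials

single = single_estimator_target(10, 1.0)
double = double_estimator_target(10, 1.0)
print(f"single-estimator target: {single:.3f}")  # clearly positive: upward bias
print(f"double-estimator target: {double:.3f}")  # close to zero in this setup
```

The positive single-estimator average is exactly the overestimation bias that the paper's Variation-resistant Q-learning aims to control rather than simply eliminate.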
Pages: 17-28
Page count: 12
Related papers
(50 records)
  • [31] An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm
    Zhao, Fuqing
    Wang, Qiaoyun
    Wang, Ling
    KNOWLEDGE-BASED SYSTEMS, 2023, 265
  • [32] Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning
Wilson, Callum
Riccardi, Annalisa
OPTIMIZATION AND ENGINEERING, 2023, 24: 223 - 255
  • [33] Better value estimation in Q-learning-based multi-agent reinforcement learning
    Ding, Ling
    Du, Wei
    Zhang, Jian
    Guo, Lili
    Zhang, Chenglong
    Jin, Di
    Ding, Shifei
    SOFT COMPUTING, 2024, 28 (06) : 5625 - 5638
  • [35] Reinforcement distribution in a team of cooperative Q-learning agents
    Abbasi, Zahra
    Abbasi, Mohammad Ali
    PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 154 - +
  • [36] The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
    Zhang, Xuezhou
    Bharti, Shubham Kumar
    Ma, Yuzhe
    Singla, Adish
    Zhu, Xiaojin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10939 - 10947
  • [37] Controlling Sequential Hybrid Evolutionary Algorithm by Q-Learning
    Zhang, Haotian
    Sun, Jianyong
    Back, Thomas
    Zhang, Qingfu
    Xu, Zongben
    IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2023, 18 (01) : 84 - 103
  • [38] Bias-Corrected Q-Learning With Multistate Extension
    Lee, Donghun
    Powell, Warren B.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (10) : 4011 - 4023
  • [39] Random Graphs Estimation using Q-Learning
    Babahaji, Mina
    Blouin, Stephane
    Lucia, Walter
    Asadi, M. Mehdi
    Mahboubi, Hamid
    Aghdam, Amir G.
    2021 IEEE INTERNATIONAL CONFERENCE ON WIRELESS FOR SPACE AND EXTREME ENVIRONMENTS (WISEE), 2021,
  • [40] Adaptive Estimation Q-learning with Uncertainty and Familiarity
    Gong, Xiaoyu
    Lu, Shuai
    Yu, Jiayu
    Zhu, Sheng
    Li, Zongze
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 3750 - 3758