Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation

Cited by: 0
Authors
Wei, Wei [1]
Zhang, Yujia [1]
Liang, Jiye [1]
Li, Lin [1]
Li, Yuze [1]
Affiliations
[1] Shanxi University, School of Computer and Information Technology, Taiyuan 030006, People's Republic of China
Keywords
LEVEL; GAME; GO
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Obtaining accurate value estimates is one of the key problems in reinforcement learning (RL). Current off-policy methods, such as Maxmin Q-learning, TD3, and TADD, introduce an underestimation bias while correcting the overestimation bias. In this paper, we propose the Quasi-Median Operation, a novel way to mitigate the underestimation bias by selecting the quasi-median from multiple state-action values. Based on this operation, we propose Quasi-Median Q-learning (QMQ) for discrete-action tasks and Quasi-Median Delayed Deep Deterministic Policy Gradient (QMD3) for continuous-action tasks. Theoretically, our method reduces the underestimation bias and significantly reduces the estimation variance compared to Maxmin Q-learning, TD3, and TADD. We conduct extensive experiments on discrete- and continuous-action tasks, and the results show that our method outperforms state-of-the-art methods.
Pages: 8621-8628
Page count: 8
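The abstract describes selecting the quasi-median from multiple state-action values. This record does not give the exact definition, so the following is only a minimal sketch under stated assumptions: `quasi_median` is taken here to be the lower middle order statistic of the estimates, and `qm_target` is a hypothetical tabular target formed by analogy with Maxmin Q-learning, with the minimum over estimators replaced by that quasi-median. All function and parameter names are illustrative and are not taken from the paper.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, int], float]


def quasi_median(values: Sequence[float]) -> float:
    """Return the lower middle order statistic of the values.

    ASSUMPTION: the record does not define the quasi-median; this
    sketch picks the element at index (n - 1) // 2 after sorting.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def qm_target(
    q_tables: Sequence[QTable],
    next_state: Hashable,
    actions: Sequence[int],
    reward: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Hypothetical bootstrap target using the assumed quasi-median.

    For each action, the estimates from all Q-tables are reduced to
    their quasi-median; the target then bootstraps from the best action.
    """
    if done:
        return reward
    best = max(
        quasi_median([q[(next_state, a)] for q in q_tables])
        for a in actions
    )
    return reward + gamma * best


# Three hypothetical estimators for one next state and two actions.
q1 = {("s1", 0): 1.0, ("s1", 1): 5.0}
q2 = {("s1", 0): 2.0, ("s1", 1): 0.0}
q3 = {("s1", 0): 3.0, ("s1", 1): 1.0}
target = qm_target([q1, q2, q3], "s1", [0, 1], reward=1.0, gamma=0.5)
```

Under this assumed definition, two estimators reduce to the minimum, and three or more give a value between the minimum and the maximum, which is the sense in which a middle order statistic could temper the underestimation of a pure minimum.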