Bias-Corrected Q-Learning With Multistate Extension

Cited by: 12
Authors
Lee, Donghun [1]
Powell, Warren B. [2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Comp Sci, Princeton, NJ 08540 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA
Keywords
Bias correction; electricity storage; Q-learning; smart grid; STOCHASTIC-APPROXIMATION; CONVERGENCE; RATES;
DOI
10.1109/TAC.2019.2912443
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions result in large variance of the value estimates. We identify the cause as the estimation bias due to the maximum operator in the Q-learning algorithm, and present evidence of max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of Roulette and an electricity storage control simulation. The bias-corrected Q-learning algorithm with multistate extension is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
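The max-operator bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' bias-correction method; it only shows, under assumed settings, why taking the maximum over noisy value estimates tends to overestimate. Every action is assumed to have true expected reward 0 with unit-variance Gaussian noise, so the true maximum value is 0, yet the maximum of the sample means is positive on average. The function names and parameter values are hypothetical, chosen for illustration.

```python
import random
import statistics


def max_of_sample_means(n_actions, n_samples, rng):
    """Max over actions of each action's sample-mean reward.

    Assumption for illustration: every action has true mean reward 0
    and unit-variance Gaussian noise, so the true maximum is 0.
    """
    means = []
    for _ in range(n_actions):
        rewards = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        means.append(statistics.fmean(rewards))
    return max(means)


def estimate_max_operator_bias(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Average the max of sample means over many independent trials.

    Because the true maximum is 0, this average is itself a Monte Carlo
    estimate of the bias introduced by the max operator.
    """
    rng = random.Random(seed)
    draws = [max_of_sample_means(n_actions, n_samples, rng) for _ in range(n_trials)]
    return statistics.fmean(draws)


bias_many_actions = estimate_max_operator_bias(n_actions=10)
bias_single_action = estimate_max_operator_bias(n_actions=1)
print(f"estimated bias with 10 actions: {bias_many_actions:.3f}")
print(f"estimated bias with 1 action:   {bias_single_action:.3f}")
```

With a single action there is nothing to maximize over, so the estimate stays near zero; with several equally good actions the maximum picks out whichever estimate happened to be noisiest upward, which is the effect the abstract attributes to the max operator.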
Pages: 4011-4023
Page count: 13
Related papers
50 in total
  • [21] Bias-corrected score decomposition for generalized quantiles
    Ehm, W.
    Ovcharov, E. Y.
    BIOMETRIKA, 2017, 104 (02) : 473 - 480
  • [22] A BIAS-CORRECTED NONPARAMETRIC ENVELOPMENT ESTIMATOR OF FRONTIERS
    Badin, Luiza
    Simar, Leopold
    ECONOMETRIC THEORY, 2009, 25 (05) : 1289 - 1318
  • [23] Comparing Alternative Corrections for Bias in the Bias-Corrected Bootstrap Test of Mediation
    Chen, Donna
    Fritz, Matthew S.
    EVALUATION & THE HEALTH PROFESSIONS, 2021, 44 (04) : 416 - 427
  • [24] A Meta-Learning Approach to Mitigating the Estimation Bias of Q-Learning
    Tan, Tao
    Xie, Hong
    Shi, Xiaoyu
    Shang, Mingsheng
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (09)
  • [25] An Investigation Into the Effect of the Learning Rate on Overestimation Bias of Connectionist Q-learning
    Chen, Yifei
    Schomaker, Lambert
    Wiering, Marco
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 107 - 118
  • [26] Bias-corrected estimation in dynamic panel data models
    Bun, MJG
    Carree, MA
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2005, 23 (02) : 200 - 210
  • [27] Stable and bias-corrected estimation for nonparametric regression models
    Lin, Lu
    Li, Feng
    JOURNAL OF NONPARAMETRIC STATISTICS, 2008, 20 (04) : 283 - 303
  • [28] Bias-corrected estimators for monotone and concave frontier functions
    Peng, L
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2004, 119 (02) : 263 - 275
  • [29] A Bias-Corrected Estimator of Competitive Balance in Sports Leagues
    Lee, Young Hoon
    Kim, Yongdai
    Kim, Sara
    JOURNAL OF SPORTS ECONOMICS, 2019, 20 (04) : 479 - 508
  • [30] Bias-corrected estimation of stable tail dependence function
    Beirlant, Jan
    Escobar-Bach, Mikael
    Goegebeur, Yuri
    Guillou, Armelle
    JOURNAL OF MULTIVARIATE ANALYSIS, 2016, 143 : 453 - 466