Bias-Corrected Q-Learning With Multistate Extension

Cited by: 12
Authors
Lee, Donghun [1]
Powell, Warren B. [2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Comp Sci, Princeton, NJ 08540 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA
Keywords
Bias correction; electricity storage; Q-learning; smart grid; stochastic approximation; convergence; rates
DOI
10.1109/TAC.2019.2912443
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions result in large variance of the value estimates. We identify the cause as the estimation bias due to the maximum operator in the Q-learning algorithm, and present evidence of max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of Roulette and an electricity storage control simulation. The algorithm is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
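
The max-operator bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or its correction term; it only shows, under assumed settings, that the maximum over noisy sample-mean estimates overshoots the true maximum. Every action is assumed to have a true mean reward of 0 with unit-variance Gaussian noise, and the function names and default parameters are hypothetical.

```python
import random
import statistics


def max_of_sample_means(num_actions, samples_per_action, rng):
    """Max over actions of the sample-mean reward.

    Assumption for illustration: every action has true mean reward 0,
    so the true value of max_a E[reward(a)] is exactly 0.
    """
    means = []
    for _ in range(num_actions):
        rewards = [rng.gauss(0.0, 1.0) for _ in range(samples_per_action)]
        means.append(sum(rewards) / samples_per_action)
    return max(means)


def estimate_max_bias(num_actions=10, samples_per_action=5, trials=2000, seed=0):
    """Monte Carlo estimate of E[max_a sample_mean(a)] minus the true max (0)."""
    rng = random.Random(seed)
    draws = [
        max_of_sample_means(num_actions, samples_per_action, rng)
        for _ in range(trials)
    ]
    return statistics.fmean(draws)


if __name__ == "__main__":
    # A single action has no max-operator bias; many actions do.
    print("1 action  :", estimate_max_bias(num_actions=1))
    print("10 actions:", estimate_max_bias(num_actions=10))
```

With one action the estimate stays near zero, while with ten actions it is clearly positive even though no action is actually better than another. This is the kind of overestimation that a correction term would need to offset.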
Pages: 4011-4023
Number of pages: 13