Bias-Corrected Q-Learning With Multistate Extension

Cited by: 12
Authors
Lee, Donghun [1]
Powell, Warren B. [2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08540 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA
Keywords
Bias correction; electricity storage; Q-learning; smart grid; STOCHASTIC-APPROXIMATION; CONVERGENCE; RATES
DOI
10.1109/TAC.2019.2912443
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions result in large variance of value estimates. We pinpoint the cause as the estimation bias introduced by the maximum operator in the Q-learning algorithm, and present evidence of max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of roulette and an electricity storage control simulation. The bias-corrected Q-learning algorithm with multistate extension is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
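To make the max-operator bias described above concrete, here is a minimal Python sketch under illustrative assumptions; it is not the authors' bias-corrected algorithm. The function names `max_operator_bias` and `q_learning_update` are hypothetical, and the setup (every arm shares the true mean 0.0, rewards are Gaussian with standard deviation `sigma`) is an assumption loosely echoing a roulette bandit, in which every bet has the same expected payoff. Because all arms are identical, the true maximum is 0.0, so whatever the simulation returns is the bias of taking the max over noisy estimates. The second function shows the standard tabular Q-learning step, whose target contains that same max.

```python
import random
import statistics


def max_operator_bias(n_arms, n_samples, sigma, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[max_a mean_hat(a)] - max_a mu(a).

    Illustrative assumption: every arm has the same true mean 0.0 and
    Gaussian rewards with standard deviation sigma, so the true maximum
    is 0.0 and any positive result is pure max-operator bias.
    """
    rng = random.Random(seed)
    true_max = 0.0
    observed_maxima = []
    for _ in range(n_trials):
        # Estimate each arm's value from n_samples noisy rewards.
        sample_means = [
            statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n_samples)])
            for _ in range(n_arms)
        ]
        # The max over noisy estimates systematically picks arms that got lucky.
        observed_maxima.append(max(sample_means))
    return statistics.fmean(observed_maxima) - true_max


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Standard tabular Q-learning step (Watkins and Dayan, 1992).

    Q is a list of lists indexed as Q[state][action]. The target uses
    max(Q[s_next]), the same max-over-estimates that is biased above.
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        bias = max_operator_bias(n_arms=10, n_samples=10, sigma=sigma)
        print(f"sigma={sigma}: estimated bias {bias:.3f}")
    single = max_operator_bias(n_arms=1, n_samples=10, sigma=1.0)
    print(f"single arm: estimated bias {single:.3f}")
```

Under these assumptions the estimated bias should come out positive for ten arms and grow with `sigma`, while a single arm shows essentially no bias. This matches the abstract's point that large variance in value estimates is what drives the problem. The sketch only measures the bias; it does not implement the paper's correction strategy or its tuning parameter.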
Pages: 4011-4023
Number of pages: 13
Related papers
50 in total (entries 41-50 shown)
  • [41] Croon's Bias-Corrected Estimation of Latent Interactions
    Cox, Kyle
    Kelcey, Benjamin
    STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2021, 28 (06) : 863 - 874
  • [42] A bias-corrected estimator in multiple imputation for missing data
    Tomita, Hiroaki
    Fujisawa, Hironori
    Henmi, Masayuki
    STATISTICS IN MEDICINE, 2018, 37 (23) : 3373 - 3386
  • [43] A Bias-Corrected Estimator for the Crosswise Model with Inattentive Respondents
    Atsusaka, Yuki
    Stevenson, Randolph T.
    POLITICAL ANALYSIS, 2023, 31 (01) : 134 - 148
  • [44] A Bias-Corrected Net Reclassification Improvement for Clinical Subgroups
    Paynter, Nina P.
    Cook, Nancy R.
    MEDICAL DECISION MAKING, 2013, 33 (02) : 154 - 162
  • [45] Bias-Corrected Estimation of Price Impact in Securities Litigation
    Dove, Taylor
    Heath, Davidson
    Heaton, J. B.
    AMERICAN LAW AND ECONOMICS REVIEW, 2019, 21 (01) : 184 - 208
  • [46] A bias-corrected histogram estimator for line transect sampling
    Eidous, Omar
    Al-Eibood, Fahid
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2018, 47 (15) : 3675 - 3686
  • [47] Bias-corrected maximum likelihood estimation for the beta distribution
    Cordeiro, GM
    DaRocha, EC
    DaRocha, JGC
    CribariNeto, F
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 1997, 58 (01) : 21 - 35
  • [48] Risk Aversion Operator for Addressing Maximization Bias in Q-Learning
    Wang, Bi
    Li, Xuelian
    Gao, Zhiqiang
    Zhong, Yangjun
    IEEE ACCESS, 2020, 8 : 43098 - 43110
  • [49] UCB Momentum Q-learning: Correcting the bias without forgetting
    Menard, Pierre
    Domingues, Omar Darwiche
    Shang, Xuedong
    Valko, Michal
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [50] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292