Bias-Corrected Q-Learning With Multistate Extension

Cited by: 12
Authors
Lee, Donghun [1]
Powell, Warren B. [2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Comp Sci, Princeton, NJ 08540 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA
Keywords
Bias correction; electricity storage; Q-learning; smart grid; STOCHASTIC-APPROXIMATION; CONVERGENCE; RATES
DOI
10.1109/TAC.2019.2912443
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions result in large variance of the value estimates. We pinpoint the cause as the estimation bias due to the maximum operator in the Q-learning algorithm, and present evidence of max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of roulette and an electricity storage control simulation. The bias-corrected Q-learning algorithm with multistate extension is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
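The max-operator bias named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm or their correction term. It assumes a hypothetical bandit in which every action has a true mean reward of 0 with unit-variance Gaussian noise, so any positive average of the maximum sample mean is pure estimation bias. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def max_of_sample_means(num_actions, samples_per_action, rng):
    """Max over actions of the sample-mean reward; every true mean is 0."""
    means = [
        statistics.fmean(
            rng.gauss(0.0, 1.0) for _ in range(samples_per_action)
        )
        for _ in range(num_actions)
    ]
    return max(means)


def estimate_max_bias(num_actions=10, samples_per_action=5,
                      trials=2000, seed=0):
    """Monte Carlo estimate of E[max_a mean_a] - max_a E[mean_a].

    The second term is 0 in this hypothetical setup, so the returned
    average is the bias introduced by taking the maximum of noisy
    estimates.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        max_of_sample_means(num_actions, samples_per_action, rng)
        for _ in range(trials)
    )


# With many actions the maximum of noisy estimates is biased upward;
# with a single action there is no max to exploit the noise.
bias_many = estimate_max_bias(num_actions=10)
bias_single = estimate_max_bias(num_actions=1)
print(f"10 actions: {bias_many:.3f}, 1 action: {bias_single:.3f}")
```

In this toy setting the bias grows with the number of actions and with the noise in each estimate, which is consistent with the abstract's remark that high-variance rewards and transitions hurt finite-time performance.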
Pages: 4011-4023
Page count: 13