Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models

Cited by: 57
Authors
Belitz, K. [1]
Stackelberg, P. E. [2]
Affiliations
[1] US Geol Survey, Carlisle, MA 01863 USA
[2] US Geol Survey, Troy, NY USA
Keywords
Machine learning; Bias correction; Ensemble-tree methods; Groundwater; Water quality
DOI
10.1016/j.envsoft.2021.105006
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Ensemble-tree machine learning (ML) regression models can be prone to systematic bias: small values are overestimated and large values are underestimated. Additional bias can be introduced if the dependent variable is a transform of the original data. Six methods were evaluated for their ability to correct systematic and introduced bias. Method performance was evaluated using four case studies of groundwater quality: the units of the dependent variable were pH in two and log-concentration in the others. When performance metrics (bias and RMSE for both points and the CDF) were computed using the same units as those in the ML model, empirical distribution matching (EDM) provided the best results. When the metrics were computed using retransformed concentration, EDM and a method incorporating Duan's smearing estimate were both effective. A method based on the Z-score transform approximates EDM if the correlation coefficient between rank-ordered ML estimates and rank-ordered observations approaches one.
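
The three correction ideas named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption rather than the authors' implementation: the function names are hypothetical, EDM is written as a simple rank-for-rank replacement, the Z-score method as a rescaling to the observed mean and standard deviation, and the smearing factor uses a base-10 log transform. The synthetic data are made up only to mimic estimates compressed toward the mean.

```python
import numpy as np


def edm_correct(estimates, observations):
    """Rank-based empirical distribution matching (illustrative sketch).

    Each estimate is replaced by the observation that holds the same rank,
    so the corrected values reproduce the observed distribution exactly.
    Assumes both arrays have the same length.
    """
    estimates = np.asarray(estimates, dtype=float)
    observations = np.asarray(observations, dtype=float)
    # argsort of argsort gives the 0-based rank of every estimate.
    ranks = np.argsort(np.argsort(estimates, kind="stable"), kind="stable")
    return np.sort(observations)[ranks]


def zscore_correct(estimates, observations):
    """Rescale estimates to the mean and standard deviation of observations."""
    estimates = np.asarray(estimates, dtype=float)
    observations = np.asarray(observations, dtype=float)
    z = (estimates - estimates.mean()) / estimates.std()
    return observations.mean() + z * observations.std()


def duan_smearing_factor(log10_obs, log10_est):
    """Duan's smearing factor, assuming a base-10 log transform.

    Multiply retransformed estimates (10 ** log10_est) by this factor to
    reduce the bias introduced by retransformation.
    """
    residuals = np.asarray(log10_obs, dtype=float) - np.asarray(log10_est, dtype=float)
    return float(np.mean(10.0 ** residuals))


# Synthetic example (made-up data): estimates are compressed toward the
# mean, so small values are overestimated and large ones underestimated.
rng = np.random.default_rng(0)
obs = rng.normal(7.0, 1.0, size=500)
est = 7.0 + 0.5 * (obs - 7.0) + rng.normal(0.0, 0.3, size=500)

edm = edm_correct(est, obs)
zs = zscore_correct(est, obs)

# Correlation between rank-ordered estimates and rank-ordered observations;
# the closer it is to one, the closer the Z-score result is to EDM.
r_sorted = np.corrcoef(np.sort(est), np.sort(obs))[0, 1]
max_gap = float(np.max(np.abs(edm - zs)))
print(f"sorted-value correlation: {r_sorted:.4f}, max |EDM - Z-score|: {max_gap:.3f}")
```

In this sketch both corrections preserve the ordering of the ML estimates and change only their distribution, which is why they converge when the sorted estimates and sorted observations are nearly linearly related.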
Pages: 12