Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models

Cited by: 57
Authors
Belitz, K. [1]
Stackelberg, P. E. [2]
Affiliations
[1] US Geol Survey, Carlisle, MA 01863 USA
[2] US Geol Survey, Troy, NY USA
Keywords
Machine learning; Bias correction; Ensemble-tree methods; Groundwater; Water quality
DOI
10.1016/j.envsoft.2021.105006
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Ensemble-tree machine learning (ML) regression models can be prone to systematic bias: small values are overestimated and large values are underestimated. Additional bias can be introduced if the dependent variable is a transform of the original data. Six methods were evaluated for their ability to correct systematic and introduced bias. Method performance was evaluated using four case studies of groundwater quality: the units of the dependent variable were pH in two and log-concentration in the others. When performance metrics (bias and RMSE for both points and the CDF) were computed using the same units as those in the ML model, empirical distribution matching (EDM) provided the best results. When the metrics were computed using retransformed concentration, EDM and a method incorporating Duan's smearing estimate were both effective. A method based on the Z-score transform approximates EDM if the correlation coefficient between rank-ordered ML estimates and rank-ordered observations approaches one.
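The abstract names three of the corrections without giving their formulas, so the following Python sketch is only an illustration of the general ideas, not the authors' implementation. It assumes that EDM maps each ML estimate to the observation of the same rank (with linear interpolation between ranks, an assumption of this sketch), that the Z-score method rescales estimates to the mean and standard deviation of the observations, and that log-concentration means log10. The function names `edm_correct`, `zscore_correct`, and `duan_smearing_factor` and the synthetic data are hypothetical.

```python
import numpy as np


def edm_correct(estimates, train_estimates, train_observations):
    """Map estimates onto the empirical distribution of the observations."""
    est_sorted = np.sort(np.asarray(train_estimates, dtype=float))
    obs_sorted = np.sort(np.asarray(train_observations, dtype=float))
    # Same-rank pairs (est_sorted[i], obs_sorted[i]) define the mapping.
    # Interpolating between pairs is an assumption of this sketch.
    return np.interp(np.asarray(estimates, dtype=float), est_sorted, obs_sorted)


def zscore_correct(estimates, train_estimates, train_observations):
    """Rescale estimates so their mean and spread match the observations."""
    est = np.asarray(train_estimates, dtype=float)
    obs = np.asarray(train_observations, dtype=float)
    z = (np.asarray(estimates, dtype=float) - est.mean()) / est.std()
    return obs.mean() + z * obs.std()


def duan_smearing_factor(log10_observations, log10_estimates):
    """Duan's smearing factor: the mean of the retransformed residuals."""
    resid = (np.asarray(log10_observations, dtype=float)
             - np.asarray(log10_estimates, dtype=float))
    return np.mean(10.0 ** resid)


# Synthetic data (assumed): estimates are compressed toward the mean, so
# small values are overestimated and large values are underestimated.
rng = np.random.default_rng(0)
obs = rng.normal(7.0, 1.0, 500)
est = 7.0 + 0.5 * (obs - 7.0) + rng.normal(0.0, 0.2, 500)

edm = edm_correct(est, est, obs)
zsc = zscore_correct(est, est, obs)
print("std of observed, raw, EDM, Z-score:",
      obs.std(), est.std(), edm.std(), zsc.std())
```

Under these assumptions, a retransformed concentration would be `10.0 ** est * duan_smearing_factor(obs, est)` for a model fitted to log10 concentration. When the sorted observations are an exact increasing linear function of the sorted estimates, `zscore_correct` and `edm_correct` return the same values, which is the limiting case the abstract describes.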
Pages: 12