Estimating extremely large amounts of missing precipitation data

Cited by: 31
Authors
Aguilera, Hector [1]
Guardiola-Albert, Carolina [1]
Serrano-Hidalgo, Carmen [1, 2]
Affiliations
[1] Geol Survey Spain, Res Geol Resources, Rios Rosas 23, Madrid 28003, Spain
[2] Tech Univ Madrid, Sch Min Engn Madrid, Rios Rosas 21, Madrid 28003, Spain
Keywords
evaluation; large missing precipitation; multiple imputation; random forest; spatio-temporal kriging; MULTIPLE IMPUTATION; RAINFALL DATA; SPATIAL INTERPOLATION; RECORDS; SERIES; VALUES;
DOI
10.2166/hydro.2020.127
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Accurate estimation of missing daily precipitation data remains a difficult task. A wide variety of methods exists for infilling missing values, but the percentage of gaps is one of the main factors limiting their applicability. The present study compares three techniques for filling in large amounts of missing daily precipitation data: spatio-temporal kriging (STK), multiple imputation by chained equations through predictive mean matching (PMM), and the random forest (RF) machine learning algorithm. To our knowledge, this is the first time that extreme missingness (>90%) has been considered. Different percentages of missing data and missing patterns are tested in a large dataset drawn from 112 rain gauges in the period 1975-2017. The results show that both STK and RF can handle extreme missingness, while PMM requires larger observed sample sizes. STK is the most robust method, suitable for chronological missing patterns. RF is efficient under random missing patterns. Model evaluation is usually based on performance and error measures. However, this study outlines the risk of just relying on these measures without checking for consistency. The RF algorithm overestimated daily precipitation outside the validation period in some cases due to the overdetection of rainy days under time-dependent missing patterns.
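The evaluation design described in the abstract (hiding a chosen percentage of a gauge's record under either a random or a chronological missing pattern, then checking both an error measure and the number of rainy days) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the infilling step is a plain same-day mean of neighbouring gauges standing in for STK, PMM, or RF, and the function names `synthetic_precip`, `missing_mask`, and `evaluate` are hypothetical rather than taken from the paper.

```python
import numpy as np


def synthetic_precip(n_days, n_gauges, rng):
    """Synthetic daily precipitation (assumption: not the paper's dataset)."""
    wet = rng.random(n_days) < 0.3                       # shared regional wet/dry signal
    regional = np.where(wet, rng.gamma(0.8, 8.0, n_days), 0.0)
    local = rng.gamma(4.0, 0.25, (n_days, n_gauges))     # multiplicative noise, mean 1
    dry_local = rng.random((n_days, n_gauges)) < 0.2     # gauge-specific dry days
    return np.where(dry_local, 0.0, regional[:, None] * local)


def missing_mask(n_days, frac, pattern, rng):
    """Boolean mask over days; True marks a value treated as missing."""
    mask = np.zeros(n_days, dtype=bool)
    n_missing = int(round(frac * n_days))
    if pattern == "random":
        mask[rng.choice(n_days, n_missing, replace=False)] = True
    elif pattern == "chronological":
        mask[:n_missing] = True                          # one contiguous gap at the start
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return mask


def evaluate(frac, pattern, n_days=2000, n_gauges=6, seed=0):
    """Hide part of gauge 0, infill it, and report error plus rainy-day counts."""
    rng = np.random.default_rng(seed)
    data = synthetic_precip(n_days, n_gauges, rng)
    mask = missing_mask(n_days, frac, pattern, rng)
    truth = data[mask, 0]
    # Stand-in infilling method: same-day mean of the other gauges.
    estimate = data[mask, 1:].mean(axis=1)
    return {
        "n_missing": int(mask.sum()),
        "rmse": float(np.sqrt(np.mean((estimate - truth) ** 2))),
        "rainy_true": int((truth > 0).sum()),
        "rainy_est": int((estimate > 0).sum()),
    }


if __name__ == "__main__":
    for pattern in ("random", "chronological"):
        print(pattern, evaluate(0.9, pattern))
```

Reporting the rainy-day counts next to the RMSE reflects the abstract's warning that error measures alone can hide inconsistencies such as overdetection of rainy days.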
Pages: 578-592
Number of pages: 15
Related papers
50 in total
  • [1] Imputation of missing precipitation data using KNN, SOM, RF, and FNN
    Sahoo, Abinash
    Ghose, Dillip Kumar
    SOFT COMPUTING, 2022, 26 (12) : 5919 - 5936
  • [2] An efficient empirical likelihood approach for estimating equations with missing data
    Tang, Cheng Yong
    Qin, Yongsong
    BIOMETRIKA, 2012, 99 (04) : 1001 - 1007
  • [3] The (Ir)Responsibility of (Under)Estimating Missing Data
    Fernandez-Garcia, Maria P.
    Vallejo-Seco, Guillermo
    Livacic-Rojas, Pablo
    Tuero-Herrero, Ellian
    FRONTIERS IN PSYCHOLOGY, 2018, 9
  • [4] A Bayesian Approach for Estimating Mediation Effects With Missing Data
    Enders, Craig K.
    Fairchild, Amanda J.
    MacKinnon, David P.
    MULTIVARIATE BEHAVIORAL RESEARCH, 2013, 48 (03) : 340 - 369
  • [5] Cross Assessment of Twenty-One Different Methods for Missing Precipitation Data Estimation
    Armanuos, Asaad M.
    Al-Ansari, Nadhir
    Yaseen, Zaher Mundher
    ATMOSPHERE, 2020, 11 (04)
  • [6] Estimating propensity scores with missing covariate data using general location mixture models
    Mitra, Robin
    Reiter, Jerome P.
    STATISTICS IN MEDICINE, 2011, 30 (06) : 627 - 641
  • [7] Sequential Imputation of Missing Spatio-Temporal Precipitation Data Using Random Forests
    Mital, Utkarsh
    Dwivedi, Dipankar
    Brown, James B.
    Faybishenko, Boris
    Painter, Scott L.
    Steefel, Carl I.
    FRONTIERS IN WATER, 2020, 2
  • [8] A Novel Nonparametric Multiple Imputation Algorithm for Estimating Missing Data
    Gheyas, Iffat A.
    Smith, Leslie S.
    WORLD CONGRESS ON ENGINEERING 2009, VOLS I AND II, 2009, : 1281 - 1286
  • [9] Interpolation of Missing Precipitation Data Using Kernel Estimations for Hydrologic Modeling
    Lee, Hyojin
    Kang, Kwangmin
    ADVANCES IN METEOROLOGY, 2015, 2015
  • [10] The generalized estimating equation approach when data are not missing completely at random
    Paik, MC
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1997, 92 (440) : 1320 - 1329