Investigation of using missing data imputation methodologies effect on the SARIMA model performance: application to average monthly flows

Times cited: 0
Authors
Bleidorn, Michel Trarbach [1]
Schmidt, Isamara Maria [1]
dos Reis, Jose Antonio Tosta [1]
Pani, Deysilara Figueira [1]
Pinto, Wanderson de Paula [2]
Solci, Carlo Correa [1]
Mendonca, Antonio Sergio Ferreira [1]
Brasil, Gutemberg Hespanha [1]
Affiliations
[1] Univ Fed Espirito Santo, Vitoria, ES, Brazil
[2] Ctr Univ FAVENI, Guarulhos, SP, Brazil
Source
RBRH-REVISTA BRASILEIRA DE RECURSOS HIDRICOS | 2024, Vol. 29
Keywords
Missing data imputation methodologies; Forecast; SARIMA; Monthly streamflow; Time series; River
DOI
10.1590/2318-0331.292420230131
Chinese Library Classification
TV21 [Water resources investigation and water conservancy planning]
Subject classification code
081501
Abstract
Accurate river flow forecasting is crucial in hydrology but is hampered by the quality of fluviometric data. This study investigates how different missing data imputation methods affect the performance of the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. A SARIMA(1,1,1)(0,1,1)₁₂ model was selected using semi-automated criteria: the lowest Akaike Information Criterion (AIC), statistically significant parameters (p-value < 0.05), and residual adequacy. Its performance was then compared with that obtained on series reconstructed using different imputation methods: Arithmetic Mean (AM), Median (M), Spline and Stineman interpolations, Regional Weighting (RW), Multiple Linear Regression (MLR), Multiple Imputation (MI), and Maximum Likelihood (ML). Scenarios with 5%, 20%, and 40% missing data, following random and block patterns, were analyzed using flow data from the Doce River in southeastern Brazil. The performance indicators and their relative differences showed that the univariate (AM and M) and multivariate (RW and MLR) methods limited the model's performance, whereas the univariate Spline and Stineman interpolations and the multivariate MI and ML methods did not limit it significantly, except for Spline under the block pattern. It is concluded that the accuracy of future predictions depends not only on a well-trained and validated model but also on the appropriate use of missing data imputation methods.
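For reference, the SARIMA model named in the abstract, with non-seasonal order (1,1,1), seasonal order (0,1,1), and a 12-month seasonal period, can be written in standard backshift notation, together with the AIC used to rank candidate models. This is a textbook sketch rather than the authors' own formulation. It assumes the moving-average terms carry a plus sign (the convention used by R's `arima` and by statsmodels); Box and Jenkins write them with a minus sign, which only flips the sign of the estimated θ coefficients.

```latex
% SARIMA(1,1,1)(0,1,1)_{12}, with the backshift operator B y_t = y_{t-1}
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\,y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\,\varepsilon_t,
\qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2)

% Akaike Information Criterion used to rank candidate models
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Here (1 - B) and (1 - B¹²) are the regular and seasonal differences, k is the number of estimated parameters, and L̂ is the maximized likelihood; among otherwise acceptable candidates, the lower AIC is preferred.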
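The semi-automated selection rule described in the abstract (all parameters significant at p < 0.05, adequate residuals, then lowest AIC) can be sketched as a filter-then-minimize step. This is a minimal sketch, not the authors' code. The `Candidate` record and `select_model` helper are hypothetical names. The residual check is reduced to a boolean; in practice it might come from a whiteness test such as Ljung-Box, which is an assumption here. The AIC values and p-values in the usage lines are invented solely to exercise the rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one fitted SARIMA candidate."""
    order: tuple              # (p, d, q)
    seasonal_order: tuple     # (P, D, Q, s)
    aic: float
    p_values: list            # p-values of the estimated coefficients
    residuals_adequate: bool  # assumed outcome of a residual diagnostic


def select_model(candidates, alpha=0.05):
    """Drop candidates with any non-significant coefficient or inadequate
    residuals, then return the remaining one with the lowest AIC (or None)."""
    eligible = [
        c for c in candidates
        if c.residuals_adequate and all(p < alpha for p in c.p_values)
    ]
    return min(eligible, key=lambda c: c.aic, default=None)


# Invented numbers, chosen only to show each filter at work.
candidates = [
    Candidate((1, 1, 1), (0, 1, 1, 12), aic=100.0,
              p_values=[0.001, 0.003, 0.000], residuals_adequate=True),
    # Lower AIC, but one coefficient is not significant -> excluded.
    Candidate((2, 1, 1), (0, 1, 1, 12), aic=98.0,
              p_values=[0.001, 0.41, 0.002, 0.000], residuals_adequate=True),
    # Lowest AIC, but residuals fail the diagnostic -> excluded.
    Candidate((1, 1, 0), (1, 1, 1, 12), aic=95.0,
              p_values=[0.01, 0.02, 0.000], residuals_adequate=False),
]
best = select_model(candidates)
print(best.order, best.seasonal_order)  # (1, 1, 1) (0, 1, 1, 12)
```

Applying the significance and residual filters before minimizing AIC means a slightly better AIC cannot rescue a model with a superfluous coefficient or correlated residuals.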
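The missing-data scenarios and the simplest univariate fills can be illustrated on a synthetic series. This sketch assumes that a "random" pattern removes months drawn without replacement and that a "block" pattern removes one contiguous run of the same length; the study's exact gap-generation scheme is not given in the abstract. `impute_linear` is plain linear interpolation, used as a simpler stand-in for the Spline and Stineman interpolations, and the error is computed only on the removed months. That is a hypothetical indicator, distinct from the SARIMA forecast metrics the study reports. All helper names are illustrative, and the sinusoidal series is synthetic, not Doce River data.

```python
import math
import random
import statistics


def random_gaps(n, fraction, rng):
    """'Random' pattern (assumed): round(n * fraction) positions drawn without replacement."""
    k = round(n * fraction)
    return set(rng.sample(range(n), k))


def block_gaps(n, fraction, rng):
    """'Block' pattern (assumed): one contiguous run of round(n * fraction) positions."""
    k = round(n * fraction)
    start = rng.randrange(0, n - k + 1)
    return set(range(start, start + k))


def mask(series, gaps):
    """Replace the values at the gap positions with None."""
    return [None if i in gaps else v for i, v in enumerate(series)]


def impute_constant(series, stat):
    """Fill every gap with one statistic of the observed values (mean, median, ...)."""
    value = stat([v for v in series if v is not None])
    return [value if v is None else v for v in series]


def impute_linear(series):
    """Linear interpolation between the nearest observed neighbours; leading or
    trailing gaps copy the nearest observed value. Assumes at least one observation."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def gap_rmse(true, filled, gaps):
    """Root-mean-square error restricted to the positions that were removed."""
    return math.sqrt(sum((true[i] - filled[i]) ** 2 for i in gaps) / len(gaps))


rng = random.Random(42)
# Synthetic 10-year monthly series with a 12-month cycle (NOT Doce River data).
flows = [100 + 50 * math.sin(2 * math.pi * t / 12) for t in range(120)]
for fraction in (0.05, 0.20, 0.40):
    for pattern, make_gaps in (("random", random_gaps), ("block", block_gaps)):
        gaps = make_gaps(len(flows), fraction, rng)
        masked = mask(flows, gaps)
        for method, filled in (
            ("mean", impute_constant(masked, statistics.mean)),
            ("median", impute_constant(masked, statistics.median)),
            ("linear", impute_linear(masked)),
        ):
            print(f"{pattern:6} {fraction:4.0%} {method:6} "
                  f"RMSE={gap_rmse(flows, filled, gaps):.2f}")
```

On a strongly seasonal series, constant fills ignore the annual cycle, while interpolation uses neighbouring months. Interpolation's advantage tends to shrink inside long contiguous blocks, where the nearest observations lie far from the gap.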
Pages: 18