Comparison of missing data imputation methods using weather data

Cited by: 2
Authors
Nida, Hafiza [1]
Kashif, Muhammad [1]
Khan, Muhammad Imran [1]
Ghamkhar, Madiha [1]
Affiliations
[1] Univ Agr Faisalabad, Fac Sci, Dept Math & Stat, Faisalabad, Pakistan
Source
PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES | 2023, Vol. 60, No. 2
Keywords
Rainfall; temperature; missing data; imputation methods; root mean square error; TEMPERATURE; PAKISTAN; CLIMATE; CROP;
DOI
10.21162/PAKJAS/23.228
Chinese Library Classification (CLC) number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Researchers and data analysts commonly face challenges with missing data when analyzing large data sets in their fields of study. Missing data must be handled properly to obtain better and more reliable research outcomes. The objective of this research is to evaluate different imputation techniques for handling missing observations in weather data. For this purpose, data on three variables, daily rainfall, maximum temperature (Tmax) and minimum temperature (Tmin), were obtained from the Pakistan Meteorological Department for 23 stations in Pakistan for the years 1981 to 2020. There are about 14,610 observations of each variable, and the number of missing observations (the size of missingness) for each variable differs from station to station. Four techniques were considered for estimating the missing observations at each station: mean imputation, k-nearest neighbors (KNN) imputation, predictive mean matching (PMM) imputation and sample imputation. Because the size of missingness varied from station to station, the imputation technique was chosen station by station as the one with the minimum root mean square error (RMSE). Compared with the other imputation methods, KNN is the most appropriate technique for estimating missing rainfall observations at all stations, while mean imputation is recommended for the Tmax and Tmin data.
Pages: 327-336
Number of pages: 10
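
The RMSE-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the series below are synthetic, the helper names (`mean_impute`, `sample_impute`, `knn_impute`) are hypothetical, the KNN variant uses a single predictor (Tmin) as a simplifying assumption, and PMM is omitted. The idea is to hide a share of known values, impute them with each method, and keep the method with the smallest RMSE against the hidden truth. Because the data are synthetic, the ranking it produces says nothing about the paper's findings.

```python
import math
import random
import statistics


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_impute(target, hidden):
    """Fill every hidden position with the mean of the observed values."""
    observed = [v for i, v in enumerate(target) if i not in hidden]
    fill = statistics.fmean(observed)
    return [fill for _ in sorted(hidden)]


def sample_impute(target, hidden, rng):
    """Fill every hidden position with a random draw from the observed values."""
    observed = [v for i, v in enumerate(target) if i not in hidden]
    return [rng.choice(observed) for _ in sorted(hidden)]


def knn_impute(target, predictor, hidden, k=5):
    """Fill each hidden position with the mean target value of the k donors
    whose predictor value is closest (assumption: one predictor only)."""
    donors = [(predictor[i], target[i]) for i in range(len(target)) if i not in hidden]
    filled = []
    for i in sorted(hidden):
        nearest = sorted(donors, key=lambda d: abs(d[0] - predictor[i]))[:k]
        filled.append(statistics.fmean([t for _, t in nearest]))
    return filled


# Synthetic daily series (assumption: NOT the Pakistan Meteorological Department data).
rng = random.Random(42)
n_days = 1460
tmax = [30 + 10 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 2) for d in range(n_days)]
tmin = [t - 10 + rng.gauss(0, 1) for t in tmax]

# Hide 10% of the known Tmax values so the imputations can be scored.
hidden = set(rng.sample(range(n_days), n_days // 10))
truth = [tmax[i] for i in sorted(hidden)]

scores = {
    "mean": rmse(truth, mean_impute(tmax, hidden)),
    "sample": rmse(truth, sample_impute(tmax, hidden, rng)),
    "knn": rmse(truth, knn_impute(tmax, tmin, hidden)),
}
best = min(scores, key=scores.get)  # method with the minimum RMSE
print(scores, best)
```

In a station-wise setting, the same scoring would be repeated for each station and variable, so that the chosen method may differ between stations with different sizes of missingness.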