Imputation Analysis for Time Series Air Quality (PM10) Data Set: A Comparison of Several Methods

Cited by: 8
Authors
Shaadan, N. [1 ,2 ]
Rahim, N. A. M. [1 ]
Affiliations
[1] Univ Teknol MARA, Fac Comp & Math Sci, Ctr Stat & Decis Sci Studies, Shah Alam 40450, Selangor, Malaysia
[2] Univ Teknol MARA, Fac Comp & Math Sci, Adv Analyt Engn Ctr, Shah Alam 40450, Selangor, Malaysia
Source
2ND INTERNATIONAL CONFERENCE ON APPLIED & INDUSTRIAL MATHEMATICS AND STATISTICS | 2019 / Vol. 1366
Keywords
MISSING VALUES;
DOI
10.1088/1742-6596/1366/1/012107
Chinese Library Classification number
O29 [Applied Mathematics];
Discipline classification code
070104 ;
Abstract
Good-quality data are essential for reliable research results. However, data quality is often compromised by missing values, which reduce the accuracy of analysis and can lead to biased results. In air quality data sets, missing values arise for various reasons, such as machine malfunction and errors, computer system crashes, human error, and insufficient sampling. In time series modelling, a complete data series is needed to construct the model. This paper presents a systematic statistical procedure for investigating the performance of several missing-value imputation methods when the data are time series. The procedure could help researchers decide which imputation method suits their data. A case study was conducted using a real data set from the Shah Alam air quality monitoring station. The results show that the missing data at the monitoring station are missing completely at random (MCAR). Six imputation methods were compared on performance indicators such as RMSE, MAE, AI, and R²; imputation by Kalman filter with an ARIMA model was the most appropriate method for the data set.
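The evaluation procedure described in the abstract (remove values from a complete series, impute them, and score the imputed values against the held-out truth) can be sketched roughly as follows. This is an illustration only, not the authors' code: the series is synthetic, the two imputers (last observation carried forward and linear interpolation) are simple stand-ins for the six methods compared in the paper, and all function names are hypothetical. It also assumes that AI denotes Willmott's index of agreement and that R² is the squared Pearson correlation, neither of which the abstract defines.

```python
import numpy as np

def rmse(obs, pred):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(obs, pred):
    return float(np.mean(np.abs(pred - obs)))

def r_squared(obs, pred):
    # Assumption: R^2 taken as the squared Pearson correlation.
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

def index_of_agreement(obs, pred):
    # Assumption: "AI" read as Willmott's index of agreement.
    obar = obs.mean()
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - num / den)

def impute_locf(series):
    # Last observation carried forward; a leading gap is back-filled
    # with the first observed value.
    out = series.copy()
    last = out[np.flatnonzero(~np.isnan(out))[0]]
    for i in range(len(out)):
        if np.isnan(out[i]):
            out[i] = last
        else:
            last = out[i]
    return out

def impute_linear(series):
    # Linear interpolation between observed neighbours; gaps at the
    # edges take the nearest observed value (np.interp default).
    out = series.copy()
    idx = np.arange(len(series))
    missing = np.isnan(series)
    out[missing] = np.interp(idx[missing], idx[~missing], series[~missing])
    return out

def evaluate(complete, imputers, missing_rate=0.1, seed=0):
    # Delete values completely at random (MCAR), then score each imputer
    # only on the positions that were deleted.
    rng = np.random.default_rng(seed)
    mask = rng.random(len(complete)) < missing_rate
    holed = complete.copy()
    holed[mask] = np.nan
    results = {}
    for name, impute in imputers.items():
        obs, pred = complete[mask], impute(holed)[mask]
        results[name] = {
            "RMSE": rmse(obs, pred),
            "MAE": mae(obs, pred),
            "AI": index_of_agreement(obs, pred),
            "R2": r_squared(obs, pred),
        }
    return results

# Synthetic hourly-like series standing in for PM10 readings.
t = np.arange(500)
noise = np.random.default_rng(1).normal(0.0, 2.0, size=t.size)
series = 50.0 + 10.0 * np.sin(2.0 * np.pi * t / 24.0) + noise

scores = evaluate(series, {"locf": impute_locf, "linear": impute_linear})
for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Lower RMSE and MAE and higher AI and R² indicate a better imputer under these definitions. A state-space method such as the Kalman filter with an ARIMA model could be added to the `imputers` dictionary in the same way.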
Pages: 10