Mean Imputation Techniques for Filling the Missing Observations in Air Pollution Dataset
Cited by: 12
Authors:
Noor, M. N. [1, 2, 3]
Yahaya, A. S. [2]
Ramli, N. A. [2]
Al Bakri, A. M. Mustafa [3]
Affiliations:
[1] Univ Malaysia Perlis, Sch Environm Engn, P Box 77, Kangar 01000, Perlis, Malaysia
[2] Univ Sains Malaysia, Sch Civil Engn, Environm & Sustainable Dev Sect, Clean Air Res Grp, Penang, Malaysia
[3] Univ Malaysia Perlis, Sch Mat Engn, Ctr Excellence Geopolymer & Green Technol CEGeoG, Kangar 01000, Perlis, Malaysia
Source: ADVANCED MATERIALS ENGINEERING AND TECHNOLOGY II | 2014 | Vols. 594-595
Keywords: air pollution; imputation; performance indicators; PM10
DOI: 10.4028/www.scientific.net/KEM.594-595.902
Chinese Library Classification number: T [Industrial Technology]
Subject classification code: 08
Abstract:
Almost all real-life datasets contain missing values. These usually arise from machine failure, routine maintenance, changes in the siting of monitors and human error. Missing values require special attention when analysing the data, because incomplete datasets can introduce bias through systematic differences between the observed and unobserved data. Finding the best way to estimate missing values is therefore important for ensuring that the analysed data are of high quality. In this research, three mean imputation techniques, namely the mean, mean above and mean above below methods, were used to replace the missing values. Missing values were generated in annual hourly PM10 monitoring data. Randomly simulated missing data at 5%, 10%, 15%, 25% and 40% were evaluated to test the efficiency of the methods. Three performance indicators, the mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R²), were calculated to describe the goodness of fit of each method. Of the methods applied, the mean above below method was found to be the best for estimating data at all percentages of simulated missing values.
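The abstract names the three imputation variants and the three indicators without defining them, so the following is only a minimal Python sketch of the evaluation loop under stated assumptions, not the authors' code. Assumed readings: `mean` fills each gap with the mean of all observed values; `mean above` fills it with the mean of the nearest `window` observed values before the gap; `mean above below` uses the nearest `window` observed values on both sides. The `window` parameter, the boundary fallbacks, all function names and the synthetic demo series are hypothetical. R² is computed here as the squared Pearson correlation between true and imputed values, which is one of several definitions in use.

```python
import math
import random
from statistics import fmean


def simulate_missing(series, fraction, seed=0):
    """Blank out round(len(series) * fraction) randomly chosen values (set to None)."""
    rng = random.Random(seed)
    gaps = set(rng.sample(range(len(series)), round(len(series) * fraction)))
    return [None if i in gaps else v for i, v in enumerate(series)]


def _neighbours(series, i, window, step):
    """Up to `window` observed (non-None) values, walking from index i in direction `step`."""
    found, j = [], i + step
    while 0 <= j < len(series) and len(found) < window:
        if series[j] is not None:
            found.append(series[j])
        j += step
    return found


def mean_impute(series):
    """Fill every gap with the mean of all observed values."""
    overall = fmean(v for v in series if v is not None)
    return [overall if v is None else v for v in series]


def mean_above_impute(series, window=1):
    """Assumed reading: mean of the `window` nearest observed values before the gap;
    falls back to values after it, then to the overall mean (hypothetical choices)."""
    overall = fmean(v for v in series if v is not None)
    out = []
    for i, v in enumerate(series):
        if v is None:
            near = _neighbours(series, i, window, -1) or _neighbours(series, i, window, +1)
            v = fmean(near) if near else overall
        out.append(v)
    return out


def mean_above_below_impute(series, window=1):
    """Assumed reading: mean of the `window` nearest observed values before the gap
    together with the `window` nearest observed values after it."""
    overall = fmean(v for v in series if v is not None)
    out = []
    for i, v in enumerate(series):
        if v is None:
            near = _neighbours(series, i, window, -1) + _neighbours(series, i, window, +1)
            v = fmean(near) if near else overall
        out.append(v)
    return out


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation (assumed definition); nan if either side is constant."""
    mo, mp = fmean(obs), fmean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        return float("nan")
    return cov ** 2 / (var_o * var_p)


def evaluate(truth, gapped, imputed):
    """Score imputed values against the true values at the blanked positions only."""
    idx = [i for i, v in enumerate(gapped) if v is None]
    obs, pred = [truth[i] for i in idx], [imputed[i] for i in idx]
    return {"MAE": mae(obs, pred), "RMSE": rmse(obs, pred), "R2": r_squared(obs, pred)}


if __name__ == "__main__":
    # Hypothetical synthetic series (30 days of hourly values with a daily cycle);
    # this is NOT the PM10 dataset analysed in the paper.
    truth = [40 + 15 * math.sin(2 * math.pi * h / 24) for h in range(24 * 30)]
    methods = {"mean": mean_impute, "mean above": mean_above_impute,
               "mean above below": mean_above_below_impute}
    for fraction in (0.05, 0.10, 0.15, 0.25, 0.40):
        gapped = simulate_missing(truth, fraction, seed=42)
        for name, impute in methods.items():
            scores = evaluate(truth, gapped, impute(gapped))
            print(f"{fraction:>4.0%} {name:<17}", ", ".join(f"{k}={s:.3f}" for k, s in scores.items()))
```

The sketch scores only the simulated gap positions. Because plain mean imputation fills every gap with the same constant, the squared correlation is undefined there and the sketch reports `nan`; a study that scores the whole completed series, or defines R² as 1 − SS_res/SS_tot, would report a number instead.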