Missing Data Imputation in the Internet of Things Sensor Networks

Cited by: 22
Authors
Agbo, Benjamin [1]
Al-Aqrabi, Hussain [1]
Hill, Richard [1]
Alsboui, Tariq [1]
Affiliations
[1] Univ Huddersfield, Sch Comp & Engn, Dept Comp Sci, Huddersfield HD1 3DH, W Yorkshire, England
Keywords
IoT; low cost sensor; missing data; imputation; field calibration; incomplete data; EM algorithm
DOI
10.3390/fi14050143
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted with electronics, sensors, and network connectivity. IoT sensor networks have become integral to environmental monitoring systems. However, data collected from IoT sensor devices are usually incomplete for reasons such as sensor failures, drift, network faults and other operational issues. Incomplete or missing values can substantially affect the calibration of on-field environmental sensors. The aim of this study is to identify efficient missing data imputation techniques that will ensure accurate calibration of sensors. To achieve this, we propose an efficient and robust technique based on k-means clustering that is capable of selecting the best imputation technique for a given set of missing data. We then evaluate the accuracy of our proposed technique against other techniques and test their effect on various calibration processes for data collected from on-field low-cost environmental sensors in urban air pollution monitoring stations. To test the efficiency of the imputation techniques, we simulated missing data rates of 10-40% and also considered missing values occurring over consecutive periods of time (1 day, 1 week and 1 month). Overall, our proposed Best Fit Missing Value Imputation (BFMVI) model recorded the best imputation accuracy (0.011758 RMSE at 10% missing data and 0.169418 RMSE at 40% missing data) compared with the other techniques (k-Nearest-Neighbour (kNN), Regression Imputation (RI), Expectation Maximization (EM) and MissForest) when evaluated using different performance indicators.
Moreover, the results show a trade-off between imputation accuracy and computational complexity, with the benchmark techniques showing lower computational complexity at the expense of accuracy when compared with our proposed technique.
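The evaluation protocol described in the abstract (hide a known fraction of values, impute them, then score the imputed values with RMSE) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the synthetic hourly signal, the helper names `mask_at_random`, `mean_impute` and `rmse`, and the choice of mean imputation as a stand-in baseline. It is not the BFMVI model or any of the benchmark techniques from the paper, and it covers only values missing at random, not the consecutive-gap scenarios.

```python
import math
import random


def mask_at_random(values, rate, seed=0):
    """Hide roughly `rate` of the entries (set to None); return the copy and hidden indices."""
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    hidden = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def mean_impute(masked):
    """Baseline stand-in: fill every missing entry with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def rmse(truth, imputed, hidden):
    """Root mean squared error computed only over the entries that were hidden."""
    squared = sum((truth[i] - imputed[i]) ** 2 for i in hidden)
    return math.sqrt(squared / len(hidden))


# Hypothetical sensor trace: 10 days of hourly readings with a daily cycle.
truth = [20.0 + 5.0 * math.sin(2 * math.pi * t / 24) for t in range(240)]

results = {}
for rate in (0.10, 0.20, 0.30, 0.40):  # the missing-data rates named in the abstract
    masked, hidden = mask_at_random(truth, rate)
    results[rate] = rmse(truth, mean_impute(masked), hidden)
    print(f"missing rate {rate:.0%}: RMSE = {results[rate]:.4f}")
```

Scoring only the hidden positions keeps the observed values, which every method reproduces exactly, from diluting the error; swapping `mean_impute` for another imputer allows a like-for-like comparison under the same masks.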
Pages: 16