Sensor data quality: a systematic review

Cited by: 141
Authors
Teh, Hui Yie [1]
Kempa-Liehr, Andreas W. [2, 3]
Wang, Kevin I-Kai [1]
Affiliations
[1] Univ Auckland, Dept Elect Comp & Software Engn, Auckland, New Zealand
[2] Univ Freiburg, Freiburg Mat Res Ctr, Freiburg, Germany
[3] Univ Auckland, Dept Engn Sci, Auckland, New Zealand
Keywords
Systematic review; Sensor data quality; Sensor data error detection; Sensor data error correction; Datasets; Principal component analysis; Fault diagnosis; Anomaly detection; Data validation; Neural networks; Uncertain data; Soft sensor; Internet; Reconstruction; Machine
DOI
10.1186/s40537-020-0285-1
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Sensor data quality plays a vital role in Internet of Things (IoT) applications, which are rendered useless if the data quality is poor. This systematic review aims to provide an introduction and guide for researchers interested in quality-related issues of physical sensor data. It presents the process and results of the review, which aims to answer the following research questions: what are the different types of physical sensor data errors, how can those errors be quantified or detected, how can they be corrected, and in which domains are the solutions applied? Out of 6970 publications obtained from three databases (ACM Digital Library, IEEE Xplore and ScienceDirect) using a search string refined via topic modelling, 57 publications were selected and examined. The results show that the sensor data errors addressed by those papers are mostly missing data and faults, e.g. outliers, bias and drift. The most common solutions for error detection are based on principal component analysis (PCA) and artificial neural networks (ANN), which together account for about 40% of all error detection papers found in the study. Similarly, for fault correction, PCA and ANN are among the most common methods, along with Bayesian networks. Missing values, on the other hand, are mostly imputed using association rule mining. Other techniques include hybrid solutions that combine several data science methods to detect and correct the errors. The review finds that the methods proposed to solve physical sensor data errors cannot be directly compared, owing to non-uniform evaluation processes and the heavy use of datasets that are not publicly available. A Bayesian data analysis of the 57 selected publications also suggests that publications using publicly available datasets for method evaluation have higher citation rates.
Pages: 49