An Incorrect Data Detection Method for Big Data Cleaning of Machinery Condition Monitoring

Cited by: 100
Authors
Xu, Xuefang [1]
Lei, Yaguo [1]
Li, Zeda [1]
Affiliations
[1] Xi An Jiao Tong Univ, Key Lab Educ, Minist Modern Design & Rotor Bearing Syst, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Condition-monitoring big data; data cleaning; data quality; incorrect data; local outlier factor (LOF); OUTLIER DETECTION; NETWORK
DOI
10.1109/TIE.2019.2903774
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Incorrect data degrade the quality of condition-monitoring big data, and analyzing such poor-quality data is likely to yield unreliable or misleading results. To improve data quality, this paper proposes an incorrect data detection method for data cleaning based on an improved local outlier factor (LOF). First, a sliding window divides the data into segments. Each segment is treated as an object whose attributes are time-domain statistical features extracted from it, such as the mean, maximum, and peak-to-peak value. Second, a kernel-based LOF (KLOF) is calculated from these attributes to quantify the degree to which each segment is incorrect. Third, incorrect data are detected by comparing the KLOF values against a threshold. Finally, the method is verified on simulated vibration data from a defective rolling element bearing and on three real cases: a fixed-axle gearbox, a wind turbine, and a planetary gearbox. The results demonstrate that the proposed method effectively detects both missing segments and abnormal segments, two typical kinds of incorrect data, and is therefore helpful for big data cleaning in machinery condition monitoring.
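The three steps in the abstract (windowing, scoring, thresholding) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the KLOF formula, so this sketch assumes the standard LOF of Breunig et al. computed over a distance induced by a Gaussian kernel, d(a, b) = sqrt(2 - 2 k(a, b)). The function names, the window width, the neighbour count `k`, the kernel width `sigma`, and the threshold value are all hypothetical choices made for the demonstration.

```python
import math
import random


def segment_features(signal, width, step):
    """Slide a window over the signal; return [mean, max, peak-to-peak] per segment."""
    feats = []
    for start in range(0, len(signal) - width + 1, step):
        seg = signal[start:start + width]
        feats.append([sum(seg) / width, max(seg), max(seg) - min(seg)])
    return feats


def kernel_distance(a, b, sigma):
    """Distance induced by a Gaussian kernel (an assumed form, not taken from the paper)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    k_ab = math.exp(-sq / (2.0 * sigma ** 2))
    return math.sqrt(max(0.0, 2.0 - 2.0 * k_ab))


def lof_scores(points, k, sigma):
    """Standard LOF, computed with the kernel-induced distance above."""
    n = len(points)
    dist = [[kernel_distance(points[i], points[j], sigma) for j in range(n)]
            for i in range(n)]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbours[i]) / k
        lrd.append(1.0 / max(reach, 1e-12))  # guard against identical segments
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def detect_incorrect(signal, width, step, k, sigma, threshold):
    """Return indices of segments whose score exceeds the threshold."""
    scores = lof_scores(segment_features(signal, width, step), k, sigma)
    return [i for i, s in enumerate(scores) if s > threshold]


# Synthetic demo: a noisy sinusoid with one missing and one abnormal segment.
rng = random.Random(0)
signal = [math.sin(2 * math.pi * t / 20) + rng.gauss(0.0, 0.02) for t in range(2000)]
signal[700:800] = [0.0] * 100   # segment 7: missing data recorded as zeros
signal[1450] += 10.0            # segment 14: an abnormal spike

flagged = detect_incorrect(signal, width=100, step=100, k=5, sigma=1.0, threshold=10.0)
print(flagged)
```

Scores near 1 indicate a segment whose local density matches that of its neighbours, while much larger scores indicate isolation in feature space. The threshold of 10 is arbitrary and suits only this synthetic signal. In practice the features would likely need scaling before the kernel is applied, since a Gaussian kernel saturates for points that are far apart relative to `sigma`.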
Pages: 2326-2336
Page count: 11