Impact of Data Leakage in Vibration Signals Used for Bearing Fault Diagnosis

Authors
Wheat, Lesley [1 ,2 ]
Mohrenschildt, Martin V. [1 ]
Habibi, Saeid [2 ]
Al-Ani, Dhafar [3 ]
Affiliations
[1] McMaster Univ, Dept Comp & Software, Hamilton, ON L8S 4L7, Canada
[2] McMaster Univ, CMHT, Dept Mech Engn, Hamilton, ON L8S 4L8, Canada
[3] McMaster Univ, Dept Elect & Comp Engn, Hamilton, ON L8S 4L7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Vibrations; Fault diagnosis; Training; Soft sensors; Rotating machines; Vibration measurement; Data models; Robustness; Time factors; Principal component analysis; Bearing; Vibration analysis; Domain shift; Data leakage; Machine learning; Train-test split; Convolutional neural network
DOI
10.1109/ACCESS.2024.3497716
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Bearing fault diagnosis is a well-developed field and an active area of research in which combining model-free machine learning techniques with vibration data has become a popular approach. However, vibration data from rotating machines can contain domain shifts beyond the causes commonly accepted in this research area (different part models, operating conditions and sensor locations), and these shifts can enable data leakage between training and test datasets. To demonstrate the impact of data leakage, six common bearing diagnosis methods are applied to two datasets using three data splitting methods, and their classification performance is compared. Diagnosis is performed using Principal Component Analysis (PCA), Supervised Principal Component Analysis (SPCA) and Linear Discriminant Analysis (LDA), each in combination with frequency analysis and envelope analysis feature extraction methods. Datasets from McMaster University and Paderborn University are used as experimental data sources. They produce vastly differing results (a drop in accuracy of over 40%) depending on the selected dataset splitting method, revealing a previously unknown domain shift. Although the diagnosis methods using frequency response analysis achieve strong results on the McMaster data, these results are not expected to generalize because of possible data leakage. Out of fifty-five previous works using the Paderborn dataset, ten are identified as likely to be affected and only six properly address the problem. Recommendations are given for future experiment design, model creation and model evaluation.
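The abstract does not say which three splitting methods were compared, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a hypothetical layout in which each vibration recording is cut into many short segments. The helper names (`random_split`, `group_split`, `shared_groups`) and the sizes (8 recordings, 50 segments each) are invented for illustration. A segment-level random split places segments from the same recording on both sides of the split, which is one way leakage can arise, while a group-level split keeps each recording entirely on one side.

```python
import random


def random_split(n_samples, test_fraction=0.25, seed=0):
    """Segment-level split: shuffles individual segments, ignoring their source."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_samples * test_fraction))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def group_split(groups, test_fraction=0.25, seed=0):
    """Group-level split: every segment of a recording lands on one side only."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


def shared_groups(groups, train, test):
    """Recordings that contribute segments to both the train and test sets."""
    return {groups[i] for i in train} & {groups[i] for i in test}


# Hypothetical layout (assumption): 8 recordings, each cut into 50 segments.
groups = [rec for rec in range(8) for _ in range(50)]

train_r, test_r = random_split(len(groups))
train_g, test_g = group_split(groups)

print("recordings shared, segment-level split:", len(shared_groups(groups, train_r, test_r)))
print("recordings shared, group-level split:", len(shared_groups(groups, train_g, test_g)))
```

In practice the grouping variable matters as much as the mechanism: grouping by recording, by bearing, or by test run gives different guarantees, and the right choice depends on which domain shift is suspected.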
Pages: 169879-169895
Page count: 17