Compressor Fault Diagnosis Knowledge: A Benchmark Dataset for Knowledge Extraction From Maintenance Log Sheets Based on Sequence Labeling

Times cited: 9
Authors
Chen, Tao [1]
Zhu, Jiang [1]
Zeng, Zhiqiang [1]
Jia, Xudong [1,2]
Affiliations
[1] Wuyi University, Faculty of Intelligent Manufacturing, Jiangmen 529020, People's Republic of China
[2] California State University, Northridge, College of Engineering and Computer Science, Northridge, CA 91330, USA
Keywords
Fault diagnosis; Labeling; Maintenance engineering; Hidden Markov models; Feature extraction; Deep learning; Benchmark testing; Compressor fault diagnosis; Dataset; Named entity recognition; Sequence labeling
DOI
10.1109/ACCESS.2021.3072927
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Compressor fault diagnosis requires expert knowledge. Sequence labeling techniques can extract this knowledge automatically from compressor maintenance log sheets. Previous studies indicate that sequence labeling methods often need a substantial amount of annotated data for knowledge extraction; unfortunately, such data are very scarce in the field of compressor fault diagnosis. In this paper, we introduce a benchmark dataset for extracting knowledge relevant to air compressor fault diagnosis. First, we collected 11,418 records from air compressor maintenance log sheets. Their fault descriptions, service requests, causes, and troubleshooting solutions were stored in a dataset for preprocessing and masking. After noise was cleaned from the raw data, 6,196 valid text pairs were obtained. Second, three subject-matter experts annotated five kinds of entities and sequences: equipment, faults, service requests, causes, and troubleshooting solutions. Annotation consistency was assessed with F1 scores. Furthermore, our proposed baseline model, BERT-BI-LSTM-CRF, was compared against five other sequence labeling models (BI-LSTM-CRF, Lattice LSTM, BERT NER, ZEN, and ERNIE). The BERT-BI-LSTM-CRF model outperforms these models in extracting expert knowledge from the dataset. Although the baseline model is not the most cutting-edge model in sequence labeling and named entity recognition, it shows great potential for compressor fault diagnosis. The dataset is available at https://github.com/chentao1999/CFDK.
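The abstract states that annotation consistency among the three experts was assessed with F1 scores. The sketch below shows one common way to compute such agreement: exact-match, entity-level F1 between two annotators, micro-averaged over a corpus. It is an illustration under stated assumptions, not the authors' procedure. It assumes a BIO tagging scheme and uses hypothetical tag names (EQUIPMENT, FAULT, and so on); this record does not state the paper's actual tag set or exactly how agreement was computed.

```python
from typing import List, Optional, Set, Tuple

# (entity_type, start_index, end_index_exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    # A sentinel "O" at the end closes any span still open.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag of the currently open type extends the span.
        if tag.startswith("I-") and tag[2:] == etype:
            continue
        # Anything else (O, B-, or an I- of a different type) closes the open span.
        if etype is not None:
            spans.add((etype, start, i))
            etype = None
        # B- tags, and stray I- tags with no matching open span, start a new span.
        if tag != "O":
            etype, start = tag[2:], i
    return spans


def agreement_f1(corpus_a: List[List[str]], corpus_b: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 between two annotators.

    F1 = 2 * |A & B| / (|A| + |B|), which is symmetric, so it does not
    matter which annotator is treated as the reference.
    """
    if len(corpus_a) != len(corpus_b):
        raise ValueError("both annotators must label the same sentences")
    overlap = total = 0
    for tags_a, tags_b in zip(corpus_a, corpus_b):
        if len(tags_a) != len(tags_b):
            raise ValueError("tag sequences must align token by token")
        spans_a, spans_b = bio_to_spans(tags_a), bio_to_spans(tags_b)
        overlap += len(spans_a & spans_b)
        total += len(spans_a) + len(spans_b)
    # If neither annotator marked any entity, treat that as full agreement.
    return 1.0 if total == 0 else 2 * overlap / total


# Hypothetical tags for one log-sheet sentence: the annotators agree on the
# equipment span but disagree on where the fault span ends.
annotator_a = ["B-EQUIPMENT", "I-EQUIPMENT", "B-FAULT", "I-FAULT", "O"]
annotator_b = ["B-EQUIPMENT", "I-EQUIPMENT", "B-FAULT", "O", "O"]
print(agreement_f1([annotator_a], [annotator_b]))  # 0.5
```

Exact-match scoring counts a span as agreed only if both its boundaries and its type match, which is why the truncated fault span above earns no credit. With three annotators, one option is to average this score over the three annotator pairs; whether the paper did so is not stated in this record.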
Pages: 59394-59405
Page count: 12