Entity Extraction of Electrical Equipment Malfunction Text by a Hybrid Natural Language Processing Algorithm

Times cited: 14
Authors
Kong, Zhe [1]
Yue, Changxi [2]
Shi, Ying [1]
Yu, Jicheng [2]
Xie, Changjun [1]
Xie, Lingyun [3]
Affiliations
[1] Wuhan Univ Technol, Sch Automat, Wuhan 430070, Peoples R China
[2] China Elect Power Res Inst, Wuhan 430070, Peoples R China
[3] China Elect Technol Grp Corp, Res Inst 5, Shanghai 200331, Peoples R China
Keywords
Power systems; Dictionaries; Maintenance engineering; Data mining; Natural language processing; Tools; Substations; Electrical equipment malfunction text; Entity extraction; BERT-CRF model; Neural network; Recognition; Classification; CRF
DOI
10.1109/ACCESS.2021.3063354
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many electrical equipment malfunction text messages are collected during power system operation and maintenance. These texts usually contain crucial information for maintenance and condition monitoring. Because power system malfunction texts are characterized by multidomain vocabularies, complex syntactic structures, and long sentences, it is challenging for automated systems to capture their semantic meaning and essential information. To address this issue, we propose a hybrid natural language processing (hybrid-NLP) algorithm to extract entities that represent electrical equipment. The algorithm combines a dictionary-based method, the Language Technology Platform (LTP) tool, and a bidirectional encoder representations from transformers-conditional random field (BERT-CRF) model. In particular, the softmax output layer of the bidirectional encoder representations from transformers (BERT) model is replaced by a conditional random field (CRF) layer, which strengthens the contextual relationships between words and thus avoids the locally optimal selection of each word's label. The effectiveness of the proposed hybrid-NLP method is verified on a real-world dataset. Moreover, a statistical analysis is conducted to provide a reference for the operation and maintenance of power systems.
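The abstract's key modelling claim is that a CRF output layer avoids the locally optimal label choices of a softmax layer. The sketch below illustrates that general idea only; it is not the authors' implementation. A softmax layer effectively labels each token by an independent argmax, whereas CRF decoding runs a Viterbi search over per-token emission scores plus tag-to-tag transition scores. Everything here is a hypothetical assumption: the BIO tag set (`O`, `B-EQ`, `I-EQ`, with `EQ` standing for an equipment entity), the three English tokens, and all score values are hand-picked. In a real BERT-CRF the emissions come from BERT and the transition matrix is learned (usually together with start/end scores, omitted here).

```python
from typing import List

# Hypothetical BIO tag set: "EQ" stands for an electrical-equipment entity.
TAGS = ["O", "B-EQ", "I-EQ"]
FORBIDDEN = -1e4  # illustrative penalty that makes a transition effectively impossible

# Hand-picked (hypothetical) transition scores, TRANSITIONS[from_tag][to_tag].
# O -> I-EQ is forbidden: an entity cannot continue without having begun.
TRANSITIONS = [
    [0.5, 0.0, FORBIDDEN],  # from O
    [0.0, -1.0, 1.0],       # from B-EQ
    [0.0, -1.0, 0.5],       # from I-EQ
]

# Hand-picked per-token emission scores (stand-ins for BERT outputs)
# for three hypothetical tokens: "main", "transformer", "tripped".
EMISSIONS = [
    [1.2, 1.0, 0.1],
    [0.2, 0.3, 2.0],
    [2.0, 0.1, 0.2],
]


def greedy_decode(emissions: List[List[float]]) -> List[int]:
    """Softmax-style decoding: each token independently takes its highest-scoring tag."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """CRF-style decoding: find the tag sequence maximizing emission + transition scores."""
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers: List[List[int]] = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for moving into tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print([TAGS[i] for i in greedy_decode(EMISSIONS)])                # ['O', 'I-EQ', 'O']
    print([TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)])  # ['B-EQ', 'I-EQ', 'O']
```

With these made-up scores, greedy decoding tags "transformer" as `I-EQ` directly after an `O`, which is an invalid BIO sequence. Viterbi decoding scores the whole sequence instead (here 1.0 + 1.0 + 2.0 + 0.0 + 2.0 = 6.0 for the best path) and labels the span as `B-EQ I-EQ`.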
Pages: 40216-40226
Number of pages: 11