Electric Power Audit Text Classification With Multi-Grained Pre-Trained Language Model

Cited by: 16
Authors
Meng, Qinglin [1 ]
Song, Yan [1 ]
Mu, Jian [2 ]
Lv, Yuanxu [3 ]
Yang, Jiachen [4 ]
Xu, Liang [5 ]
Zhao, Jin [6 ]
Ma, Junwei [7 ]
Yao, Wei [8 ]
Wang, Rui [9 ]
Xiao, Maoxiang [10 ]
Meng, Qingyu [11 ]
Affiliations
[1] State Grid Tianjin Elect Power Co, Comprehens Serv Ctr, Tianjin 300010, Peoples R China
[2] State Grid Tianjin Elect Power Co, Tianjin 300010, Peoples R China
[3] State Grid Corp China, Beijing 100031, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[5] Tianjin Tianyuan Power Engn Co Ltd, State Grid Tianjin Elect Power Co, Baodi Power Supply Branch, Tianjin 301800, Peoples R China
[6] State Grid Shanxi Elect Power Co, Elect Power Res Inst, Taiyuan 003001, Peoples R China
[7] State Grid Shanxi Elect Power Co, Informat & Commun Branch, Taiyuan 030012, Peoples R China
[8] State Grid Shanxi Elect Power Co, Taiyuan Power Supply Co, Taiyuan 003000, Shanxi, Peoples R China
[9] State Grid Tianjin Elect Power Co, Chengxi Power Supply Branch, Tianjin 300190, Peoples R China
[10] State Grid Tianjin Elect Power Co, Ningdongshengyuan Elect Power Engn Co Ltd, Ninghe Power Supply Branch, Tianjin 301500, Peoples R China
[11] State Grid Jibei Elect Power Co Ltd, Zhangjiakou Wanquan Dist Power Supply Branch, Zhangjiakou 076261, Hebei, Peoples R China
Keywords
Power systems; Task analysis; Text categorization; Bit error rate; Data models; Computational modeling; Natural language processing; Pre-trained language model; text classification; electric power audit text; natural language processing; masked language model
DOI
10.1109/ACCESS.2023.3240162
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Electric power audit text classification is an important research problem in electric power systems. Recently, various automatic classification methods based on machine learning or deep learning models have been applied to these texts. Advances in computing have made "pre-training and fine-tuning" the dominant paradigm for text classification, achieving better results than earlier fully supervised models. According to pre-training theory, domain-related pre-training tasks can improve the performance of downstream tasks in that domain. However, existing pre-trained models are usually trained on general corpora rather than on texts from the electric power field, especially electric power audit texts. As a result, the model learns little electric-power-related morphology or semantics during the pre-training stage, leaving less information available in the fine-tuning stage. Motivated by this gap, we propose EPAT-BERT, a BERT-based model pre-trained with two tasks of different granularity: a word-level masked language model and an entity-level masked language model. These two tasks predict words and entities in electric-power-related texts, enabling the model to learn rich morphology and semantics about electric power. We then fine-tune EPAT-BERT for the electric power audit text classification task. Experimental results show that EPAT-BERT significantly outperforms fully supervised machine learning models, neural network models, and general pre-trained language models on a variety of evaluation metrics. EPAT-BERT can therefore be further applied to electric power audit text classification. We also conduct ablation studies to verify the effectiveness of each component of EPAT-BERT and further support our motivations.
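The two-granularity masking the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the toy token sequence, and the representation of entities as (start, end) index spans are assumptions made for illustration.

```python
import random

MASK = "[MASK]"

def word_level_mask(tokens, mask_prob=0.15, rng=None):
    """BERT-style word-level MLM: mask individual tokens at random.

    Returns the masked sequence and per-position labels; a label is the
    original token where it was masked, or None where no prediction is made.
    """
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

def entity_level_mask(tokens, entities, rng=None):
    """Entity-level MLM: mask every token of one chosen domain entity.

    `entities` is a list of (start, end) token-index spans, e.g. spans of
    electric-power terms found by a domain lexicon or NER step (assumed here).
    """
    rng = rng or random.Random(0)
    start, end = rng.choice(entities)
    masked = [MASK if start <= i < end else t for i, t in enumerate(tokens)]
    labels = [t if start <= i < end else None for i, t in enumerate(tokens)]
    return masked, labels
```

Masking a whole entity span forces the model to predict multi-token domain terms from context, which is how entity-level pre-training injects domain semantics beyond what single-token masking provides.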
Pages: 13510-13518
Number of pages: 9