A Chinese Named Entity Recognition Method Based on ERNIE-BiLSTM-CRF for Food Safety Domain

Cited by: 6
Authors
Yuan, Taiping [1]
Qin, Xizhong [1, 2]
Wei, Chunji [1]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830049, Peoples R China
[2] Xinjiang Signal Detect & Proc Key Lab, Urumqi 830049, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 05
Keywords
food safety supervision; named entity recognition; pre-trained language model; ERNIE; adversarial training; BiLSTM-CRF; self-attention;
DOI
10.3390/app13052849
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Food safety is closely related to human health. Using named entity recognition to extract food-safety-related entities and build a regulatory knowledge graph can help relevant authorities regulate food safety issues and mitigate the hazards caused by food safety problems. However, there is no publicly available named entity recognition dataset in the food safety domain. At the same time, the non-standardized Chinese short texts generated from user comments on the web contain rich implicit information that can help identify named entities in specific domains (e.g., the food safety domain) where the corpus is scarce. Therefore, in this paper, named entities related to food safety are extracted from these non-standardized web texts. However, existing Chinese named entity recognition methods mainly target standardized texts, and these non-standardized texts pose the following problems: (1) their corpus size is small; (2) they contain various new words, erroneous words, and noise; and (3) they do not follow strict syntactic rules. These problems make recognizing Chinese named entities in online texts more challenging. This paper therefore proposes the ERNIE-Adv-BiLSTM-Att-CRF model to improve the recognition of food safety domain entities in non-standardized texts. Specifically, adversarial training is added to model training as a regularization method to alleviate the influence of noise on the model, while self-attention is added to the BiLSTM-CRF model to capture features that significantly impact entity classification and improve its accuracy. Experiments are conducted on the public dataset Weibo NER and the self-built food domain dataset Food. The experimental results show that the model achieves a SOTA F1 value of 72.64% on the public dataset and a good F1 value of 69.68% on the self-built dataset. The validity and reasonableness of the model are verified. In addition, the paper further analyses the impact of various components and settings on the model. The study has practical implications in the field of food safety.
Pages: 16