Chinese agricultural diseases and pests named entity recognition with multi-scale local context features and self-attention mechanism

Cited by: 58
Authors
Guo, Xuchao [1]
Zhou, Han [1]
Su, Jie [1]
Hao, Xia [1]
Tang, Zhan [1]
Diao, Lei [1]
Li, Lin [1]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
Keywords
Chinese agricultural diseases and pests named entity recognition; Corpus; Multi-scale local context features; Convolutional neural networks; Self-attention mechanism; NEURAL-NETWORK;
DOI
10.1016/j.compag.2020.105830
Chinese Library Classification (CLC) number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Chinese named entity recognition is a crucial first step of information extraction in the field of agricultural diseases and pests. It aims to identify named entities related to agricultural diseases and pests in unstructured text, but it faces two challenges. First, the available corpora in this domain are limited. Second, most existing named entity recognition methods focus only on global context information and neglect local context features, which are equally important for the task. To address these problems, a corpus for agricultural diseases and pests, named AgCNER, was established; it contains 11 categories and 34,952 samples, which is more categories and more samples than other corpora in the same field. A novel Chinese named entity recognition model that combines multi-scale local context features with the self-attention mechanism was then proposed. The original Bi-directional Long Short-Term Memory and Conditional Random Field model (BiLSTM-CRF) was improved by fusing in multi-scale local context features extracted by Convolutional Neural Networks (CNNs) with different kernel sizes. The self-attention mechanism was also used to overcome the limitation of BiLSTM-CRF in capturing long-distance dependencies and to further improve model performance. The proposed model was evaluated on three corpora, namely AgCNER, Resume, and MSRA, on which it achieved best F1 values of 94.15%, 94.56%, and 90.55%, respectively. Experimental results from several perspectives demonstrated the effectiveness of the proposed model.
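The two ingredients named in the abstract, multi-scale local context features from convolutions with different kernel sizes and a self-attention step over the sequence, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the kernel sizes (1, 3, 5), the constant kernel weights, the toy embeddings, the identity query/key/value projections, and all function names are assumptions chosen for illustration, and the BiLSTM and CRF layers are omitted entirely.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def conv1d_same(seq: Matrix, kernel: List[Matrix]) -> Matrix:
    """1-D convolution over a character sequence with zero 'same' padding.

    seq:    T x d_in matrix (one embedding per character).
    kernel: k x d_in x d_out weights, with k odd.
    Returns a T x d_out matrix after a ReLU activation.
    """
    k, d_in, d_out = len(kernel), len(seq[0]), len(kernel[0][0])
    half = k // 2
    out = []
    for t in range(len(seq)):
        row = [0.0] * d_out
        for j in range(k):
            pos = t + j - half
            if 0 <= pos < len(seq):  # positions outside the sentence act as zero padding
                for i in range(d_in):
                    for o in range(d_out):
                        row[o] += seq[pos][i] * kernel[j][i][o]
        out.append([max(0.0, v) for v in row])
    return out


def multi_scale_features(seq: Matrix, kernels: List[List[Matrix]]) -> Matrix:
    """Concatenate, per position, the outputs of convolutions of different widths."""
    per_scale = [conv1d_same(seq, kernel) for kernel in kernels]
    return [sum((scale[t] for scale in per_scale), []) for t in range(len(seq))]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity projections (Q = K = V = x)."""
    d = len(x[0])
    out = []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def make_kernel(k: int, d_in: int, d_out: int, value: float) -> List[Matrix]:
    """Hypothetical constant-weight kernel, used only to keep the demo deterministic."""
    return [[[value] * d_out for _ in range(d_in)] for _ in range(k)]


# Toy input: 4 characters with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernels = [make_kernel(k, 2, 3, 0.1) for k in (1, 3, 5)]  # assumed kernel sizes

local = multi_scale_features(embeddings, kernels)  # 4 positions x 9 features
attended = self_attention(local)                   # every position attends to all others
```

Each kernel width sees a different-sized window around a character, so concatenating the outputs gives every position features at several local scales, while the attention step lets any position draw on any other regardless of distance. In a full model these features would be fused with BiLSTM outputs and decoded by a CRF layer, which this sketch does not attempt.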
Pages: 10