Named Entity Recognition by Using XLNet-BiLSTM-CRF

Cited by: 46
Authors
Yan, Rongen [1 ]
Jiang, Xue [2 ]
Dang, Depeng [1 ]
Affiliations
[1] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[2] Univ Sci & Technol Beijing, Beijing Adv Innovat Ctr Mat Genome Engn, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; Named entity recognition; XLNet; Bi-directional long short-term memory; Tagging;
DOI
10.1007/s11063-021-10547-1
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Named entity recognition (NER) is the basis for many natural language processing (NLP) tasks such as information extraction and question answering. The accuracy of NER directly affects the results of downstream tasks. Most of the relevant methods are implemented using neural networks; however, the word vectors obtained from a small data set cannot accurately describe unusual, previously unseen entities, and the results are not sufficiently accurate. Recently, XLNet, a new pre-trained model, has yielded satisfactory results in many NLP tasks, but integrating XLNet embeddings into existing NLP tasks is not straightforward. In this paper, a new neural network model is proposed to improve the effectiveness of NER by using a pre-trained XLNet, a bi-directional long short-term memory (Bi-LSTM) network, and a conditional random field (CRF). The pre-trained XLNet model is used to extract sentence features, which are then passed to the classic NER neural network model. In addition, the superiority of XLNet in NER tasks is demonstrated. We evaluate our model on the CoNLL-2003 English dataset and the WNUT-2017 dataset and show that XLNet-BiLSTM-CRF obtains state-of-the-art results.
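The pipeline described in the abstract (pre-trained XLNet encoder, BiLSTM, CRF tagger) can be summarized in a short sketch. The code below is not the authors' implementation; it is a minimal illustrative PyTorch version that assumes the Hugging Face transformers package for XLNet, the third-party pytorch-crf package (torchcrf.CRF) for the CRF layer, right-padded inputs, and arbitrary hyperparameters such as "xlnet-base-cased" and a 256-unit LSTM.

import torch
import torch.nn as nn
from torchcrf import CRF              # third-party package "pytorch-crf" (assumed)
from transformers import XLNetModel

class XLNetBiLSTMCRF(nn.Module):
    """Hypothetical sketch: XLNet encoder -> BiLSTM -> CRF sequence tagger."""

    def __init__(self, num_tags, xlnet_name="xlnet-base-cased", lstm_hidden=256):
        super().__init__()
        # Pre-trained XLNet produces contextual features for every token in the sentence.
        self.xlnet = XLNetModel.from_pretrained(xlnet_name)
        # A bidirectional LSTM re-encodes those features in both directions.
        self.bilstm = nn.LSTM(self.xlnet.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # A linear layer maps BiLSTM outputs to per-token tag scores (CRF emissions).
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)
        # The CRF layer models tag-transition constraints over the whole sequence.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # Assumes right-padded batches so the first time step of every sequence is valid.
        feats = self.xlnet(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(feats)
        emissions = self.emission(lstm_out)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the most likely tag sequence.
        return self.crf.decode(emissions, mask=mask)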
Pages: 3339-3356
Number of pages: 18