Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation

Cited by: 47
Authors
Wu, Fangzhao [1 ]
Liu, Junxin [2 ]
Wu, Chuhan [2 ]
Huang, Yongfeng [2 ]
Xie, Xing [1 ]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
[2] Tsinghua University, Beijing, People's Republic of China
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Funding
National Natural Science Foundation of China;
Keywords
Named Entity Recognition; Word Segmentation; Neural Network;
DOI
10.1145/3308558.3313743
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Chinese named entity recognition (CNER) is an important task in the field of Chinese natural language processing. However, CNER is very challenging because Chinese entity names are highly context-dependent. In addition, Chinese texts lack delimiters to separate words, which makes it difficult to identify the boundaries of entities. Moreover, the training data for CNER in many domains is usually insufficient, and annotating enough training data for CNER is expensive and time-consuming. In this paper, we propose a neural approach for CNER. First, we introduce a CNN-LSTM-CRF neural architecture to capture both local and long-distance contexts for CNER. Second, we propose a unified framework to jointly train CNER and word segmentation models in order to enhance the ability of the CNER model to identify entity boundaries. Third, we introduce an automatic method to generate pseudo labeled samples from existing labeled data, which can enrich the training data. Experiments on two benchmark datasets show that our approach can effectively improve the performance of Chinese named entity recognition, especially when training data is insufficient.
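
The abstract does not say how the pseudo labeled samples are generated. The sketch below illustrates one plausible reading, assumed here for illustration only: each entity in a labeled sentence is replaced by another entity name of the same type collected from the labeled data, and the character-level BIO tags are rebuilt to match. The function names (`extract_entities`, `build_lexicon`, `make_pseudo_sample`) and the BIO tag scheme are hypothetical choices, not taken from the paper.

```python
import random
from collections import defaultdict


def extract_entities(chars, tags):
    """Return (start, end, type) spans from BIO tags; end is exclusive."""
    spans, start, etype = [], None, None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def build_lexicon(corpus):
    """Collect the entity names seen in the labeled data, grouped by type."""
    lexicon = defaultdict(set)
    for chars, tags in corpus:
        for start, end, etype in extract_entities(chars, tags):
            lexicon[etype].add("".join(chars[start:end]))
    return {etype: sorted(names) for etype, names in lexicon.items()}


def make_pseudo_sample(chars, tags, lexicon, rng):
    """Swap every entity for a random same-type name (assumed strategy)."""
    new_chars, new_tags, cursor = [], [], 0
    for start, end, etype in extract_entities(chars, tags):
        # Copy the non-entity context unchanged.
        new_chars += chars[cursor:start]
        new_tags += tags[cursor:start]
        # Insert the replacement name and rebuild its BIO tags.
        name = rng.choice(lexicon[etype])
        new_chars += list(name)
        new_tags += ["B-" + etype] + ["I-" + etype] * (len(name) - 1)
        cursor = end
    new_chars += chars[cursor:]
    new_tags += tags[cursor:]
    return new_chars, new_tags


if __name__ == "__main__":
    corpus = [
        (list("李明在北京工作"),
         ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]),
        (list("王小红去上海"),
         ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]),
    ]
    lexicon = build_lexicon(corpus)
    chars, tags = make_pseudo_sample(*corpus[0], lexicon, random.Random(0))
    print("".join(chars), tags)
```

Because the replacement name may differ in length from the original, the tags are regenerated rather than copied, which keeps characters and labels aligned.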
Pages: 3342-3348
Page count: 7