Chinese Named Entity Recognition via Joint Identification and Categorization

Cited by: 0
Authors
Zhou Junsheng [1, 2]
Qu Weiguang [1, 2]
Zhang Fen [2]
Affiliations
[1] Jiangsu Res Ctr Informat Secur & Privacy Technol, Nanjing 210046, Jiangsu, Peoples R China
[2] Nanjing Normal Univ, Sch Comp Sci & Technol, Nanjing 210046, Jiangsu, Peoples R China
Source
CHINESE JOURNAL OF ELECTRONICS | 2013 / Vol. 22 / No. 02
Funding
National Natural Science Foundation of China;
Keywords
Named entity recognition; Entity-level features; Sequence labeling approach; Joint identification and categorization;
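DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code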
0808 ; 0809 ;
Abstract
Chinese named entity recognition (NER) is an important task in Chinese information processing. Traditional sequence labeling approaches to Chinese NER cannot treat a string of contiguous characters as a single named entity candidate, so entity-level features cannot be exploited in a natural way. To address this problem, we formulate Chinese NER as a joint identification and categorization task that performs two subtasks simultaneously, boundary identification and entity categorization, together with segmentation. The proposed approach provides a natural formulation for treating spans of contiguous characters as named entity candidates, which allows for more accurate prediction by examining both the internal evidence and the contextual information of the candidates. Within this framework, we explore a variety of effective feature representations for Chinese NER. Closed tests on two quite different corpora from the third SIGHAN bakeoff show that our approach significantly outperforms the best results in the literature, achieving state-of-the-art performance.
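To make the span-level formulation concrete, the following is a minimal sketch of decoding over whole candidate segments rather than per-character tags. It is not the authors' model: the abstract does not specify the learner or the features, so the dynamic program below is a generic segment-level (semi-Markov style) search. The label set, `MAX_LEN`, the toy gazetteer, and the scoring function `toy_score` are all assumptions introduced purely for illustration.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)

LABELS = ["O", "PER", "LOC", "ORG"]  # assumed label set, for illustration only
MAX_LEN = 4                          # assumed maximum candidate length in characters

# Hypothetical gazetteer standing in for learned entity-level features.
GAZETTEER = {"张三": "PER", "南京": "LOC"}


def toy_score(chars: str, start: int, end: int, label: str) -> float:
    """Score one candidate segment as a whole (entity-level evidence)."""
    span = chars[start:end]
    if label != "O":
        # Reward a span only if the gazetteer lists it under this label.
        return 2.0 * len(span) if GAZETTEER.get(span) == label else -1.0
    # Non-entity material is kept as single-character segments in this toy.
    return 0.1 if len(span) == 1 else -1.0


def decode(
    chars: str,
    score: Callable[[str, int, int, str], float],
    labels: List[str] = LABELS,
    max_len: int = MAX_LEN,
) -> List[Segment]:
    """Jointly choose segment boundaries and labels by dynamic programming."""
    n = len(chars)
    best = [float("-inf")] * (n + 1)  # best[i]: best score of any analysis of chars[:i]
    back = [None] * (n + 1)           # back[i]: (start, label) of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for label in labels:
                candidate = best[start] + score(chars, start, end, label)
                if candidate > best[end]:
                    best[end] = candidate
                    back[end] = (start, label)
    # Follow back-pointers from the end of the sentence to recover the segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return segments[::-1]


sentence = "张三在南京工作"
segments = decode(sentence, toy_score)
entities = [(sentence[s:e], lab) for s, e, lab in segments if lab != "O"]
```

Because each candidate is scored as a unit, the scoring function sees the full span (its internal characters, its length, its neighbors), which is the kind of entity-level evidence that a per-character tagger cannot access directly. In a real system the scores would come from a trained model, not a lookup table.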
Pages: 225-230
Page count: 6