Chinese Named Entity Recognition using a Morpheme-based Chunking Tagger

Cited by: 1
Authors
Fu, Guohong [1]
Affiliation
[1] Heilongjiang Univ, Sch Comp Sci & Technol, Harbin 150080, Peoples R China
Source
2009 International Conference on Asian Language Processing | 2009
Keywords
named entity recognition; entity pattern rules; morpheme-based chunking; segmentation
DOI
10.1109/IALP.2009.68
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most previous studies formalize Chinese named entity recognition (NER) as a chunking task with either characters or lexicon words as the basic tokens for chunking. However, it is difficult under this formulation to exploit lexical information for NER. Furthermore, traditional NER chunking systems usually generate entity candidates exhaustively, which costs efficiency during entity decoding. In this paper we propose a morpheme-based chunking framework for Chinese NER and implement an efficient three-stage tagger using a pipeline strategy. To tackle the problem of out-of-vocabulary words and to exploit lexical cues for NER more effectively, we distinguish named entities from common words and choose morphemes as the basic tokens for entity chunking. To reduce the space of entity candidates and improve the efficiency of entity decoding, we employ internal entity formation pattern rules during entity candidate generation. Our experiments on different datasets show that our system can greatly improve NER efficiency without much degradation of performance.
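The contrast the abstract draws between exhaustive candidate generation and generation restricted by internal entity formation patterns can be illustrated with a small sketch. This is not the paper's implementation: the morpheme categories (`SURNAME`, `GIVEN`, `PLACE`, `LOC_SUFFIX`, `OTHER`), the three rules in `PATTERN_RULES`, and both function names are hypothetical, chosen only to show how pattern rules can shrink the candidate space before decoding.

```python
from typing import Dict, List, Tuple

# Hypothetical pattern rules: a sequence of morpheme categories maps to an
# entity type. Both the categories and the rules are invented for illustration.
PATTERN_RULES: Dict[Tuple[str, ...], str] = {
    ("SURNAME", "GIVEN"): "PER",
    ("SURNAME", "GIVEN", "GIVEN"): "PER",
    ("PLACE", "LOC_SUFFIX"): "LOC",
}
MAX_LEN = max(len(pattern) for pattern in PATTERN_RULES)

Token = Tuple[str, str]      # (morpheme, category)
Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def exhaustive_candidates(tokens: List[Token],
                          types: Tuple[str, ...] = ("PER", "LOC")) -> List[Span]:
    """Propose every span of up to MAX_LEN morphemes with every entity type."""
    n = len(tokens)
    return [(i, j, t)
            for i in range(n)
            for j in range(i + 1, min(n, i + MAX_LEN) + 1)
            for t in types]


def pattern_candidates(tokens: List[Token]) -> List[Span]:
    """Propose only spans whose category sequence matches a pattern rule."""
    cats = [cat for _, cat in tokens]
    out: List[Span] = []
    for i in range(len(cats)):
        for j in range(i + 1, min(len(cats), i + MAX_LEN) + 1):
            etype = PATTERN_RULES.get(tuple(cats[i:j]))
            if etype is not None:
                out.append((i, j, etype))
    return out


# Toy input: morphemes already segmented and categorized (an assumption).
tokens: List[Token] = [
    ("张", "SURNAME"), ("小", "GIVEN"), ("明", "GIVEN"),
    ("在", "OTHER"), ("北京", "PLACE"), ("市", "LOC_SUFFIX"),
]

print(len(exhaustive_candidates(tokens)))  # 30 candidates
print(pattern_candidates(tokens))          # 3 candidates
```

On this toy input the exhaustive generator proposes 30 typed spans, while the rule-filtered generator proposes 3, which is the kind of reduction in decoding work the abstract attributes to pattern rules. A later stage would still have to choose among the surviving candidates.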
Pages: 289-292
Number of pages: 4