Correcting word segmentation and part-of-speech tagging errors for Chinese named entity recognition

Times cited: 0
Authors
Yao, TF [1 ]
Wei, D [1 ]
Erbach, G [1 ]
Affiliation
[1] Univ Saarland, Computat Linguist Dept, D-66041 Saarbrucken, Germany
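Source
INTERNET CHALLENGE: TECHNOLOGY AND APPLICATIONS | 2002
Keywords
information extraction; named entity recognition; machine learning; finite-state cascades
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract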
While exploring Chinese named entity recognition for a specific domain, the authors found that errors introduced during word segmentation and part-of-speech (POS) tagging were blocking further improvement in recognition performance. To raise recognition recall and precision, the authors propose an error-correction approach for Chinese named entity recognition. The error-correction component adopts transformation-based machine learning, because it is well suited to fixing Chinese word segmentation and POS tagging errors and produces effective correction rules automatically. The Chinese named entity recognition component uses finite-state cascades that are constructed automatically from POS rules with semantic constraints. A prototype system, CNERS (Chinese Named Entity Recognition System), has been implemented. Experimental results show that recognition performance for most named entities improved significantly. In addition, the system is fast and reliable.
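The abstract states that transformation-based learning produces correction rules automatically. As a rough illustration of that general error-driven idea (in the style of Brill's transformation-based learning), the sketch below learns POS-tag correction rules by greedily keeping whichever candidate rule gives the largest net reduction in errors against gold tags. It is not the CNERS implementation: the single rule template (retag based on the previous tag only), the function names, and the toy tags are all hypothetical, and it does not model word segmentation corrections.

```python
from typing import List, Optional, Tuple

# Hypothetical rule template (an assumption, not taken from the paper):
# (from_tag, to_tag, prev_tag) = "retag from_tag as to_tag when the previous tag is prev_tag".
Rule = Tuple[str, str, str]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one transformation at every matching position.

    Context is read from the tags as they stood before this pass,
    so a change made here cannot re-trigger the rule within the same pass.
    """
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(tags: List[str], gold: List[str]) -> int:
    """Number of positions where the current tag disagrees with the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(
    tags: List[str], gold: List[str], max_rules: int = 10
) -> Tuple[List[Rule], List[str]]:
    """Greedy error-driven learning: keep the rule with the largest net error reduction, then repeat."""
    rules: List[Rule] = []
    current = list(tags)
    for _ in range(max_rules):
        # Instantiate candidate rules only from positions that are currently wrong.
        candidates: List[Rule] = []
        for i in range(1, len(current)):
            if current[i] != gold[i]:
                rule = (current[i], gold[i], current[i - 1])
                if rule not in candidates:
                    candidates.append(rule)
        errors_now = count_errors(current, gold)
        best: Optional[Rule] = None
        best_gain = 0
        # Score each candidate by errors fixed minus errors introduced over the whole sequence.
        for rule in candidates:
            gain = errors_now - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no candidate yields a net improvement: stop learning
            break
        rules.append(best)
        current = apply_rule(current, best)
    return rules, current


# Toy data (hypothetical): a baseline tagger's output vs. hand-corrected gold tags.
initial_tags = ["n", "v", "v", "n", "v", "v"]
gold_tags = ["n", "v", "n", "n", "v", "n"]
learned, corrected = learn_rules(initial_tags, gold_tags)
print(learned)    # [('v', 'n', 'v')]
print(corrected)  # ['n', 'v', 'n', 'n', 'v', 'n']
```

Scoring by net gain, rather than by how many errors a rule fixes, is what keeps a rule that repairs some tags but breaks others from being selected; a practical system would add richer templates (neighboring words, segmentation boundaries) on top of this loop.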
Pages: 29-36
Number of pages: 8