Chinese word sense disambiguation based on maximum entropy model with feature selection

Cited by: 6
Authors
He J.-Z. [1, 2]
Wang H.-F. [1, 2]
Affiliations
[1] Institute of Computational Linguistics, School of Electronic Engineering and Computer Science, Peking University
[2] Key Laboratory of Computational Linguistics (Ministry of Education), Peking University
Source
Ruan Jian Xue Bao/Journal of Software | 2010, Vol. 21, No. 6
Keywords
Automatic feature selection; Chinese word sense disambiguation; Classification feature; Maximum entropy model
DOI
10.3724/SP.J.1001.2010.03591
Abstract
Word sense disambiguation (WSD) can be viewed as a classification problem, and feature selection is of great importance in such a task. Features are usually selected manually, which requires a deep understanding of both the task and the classification model employed. This paper studies the effect of feature templates on Chinese WSD and proposes an automatic feature selection algorithm based on the maximum entropy model (MEM). The algorithm covers two settings: selecting one uniform set of feature templates for all ambiguous words, and selecting a customized set of feature templates for each word. Experimental results show that automatic feature selection reduces the feature size and improves Chinese WSD performance. Compared with the best evaluation results of SemEval-2007 Task 5, the method increases MicroAve (micro-average accuracy) by 3.10% and MacroAve (macro-average accuracy) by 2.96%. © by Institute of Software, the Chinese Academy of Sciences. All rights reserved.
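The abstract does not spell out the selection procedure or the exact metric definitions, so the following is only a minimal sketch under stated assumptions. It assumes MicroAve pools all test instances, MacroAve averages per-word accuracies, and template selection is a plain greedy forward search. The function names, the template names, and the toy evaluator are all hypothetical; in practice `evaluate` would train and score a maximum entropy classifier using the given templates.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def micro_average(per_word: Dict[str, Tuple[int, int]]) -> float:
    """Pooled accuracy: total correct instances / total instances."""
    correct = sum(c for c, _ in per_word.values())
    total = sum(n for _, n in per_word.values())
    return correct / total


def macro_average(per_word: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of the per-word accuracies."""
    return sum(c / n for c, n in per_word.values()) / len(per_word)


def greedy_select(
    templates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection (an assumed strategy, not the paper's).

    Each round adds the single template that raises the score the most,
    and the search stops when no remaining template improves it.
    """
    pool = set(templates)
    selected: FrozenSet[str] = frozenset()
    best = evaluate(selected)
    improved = True
    while improved:
        improved = False
        candidate = None
        for t in sorted(pool - selected):
            score = evaluate(selected | {t})
            if score > best:
                best, candidate, improved = score, t, True
        if improved:
            selected = selected | {candidate}
    return selected, best


# Hypothetical per-template accuracy gains, standing in for a real classifier.
TOY_GAINS = {"w-1": 0.10, "w+1": 0.05, "pos-1": -0.02}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    """Toy evaluator: a 0.5 baseline plus the gains of the chosen templates."""
    return 0.5 + sum(TOY_GAINS[t] for t in subset)


if __name__ == "__main__":
    # (correct, total) per ambiguous word; the numbers are made up.
    results = {"word_a": (90, 100), "word_b": (5, 10)}
    print(micro_average(results), macro_average(results))
    print(greedy_select(TOY_GAINS, toy_evaluate))
```

Under these assumptions, the uniform setting would call `greedy_select` once with an evaluator that scores all ambiguous words together, while the customized setting would call it once per word with an evaluator restricted to that word's instances.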
Pages: 1287-1295
Number of pages: 8
Related papers
15 in total
[1] Bar-Hillel Y., The present status of automatic translation of languages, Advances in Computers, 1, pp. 91-163, (1960)
[2] Pedersen T., A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation, Proc. of the North American Chapter of the Association for Computational Linguistics (NAACL), pp. 63-69, (2000)
[3] Xing Y., SRCB-WSD: Supervised Chinese word sense disambiguation with key features, Proc. of the 4th Int'l Workshop on Semantic Evaluations (SemEval-2007), pp. 300-303, (2007)
[4] Yee K.O., CITYU-HIF: WSD with human-informed feature preference, Proc. of the 4th Int'l Workshop on Semantic Evaluations (SemEval-2007), pp. 109-112, (2007)
[5] Jin P., Wu Y.F., Yu S.W., SemEval-2007 Task 5: Multilingual Chinese-English lexical sample, Proc. of the 4th Int'l Workshop on Semantic Evaluations (SemEval-2007), pp. 19-23, (2007)
[6] Mihalcea R., Co-training and self-training for word sense disambiguation, Proc. of the CoNLL 2004, (2004)
[7] Mihalcea R., Chklovski T., Kilgarriff A., The Senseval-3 English lexical sample task, Proc. of the 3rd Int'l Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3), (2004)
[8] Pham T.P., Ng H.T., Lee W.S., Word sense disambiguation with semisupervised learning, Proc. of the 20th AAAI Conf. on Artificial Intelligence (AAAI-2005), (2005)
[9] Yarowsky D., Unsupervised word sense disambiguation rivaling supervised methods, Proc. of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL 1995), pp. 189-196, (1995)
[10] Quan C.Q., He T.T., Ji D.H., Yu S.W., Word sense disambiguation based on multi-classifier decision, Journal of Computer Research and Development, 43, 5, pp. 933-939, (2006)