Unsupervised word sense disambiguation with N-gram features

Cited by: 0
Authors
Daniel Preotiuc-Pietro
Florentina Hristea
Affiliations
[1] University of Sheffield,Department of Computer Science
[2] University of Bucharest,Department of Computer Science
Source
Artificial Intelligence Review | 2014 / Vol. 41
Keywords
Bayesian classification; The EM algorithm; Word sense disambiguation; Unsupervised disambiguation; Web-scale N-grams;
DOI
Not available
Abstract
The present paper concentrates on the issue of feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. While creating features from unlabeled data, we are “helping” a simple, basic knowledge-lean disambiguation algorithm to significantly increase its accuracy as a result of receiving easily obtainable knowledge. The performance of this method is compared to that of others that rely on completely different feature sets. Test results concerning nouns, adjectives and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a “quality list” of features, adapted to the part of speech, is used.
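As a rough illustration of the general setting the abstract describes (a Naïve Bayes model whose parameters are estimated without sense labels via the EM algorithm, using only words from a selected feature list), a minimal sketch follows. It is not the authors' implementation: the function name `em_naive_bayes`, the toy contexts, the smoothing constant `alpha`, and the hand-written `feature_list` (standing in for a feature list derived from web N-grams) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def em_naive_bayes(contexts, feature_list, n_senses=2, n_iter=30, seed=0, alpha=0.1):
    """Group contexts of one ambiguous word into senses with a
    Naive Bayes mixture trained by EM (hypothetical helper)."""
    vocab = sorted(feature_list)
    keep = set(feature_list)
    # Feature selection: keep only context words that are on the feature list.
    docs = [Counter(w for w in ctx if w in keep) for ctx in contexts]

    # Random soft initial assignment of every context to the senses.
    rng = random.Random(seed)
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(n_senses)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: smoothed sense priors and word-given-sense probabilities.
        priors = [
            (sum(r[s] for r in resp) + alpha) / (len(docs) + n_senses * alpha)
            for s in range(n_senses)
        ]
        word_probs = []
        for s in range(n_senses):
            counts = {w: alpha for w in vocab}
            for doc, r in zip(docs, resp):
                for w, c in doc.items():
                    counts[w] += r[s] * c
            total = sum(counts.values())
            word_probs.append({w: counts[w] / total for w in vocab})

        # E-step: posterior over senses for each context, computed in log space.
        resp = []
        for doc in docs:
            logp = [
                math.log(priors[s])
                + sum(c * math.log(word_probs[s][w]) for w, c in doc.items())
                for s in range(n_senses)
            ]
            top = max(logp)
            expd = [math.exp(x - top) for x in logp]
            z = sum(expd)
            resp.append([e / z for e in expd])

    labels = [max(range(n_senses), key=lambda s: r[s]) for r in resp]
    return labels, resp


# Toy contexts for the ambiguous word "bank" (invented data).
contexts = [
    ["money", "loan", "deposit"],
    ["money", "loan", "deposit"],
    ["river", "water", "shore"],
    ["river", "water", "fishing"],
]
feature_list = ["money", "loan", "deposit", "river", "water", "shore"]
labels, posteriors = em_naive_bayes(contexts, feature_list)
```

The returned labels are arbitrary cluster indices rather than dictionary senses, and EM only reaches a local optimum, so the grouping can depend on the random initialization. Words outside `feature_list` (here `fishing`) are ignored, which is where the choice of feature list influences the result.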
Pages: 241-260
Page count: 19