Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications

Authors
Alok Ranjan Pal
Diganta Saha
Affiliations
[1] College of Engineering and Management, Department of Computer Science and Engineering
[2] Jadavpur University, Department of Computer Science and Engineering
Source
Sādhanā, 2019, Vol. 44
Keywords
Natural language processing; word sense disambiguation; principal component analysis; context expansion
DOI
Not available
Abstract
In this work, Word Sense Disambiguation (WSD) in the Bengali language is performed using an unsupervised methodology. In the first phase of the experiment, sentences are clustered using the Maximum Entropy method, and the clusters are then manually labelled with their underlying senses so that these sense-tagged clusters can serve as sense inventories for the subsequent experiments. In the next phase, when a test instance is to be disambiguated, the Cosine Similarity Measure is used to determine how close it is to each of the sense-tagged clusters, and the test instance is assigned the sense of the cluster from which it is at minimum distance. This strategy serves as the baseline and achieves 35% accuracy on the WSD task. Two extensions are then applied on top of this baseline: (a) Principal Component Analysis (PCA) over the feature vectors, which raises the accuracy to 52%, and (b) Context Expansion of the sentences using the Bengali WordNet coupled with PCA, which raises it to 61%. The data sets used in this work are drawn from the Bengali corpus developed under the Technology Development for the Indian Languages (TDIL) project of the Government of India, and the lexical knowledge base (i.e., the Bengali WordNet) was developed at the Indian Statistical Institute, Kolkata, under the Indradhanush Project of the DeitY, Government of India. The challenges and pitfalls of this work are also described in detail in the section preceding the conclusion.
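The baseline assignment step can be sketched as a nearest-cluster lookup under cosine similarity. This is a minimal illustration under stated assumptions, not the authors' implementation. Sentences are represented as whitespace-tokenised bag-of-words count vectors. Each sense-tagged cluster is represented by the summed counts of its member sentences. The English toy sentences and the sense labels `finance` and `river` are hypothetical placeholders for Bengali corpus data, and the function names are illustrative.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words count vector of a whitespace-tokenised sentence."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_vector(sentences):
    """Assumption: a cluster is represented by the summed counts of its sentences.

    Summing and averaging give the same cosine scores, because cosine
    similarity ignores vector length.
    """
    total = Counter()
    for s in sentences:
        total.update(bow(s))
    return total


def assign_sense(test_sentence, sense_clusters):
    """Return the sense whose cluster is most similar (least distant) to the test sentence."""
    test_vec = bow(test_sentence)
    scores = {
        sense: cosine(test_vec, cluster_vector(members))
        for sense, members in sense_clusters.items()
    }
    return max(scores, key=scores.get)


# Hypothetical, manually sense-tagged clusters for an ambiguous word "bank".
sense_clusters = {
    "finance": ["bank loan money", "bank loan interest"],
    "river": ["bank river water", "bank river shore"],
}

print(assign_sense("loan from the bank", sense_clusters))         # finance
print(assign_sense("water near the river bank", sense_clusters))  # river
```

Taking the maximum cosine similarity is equivalent to taking the minimum cosine distance (1 minus similarity). That matches the "minimum distance" wording of the baseline.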
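Extension (a) can be illustrated by projecting bag-of-words vectors onto their leading principal components before taking cosine similarity. This sketch makes several assumptions. It uses NumPy and computes PCA through `numpy.linalg.svd` of the mean-centred training matrix. The number of retained components, `k`, is a hypothetical parameter, because the paper's actual feature set and dimensionality are not stated here. Words not seen in the clustered sentences are ignored. The toy sentences and function names are illustrative only.

```python
import numpy as np


def build_vocab(sentences):
    """Sorted vocabulary of all whitespace tokens in the training sentences."""
    return sorted({tok for s in sentences for tok in s.lower().split()})


def to_matrix(sentences, vocab):
    """Rows are bag-of-words count vectors; out-of-vocabulary tokens are dropped."""
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(sentences), len(vocab)))
    for row, s in enumerate(sentences):
        for tok in s.lower().split():
            if tok in index:
                X[row, index[tok]] += 1.0
    return X


def fit_pca(X, k):
    """Return the column means and the top-k principal axes (one axis per row)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(u @ v / denom)


def assign_sense_pca(test_sentence, sense_clusters, k):
    """Nearest sense-tagged cluster by cosine similarity in a k-dimensional PCA space."""
    labels, sentences = [], []
    for sense, members in sense_clusters.items():
        labels += [sense] * len(members)
        sentences += members
    vocab = build_vocab(sentences)
    X = to_matrix(sentences, vocab)
    mean, axes = fit_pca(X, k)
    Z = (X - mean) @ axes.T  # training sentences in PCA space
    z_test = ((to_matrix([test_sentence], vocab) - mean) @ axes.T)[0]
    labels = np.array(labels)
    centroids = {s: Z[labels == s].mean(axis=0) for s in sense_clusters}
    return max(centroids, key=lambda s: cosine(z_test, centroids[s]))


# Hypothetical, manually sense-tagged clusters for an ambiguous word "bank".
sense_clusters = {
    "finance": ["bank loan money", "bank loan interest"],
    "river": ["bank river water", "bank river shore"],
}
print(assign_sense_pca("loan from the bank", sense_clusters, k=1))         # finance
print(assign_sense_pca("water near the river bank", sense_clusters, k=1))  # river
```

With `k=1` the cosine score only checks whether the test point and a centroid lie on the same side of the single principal axis. In this toy set, that axis happens to separate the two senses. Real data would need a larger, tuned `k`.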
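Extension (b) expands each sentence with related words drawn from a lexical resource before vectorisation. This is a hypothetical sketch. `TOY_LEXICON` is an invented dictionary that stands in for Bengali WordNet lookups. The expansion policy, which appends each listed related word once, is an assumption rather than the paper's documented rule.

```python
# Hypothetical toy lexicon standing in for Bengali WordNet lookups:
# each word maps to related words (e.g. synset members) to append.
TOY_LEXICON = {
    "credit": ["loan"],
    "stream": ["river", "water"],
}


def expand_context(sentence, lexicon):
    """Append lexicon neighbours of each token, skipping words already present."""
    tokens = sentence.lower().split()
    expanded = list(tokens)
    for tok in tokens:
        for related in lexicon.get(tok, []):
            if related not in expanded:
                expanded.append(related)
    return " ".join(expanded)


print(expand_context("credit from the bank", TOY_LEXICON))
# credit from the bank loan
print(expand_context("fish in the stream near the bank", TOY_LEXICON))
# fish in the stream near the bank river water
```

The expanded string is then vectorised in the usual way, optionally followed by PCA and cosine scoring. Suppose the clustered sentences mention `loan` but never `credit`. Expansion then creates a word overlap that cosine similarity can detect.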