NAIVE BAYES CLASSIFIER FOR WORD SENSE DISAMBIGUATION OF PUNJABI LANGUAGE

Cited by: 8
Authors
Singh, Varinder Pal [1]
Kumar, Parteek [1]
Affiliations
[1] Thapar Univ, Comp Sci & Engn Dept, Patiala 147004, Punjab, India
Keywords
Word sense disambiguation; Bag of words model; Collocation model; Naive Bayes classifier;
DOI
10.22452/mjcs.vol31no3.2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Word Sense Disambiguation (WSD) is the process of identifying the correct sense of a word in its context. The leading approach to WSD is machine learning, in which a human expert provides examples of correctly disambiguated words and a learning algorithm induces a model from these examples. In this paper, a supervised Naive Bayes classifier is used to disambiguate words of the Punjabi language. Feature extraction plays a vital role in building supervised machine learning models. For the proposed Punjabi WSD system, Bag of Words (BoW) and collocation models are used separately to extract relevant features. The BoW model uses all words around the target word, while the collocation model uses the two words before and the two words after the target word as features. Both models are built from a common training data set. The choice of smoothing parameter for Naive Bayes is observed to have a significant impact on its performance. The proposed work has been tested on the 150 most ambiguous nouns selected from Punjabi WordNet, each having 6 or more senses. While building the model, fine senses of ambiguous words were merged into coarse senses on the basis of a manual analysis of the lexical relations in WordNet. Accuracy has been calculated independently for the BoW and collocation models: the proposed WSD system achieves 89% with the BoW model and 81% with the collocation model. It is concluded that the BoW model performs better than the collocation model on the WSD task for the Punjabi language.
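The two feature models and the smoothed Naive Bayes classifier described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the English toy sentences, the sense labels, the class and function names, the offset tagging of collocation features, and the use of add-k (Lidstone) smoothing are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter, defaultdict


def bow_features(tokens, target_index):
    """Bag of words: every token in the context except the target itself."""
    return [t for i, t in enumerate(tokens) if i != target_index]


def collocation_features(tokens, target_index, window=2):
    """Up to `window` words on each side of the target.
    Tagging each word with its offset is an assumption of this sketch."""
    feats = []
    for offset in range(-window, window + 1):
        j = target_index + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats.append(f"{offset:+d}:{tokens[j]}")
    return feats


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-k smoothing (hypothetical helper class)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, examples):
        # examples: iterable of (feature_list, sense_label) pairs
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(feats)
            self.vocabulary.update(feats)
        return self

    def log_scores(self, feats):
        total = sum(self.sense_counts.values())
        k, v = self.smoothing, len(self.vocabulary)
        scores = {}
        for sense, n in self.sense_counts.items():
            counts = self.feature_counts[sense]
            denom = sum(counts.values()) + k * v
            score = math.log(n / total)  # log prior of the sense
            for f in feats:
                if f in self.vocabulary:  # features never seen in training are skipped
                    score += math.log((counts[f] + k) / denom)
            scores[sense] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy training data (invented): (tokens, index of target word, coarse sense)
train = [
    ("he sat on the bank of the river".split(), 4, "river_edge"),
    ("the bank of the stream was muddy".split(), 1, "river_edge"),
    ("she deposited money in the bank today".split(), 5, "financial"),
    ("the bank approved the loan quickly".split(), 1, "financial"),
]

# Both models are trained on the same data, differing only in feature extraction.
bow_model = NaiveBayesWSD(smoothing=1.0).fit(
    (bow_features(toks, i), sense) for toks, i, sense in train
)
colloc_model = NaiveBayesWSD(smoothing=1.0).fit(
    (collocation_features(toks, i), sense) for toks, i, sense in train
)

test_tokens = "the loan money is in the bank".split()
test_target = 6
print(bow_model.predict(bow_features(test_tokens, test_target)))
print(colloc_model.predict(collocation_features(test_tokens, test_target)))
```

Changing the `smoothing` argument alters how much probability mass is reserved for feature and sense pairs that never co-occurred in training, which is one way to explore the sensitivity to the smoothing parameter that the abstract reports.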
Pages: 188-199
Number of pages: 12