Lexeme connexion measure of cohesive lexical ambiguity revealing factor: a robust approach for word sense disambiguation of Bengali text

Cited by: 2
Authors
Das Dawn, Debapratim [1]
Khan, Abhinandan [1,2]
Shaikh, Soharab Hossain [3]
Pal, Rajat Kumar [1]
Affiliations
[1] University of Calcutta, Department of Computer Science and Engineering, Acharya Prafulla Chandra Roy Shiksha Prangan, JD-2, Sector 3, Kolkata 700106, India
[2] ARP Engineering, Product Development & Diversification, 147 Nilgunj Road, Kolkata 700056, India
[3] BML Munjal University, Department of Computer Science and Engineering, National Highway 8, 67 KM Milestone, Gurugram 122413, Haryana, India
Keywords
Word sense disambiguation; WSD of resource-scarce languages; WSD of Indian languages; Polysemous word; Sense identification
DOI
10.1007/s11042-023-14676-8
CLC classification number
TP [Automation technology, computer technology]
Subject classification number
0812
Abstract
Word sense disambiguation (WSD) is the process of determining the appropriate meaning of a polysemous word in a given context. The Bengali language contains a large number of polysemous words. Recently, researchers in linguistics have been drawn to WSD for Bengali text because of its many applications, such as machine translation, opinion polarity identification, and question-answering systems. This paper proposes the lexeme connexion measure of cohesive lexical ambiguity revealing factor, which decides the sense of a Bengali polysemous word. Every polysemous word is treated as a target word, and context windows of three sizes (five, seven, and ten) are formed around it. The paper also introduces a lexeme harmony measure that heuristically quantifies how strongly a collection of lexemes in Bengali text belong together syntactically. The proposed methodology extracts a feature vector based on the cohesive lexical ambiguity revealing factor (CLARF), which depends on the frame lexeme harmony (FLH), sense lexeme harmony (SLH), polysemy singularity coherence (PSC), polysemy distribution factor (PDF), and relative polysemy singularity coherence (RPSC) of a lexeme. For sense recognition, the technique applies a max rule to the integrated lexeme connexion measure (LCM) scores of each lexeme in both the test and training cases. The proposed algorithm overcomes a drawback of existing Bengali WSD approaches, as it considers both the lexical and the semantic relationships between words. Its performance has been evaluated on a dataset of 100 polysemous words, each with three or four senses. Various evaluation metrics have been used to analyse the results, which indicate the robustness of the proposed algorithm.
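The abstract names the stages of the pipeline (a context window around the target word, a score per candidate sense, and a max rule over those scores) but gives no formulas. The following minimal sketch illustrates only that control flow, under stated assumptions. How a window of five, seven, or ten tokens is split around the target is an assumption, and the scoring function `placeholder_score` is hypothetical: it is a plain co-occurrence count standing in for the paper's LCM score, whose FLH, SLH, PSC, PDF, and RPSC components are not defined here. The English toy sentences stand in for Bengali text.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def context_window(tokens: Sequence[str], target_index: int, size: int) -> List[str]:
    """Context lexemes from a window of `size` tokens that contains the target.

    Assumption: (size - 1) // 2 tokens are taken on the left and the rest on the
    right, clipped at the sentence boundaries; the target itself is excluded.
    """
    left = (size - 1) // 2
    right = size - 1 - left
    start = max(0, target_index - left)
    end = min(len(tokens), target_index + right + 1)
    return [tok for i, tok in enumerate(tokens[start:end], start) if i != target_index]


def build_sense_profiles(
    examples: Sequence[Tuple[Sequence[str], int, str]], size: int = 7
) -> Dict[str, Counter]:
    """Count the context lexemes seen with each sense in sense-tagged training data."""
    profiles: Dict[str, Counter] = {}
    for tokens, target_index, sense in examples:
        profiles.setdefault(sense, Counter()).update(
            context_window(tokens, target_index, size)
        )
    return profiles


def placeholder_score(window: Sequence[str], profile: Counter) -> int:
    """Hypothetical stand-in for the LCM score: summed co-occurrence counts."""
    # Counter returns 0 for unseen lexemes without inserting them.
    return sum(profile[lexeme] for lexeme in window)


def disambiguate(
    tokens: Sequence[str],
    target_index: int,
    profiles: Dict[str, Counter],
    size: int = 7,
) -> str:
    """Max rule: return the sense with the highest (placeholder) score."""
    window = context_window(tokens, target_index, size)
    scores = {sense: placeholder_score(window, prof) for sense, prof in profiles.items()}
    return max(scores, key=scores.get)


train = [
    ("we sat on the river bank fishing".split(), 5, "bank/riverside"),
    ("she opened an account at the bank".split(), 6, "bank/finance"),
]
profiles = build_sense_profiles(train)
print(disambiguate("the bank approved the account loan".split(), 1, profiles))
# prints: bank/finance
```

Keeping the window builder, the scorer, and the max-rule decision as separate functions means a faithful LCM implementation could replace `placeholder_score` without touching the rest of the sketch.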
Pages: 12939-12983
Number of pages: 45