Enhancing Word Sense Disambiguation for Amharic homophone words using Bidirectional Long Short-Term Memory network

Cited by: 2

Authors:

Belete, Mequanent Degu [1]

Shiferaw, Lijalem Getanew [2]

Alitasb, Girma Kassa [1]

Tamir, Tariku Sinshaw [1]

Affiliations:

[1] Debre Markos Univ, Debre Markos Coll Technol, Dept Elect & Comp Engn, Debre Markos, Ethiopia

[2] Debre Markos Univ, Head ICT Dept, Lib Directorate, Debre Markos, Ethiopia

Source:

INTELLIGENT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 23

Keywords:

Amharic language; Homophone; Machine learning; Deep learning; Bidirectional; BiLSTM; BiGRU; TFIDF; BoW; Word embedding; Amharic word sense disambiguation

DOI:

10.1016/j.iswa.2024.200417

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Amharic contains many confusable words because its script (fidel) features redundant homophone letters: ሀ, ሐ, and ኀ (all three pronounced HA), ሠ and ሰ (both pronounced SE), አ and ዐ (both pronounced AE), and ጸ and ፀ (both pronounced TSE). This work develops a Word Sense Disambiguation (WSD) model that uses a deep learning technique to tackle lexical ambiguity in Amharic. Because no Amharic WordNet is available, a total of 1756 examples of paired ambiguous Amharic homophonic words were collected. The four homophone pairs are transliterated as dhnet, m'hur, be'al, and abiy. Following word preprocessing, word2vec, fastText, Term Frequency-Inverse Document Frequency (TFIDF), and bag of words (BoW) were used to vectorize the text. The vectorized text was divided into train and test data. The train data was then analysed using Naive Bayes (NB), K-nearest neighbour (KNN), logistic regression (LG), decision trees (DT), and random forests (RF), together with a random oversampling technique. Bidirectional Gated Recurrent Unit (BiGRU) and Bidirectional Long Short-Term Memory (BiLSTM) models improved accuracy to 99.99% even with limited datasets.
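
The vectorization and oversampling steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation: the token strings (`w1`, `dhnet`, ...) and sense labels are hypothetical placeholders, the whitespace tokenizer is a simplification, and the smoothed IDF of the form ln((1 + N) / (1 + df)) + 1 is one common convention assumed here, since the abstract does not specify a formula.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in doc.split():
            vec[index[tok]] += 1
        vectors.append(vec)
    return vocab, vectors


def tfidf(count_vectors):
    """Weight raw counts by a smoothed inverse document frequency (assumed form)."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return [[c * w for c, w in zip(vec, idf)] for vec in count_vectors]


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(features, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


# Hypothetical training split: placeholder context tokens around one homophone.
train_docs = ["w1 w2 dhnet", "w3 dhnet w4", "w2 w5 dhnet", "w6 dhnet w7"]
train_labels = ["sense_1", "sense_1", "sense_1", "sense_2"]

vocab, counts = bag_of_words(train_docs)
weighted = tfidf(counts)
balanced_x, balanced_y = random_oversample(weighted, train_labels)
print(Counter(balanced_y))
```

In this sketch, oversampling is applied to the training split only, so duplicated rows cannot leak into the test data; the balanced vectors would then be passed to whichever classifier is being evaluated.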

Pages: 6
