Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE Access | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; Weak supervision; Zero-shot language model; Arabic language; Kuwaiti dialect
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Available dialectal Arabic linguistic resources cover Arabic dialects only to a very limited extent, and the Kuwaiti dialect in particular. This shortage hampers researchers in the Natural Language Processing (NLP) field and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects also remain unexplored in research because of the difficulty of recruiting annotators for dataset labeling. This paper proposes "q8SentiLabeler", a weakly supervised classification system that addresses the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to remove any bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
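The two agreement measures reported in the abstract, pairwise percent agreement and Cohen's Kappa, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `percent_agreement` and `cohens_kappa` are hypothetical, and the label lists `human` and `system` are made-up toy data, not taken from the paper's dataset.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which the two label sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (assumption: invented for illustration only).
human = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
system = ["pos", "pos", "neg", "neu", "neu", "pos", "neg", "neg"]

print(f"percent agreement: {percent_agreement(human, system):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(human, system):.2f}")
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance, which is why the abstract reports both figures.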
Pages: 27709-27722
Number of pages: 14