Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Citations: 0
Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE Access | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; weak supervision; zero-shot language model; sentiment analysis; Arabic language; machine learning; Kuwaiti dialect;
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Available dialectal Arabic linguistic resources cover few Arabic dialects, and coverage of the Kuwaiti dialect is particularly limited. This shortage hinders researchers in the Natural Language Processing (NLP) field and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects remain unexplored in research because of the difficulty of recruiting annotators for dataset labeling. This paper proposes a weakly supervised classification system, "q8SentiLabeler", to address the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to mitigate bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
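The two agreement measures named in the abstract, pairwise percent agreement and Cohen's Kappa, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy label lists below are hypothetical, and only the standard definitions of the two statistics are assumed.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assign the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: sum over labels of the product of each
    # annotator's marginal probability of using that label.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: human labels vs. automatically assigned labels.
human = ["pos", "neg", "neg", "pos", "neu", "neg"]
automatic = ["pos", "neg", "pos", "pos", "neu", "neg"]

print(f"percent agreement: {percent_agreement(human, automatic):.3f}")
print(f"Cohen's Kappa:     {cohens_kappa(human, automatic):.3f}")
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance, which is why the abstract reports both figures.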
Pages: 27709-27722
Number of pages: 14