Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; weak supervision; zero-shot language model; Arabic language; Kuwaiti dialect
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Available dialectal Arabic linguistic resources offer very limited coverage of Arabic dialects, particularly the Kuwaiti dialect. This shortage hampers researchers in the Natural Language Processing (NLP) field and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects remain unexplored in research because of the challenges of recruiting annotators for dataset labeling. This paper proposes a weakly supervised classification system, called "q8SentiLabeler", to address the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to remove bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
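The two agreement measures named in the abstract, pairwise percent agreement and Cohen's Kappa, can be illustrated with a minimal sketch. The function names and the toy label lists below are hypothetical and chosen only for illustration; they are not the paper's code or data, and the paper's exact computation may differ.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assign the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently according to its own marginals.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels (NOT from the paper's dataset).
human = ["pos", "pos", "neg", "neg"]
labeler = ["pos", "neg", "neg", "neg"]

print(percent_agreement(human, labeler))
print(cohens_kappa(human, labeler))
```

Kappa is usually reported alongside raw agreement because raw agreement can look high purely by chance when one class dominates the data.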
Pages: 27709 - 27722
Number of pages: 14