Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Cited by: 0
Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE ACCESS | 2024 / Volume 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; weak supervision; zero-shot language model; Arabic language; Kuwaiti dialect
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The available dialectal Arabic linguistic resources are very limited in their coverage of Arabic dialects, particularly the Kuwaiti dialect. This shortage of linguistic resources hampers researchers in the Natural Language Processing (NLP) field and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects remain unexplored in research because of the challenges of recruiting annotators for dataset labeling. This paper proposes a weakly supervised classification system, called "q8SentiLabeler", to address the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to avoid bias in its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with a pairwise percent agreement of 93% and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
Pages: 27709-27722
Page count: 14
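
The abstract reports agreement between the automatic labeler and human annotators as pairwise percent agreement and Cohen's Kappa. The following is a minimal sketch of how these two statistics are commonly computed for two label sequences. It is not the authors' code: the function names and the toy labels (`human_labels`, `auto_labels`) are hypothetical and chosen only for illustration, and they do not come from the paper's dataset.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators assign the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labeled independently according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (hypothetical labels, not data from the paper).
human_labels = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu", "neg", "pos"]
auto_labels = ["pos", "neg", "neg", "pos", "neu", "neg", "neg", "neu", "neg", "pos"]

print(percent_agreement(human_labels, auto_labels))
print(cohens_kappa(human_labels, auto_labels))
```

Kappa is typically lower than raw percent agreement because it discounts the matches two annotators would produce by chance given their label frequencies, which is why the abstract reports both figures.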