Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect

Cited by: 1
Authors
Jbel, Mouad [1]
Jabrane, Mourad [1]
Hafidi, Imad [1]
Metrane, Abdulmutallib [1]
Affiliations
[1] Univ Sultan Moulay Slimane, Proc Engn Comp & Math Lab, Beni Mellal, Morocco
Keywords
Sentiment analysis; Natural language processing; Arabic Moroccan dialect; Machine learning; Dialectical text; AGREEMENT;
DOI
10.1007/s10579-024-09764-6
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Sentiment analysis, the automated process of determining emotions or opinions expressed in text, has seen extensive exploration in the field of natural language processing. However, one aspect that has remained underrepresented is the sentiment analysis of the Moroccan dialect, which boasts a unique linguistic landscape and the coexistence of multiple scripts. Previous works in sentiment analysis primarily targeted dialects employing Arabic script. While these efforts provided valuable insights, they may not fully capture the complexity of Moroccan web content, which features a blend of Arabic and Latin script. As a result, our study emphasizes the importance of extending sentiment analysis to encompass the entire spectrum of Moroccan linguistic diversity. Central to our research is the creation of the largest public dataset for Moroccan dialect sentiment analysis that incorporates not only Moroccan dialect written in Arabic script but also in Latin characters. By assembling a diverse range of textual data, we were able to construct a dataset with a range of 19,991 manually labeled texts in Moroccan dialect and also publicly available lists of stop words in Moroccan dialect as a new contribution to Moroccan Arabic resources. In our exploration of sentiment analysis, we undertook a comprehensive study encompassing various machine-learning models to assess their compatibility with our dataset. While our investigation revealed that the highest accuracy of 98.42% was attained through the utilization of the DarijaBert-mix transfer-learning model, we also delved into deep learning models. Notably, our experimentation yielded a commendable accuracy rate of 92% when employing a CNN model. Furthermore, in an effort to affirm the reliability of our dataset, we tested the CNN model using smaller publicly available datasets of Moroccan dialect, with results that proved to be promising and supportive of our findings.
Pages: 30