MSTD: Moroccan Sentiment Twitter Dataset

Cited by: 0
Authors
Mihi, Soukaina [1]
Ali, Brahim Ait Ben [1]
El Bazi, Ismail [2]
Arezki, Sara [1]
Laachfoubi, Nabil [1]
Affiliations
[1] Univ Hassan First Settat Morocco, Settat, Morocco
[2] Univ Moulay Slimane Beni Mellal, Beni Mellal, Morocco
Keywords
Sentiment analysis; Moroccan dialect; machine learning; stemming; lemmatization; feature extraction
DOI
10.14569/IJACSA.2020.0111045
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
With the proliferation of social media and Internet accessibility, a massive amount of data has been produced. In most cases, the textual data available through the web comes mainly from people expressing their views in informal language. The Arabic language is one of the hardest Semitic languages to deal with because of its complex morphology. In this paper, a new contribution to Arabic resources is presented: a large Moroccan dataset retrieved from Twitter and carefully annotated by native speakers. To the best of our knowledge, this dataset is the largest Moroccan dataset for sentiment analysis. It is distinguished by its size, its quality, which is ensured by the commitment of the annotators, and its accessibility to the research community. Furthermore, the MSTD (Moroccan Sentiment Twitter Dataset) is benchmarked through experiments carried out for 4-way classification as well as polarity classification (positive, negative). Various machine-learning algorithms are combined with feature extraction techniques to reach optimal settings. This work also presents the effect of stemming and lemmatization on the obtained accuracies.
Pages: 363-372
Page count: 10