MSTD: Moroccan Sentiment Twitter Dataset

Times cited: 0
Authors
Mihi, Soukaina [1]
Ali, Brahim Ait Ben [1]
El Bazi, Ismail [2]
Arezki, Sara [1]
Laachfoubi, Nabil [1]
Affiliations
[1] Hassan First University of Settat, Settat, Morocco
[2] Moulay Slimane University, Beni Mellal, Morocco
Keywords
Sentiment analysis; Moroccan dialect; machine learning; stemming; lemmatization; feature extraction
DOI
10.14569/IJACSA.2020.0111045
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
With the proliferation of social media and Internet accessibility, a massive amount of data has been produced. In most cases, the textual data available on the web comes from people expressing their views in informal language. Arabic is one of the hardest Semitic languages to process because of its complex morphology. This paper presents a new contribution to Arabic language resources: a large Moroccan dataset retrieved from Twitter and carefully annotated by native speakers. To the best of our knowledge, it is the largest Moroccan dataset for sentiment analysis. It is distinguished by its size, by the quality ensured by the commitment of its annotators, and by its accessibility to the research community. Furthermore, MSTD (Moroccan Sentiment Twitter Dataset) is benchmarked through experiments on 4-way classification as well as polarity classification (positive, negative). Various machine-learning algorithms are combined with feature extraction techniques to find optimal settings. This work also presents the effect of stemming and lemmatization on the obtained accuracies.
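The abstract does not name the specific classifiers or feature extractors used. As an illustration only, the sketch below shows one common pairing for polarity classification: bag-of-words features fed to a multinomial Naive Bayes classifier with Laplace smoothing. The `tokenize` helper, the `NaiveBayes` class and the toy training sentences are hypothetical; they are not taken from MSTD or from the authors' experiments, and no stemming or lemmatization step is implemented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real pipeline for Moroccan dialect
    # would add normalization and stemming/lemmatization here (not implemented).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # label -> number of documents
        self.word_counts = defaultdict(Counter)    # label -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_posterior(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        v = len(self.vocab)
        # Log prior of the class.
        score = math.log(self.class_counts[label] / self.total_docs)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens unseen during training are ignored
                score += math.log((counts[tok] + 1) / (total + v))
        return score

    def predict(self, text):
        # Pick the label with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda lab: self.log_posterior(text, lab))


# Toy, made-up training data (not from MSTD).
train_texts = ["great service love it", "really good and nice",
               "bad awful service", "terrible and sad"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)

print(model.predict("good nice service"))  # positive
print(model.predict("awful sad"))          # negative
```

Swapping the whitespace tokenizer for a stemmer or lemmatizer changes only `tokenize`, which is one way to compare how such preprocessing affects accuracy while keeping the classifier fixed.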
Pages: 363-372
Number of pages: 10