MSTD: Moroccan Sentiment Twitter Dataset

Citations: 0
Authors
Mihi, Soukaina [1]
Ali, Brahim Ait Ben [1]
El Bazi, Ismail [2]
Arezki, Sara [1]
Laachfoubi, Nabil [1]
Affiliations
[1] Univ Hassan First Settat Morocco, Settat, Morocco
[2] Univ Moulay Slimane Beni Mellal, Beni Mellal, Morocco
Keywords
Sentiment analysis; Moroccan dialect; machine learning; stemming; lemmatization; feature extraction
DOI
10.14569/IJACSA.2020.0111045
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
With the proliferation of social media and Internet accessibility, a massive amount of data has been produced. In most cases, the textual data available on the web comes mainly from people expressing their views in informal language. Arabic is one of the hardest Semitic languages to process because of its complex morphology. This paper presents a new contribution to Arabic resources: a large Moroccan dataset retrieved from Twitter and carefully annotated by native speakers. To the best of our knowledge, this dataset is the largest Moroccan dataset for sentiment analysis. It is distinguished by its size, its quality (ensured by the commitment of the annotators), and its accessibility to the research community. Furthermore, the MSTD (Moroccan Sentiment Twitter Dataset) is benchmarked through experiments on 4-way classification as well as polarity classification (positive, negative). Various machine-learning algorithms are combined with feature extraction techniques to reach optimal settings. This work also presents the effect of stemming and lemmatization on the obtained accuracies.
Pages: 363-372
Number of pages: 10