Developing Analytical Tools for Arabic Sentiment Analysis of COVID-19 Data

Cited by: 1
Authors
Abdelhady, Naglaa [1]
Elsemman, Ibrahim E. [1]
Farghally, Mohammed F. [1]
Soliman, Taysir Hassan A. [1]
Affiliations
[1] Assiut Univ, Fac Comp & Informat, Dept Informat Syst, Assiut 2071515, Egypt
Keywords
sentiment analysis; Twitter; Arabic lexicon; Arabic annotated datasets; COVID-19; negation; emoticons; LEXICON
DOI
10.3390/a16070318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid spread of the coronavirus and the massive volume of related data on social networking sites, particularly Twitter, created an urgent need for a model that evaluates users' emotions and determines how they feel about the pandemic. However, the lack of resources supporting Sentiment Analysis (SA) in Arabic has hampered this effort. This work presents the ArSentiCOVID lexicon, the first and largest Arabic SA lexicon for COVID-19 that handles negation and emojis. We design a lexicon-based sentiment analyzer that relies mainly on the ArSentiCOVID lexicon to perform three-way classification. Furthermore, we use the sentiment analyzer to automatically assemble 42K annotated Arabic tweets about COVID-19. We conduct two experiments. First, we test the effect of applying negation and emoji rules to the created lexicon. The results indicate that applying the emoji rules, the negation rules, and both together improved the F-score by 2.13%, 4.13%, and 6.13%, respectively. Second, we apply an ensemble method that combines four feature groups (n-grams, negation, polarity, and emojis) as input features for eight Machine Learning (ML) classifiers. The results reveal that the Random Forest (RF) and Support Vector Machine (SVM) classifiers perform best, and that combining all four feature groups yields the best feature representation, producing the maximum accuracy (92.21%), precision (92.23%), recall (92.21%), and F-score (92.23%), a 3.2% improvement over the base model.
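To make the idea of a lexicon-based analyzer with negation and emoji rules concrete, the following is a minimal Python sketch. It is not the authors' tool. The lexicon entries, the emoji table, the negation particles, and the function name `classify` are all hypothetical and chosen for illustration; nothing here is taken from ArSentiCOVID. The sketch also assumes that the three classes are positive, negative, and neutral, that a negation particle flips only the next lexicon word, and that emojis are separated from words by whitespace.

```python
# Hypothetical toy resources, NOT taken from the ArSentiCOVID lexicon.
LEXICON = {"جيد": 1, "ممتاز": 1, "سيئ": -1, "خطير": -1}  # word -> polarity
EMOJI = {"😊": 1, "😢": -1}                                # emoji -> polarity
NEGATORS = {"لا", "ليس", "لم", "لن"}                       # negation particles


def classify(text, use_negation=True, use_emoji=True):
    """Return 'positive', 'negative', or 'neutral' for a whitespace-tokenized text."""
    score = 0
    negate = False  # True when the previous token was a negation particle
    for token in text.split():
        if use_negation and token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            polarity = LEXICON[token]
            # Assumed negation rule: flip the polarity of the word right after a negator.
            score += -polarity if negate else polarity
        elif use_emoji and token in EMOJI:
            # Assumed emoji rule: emojis add their own polarity and are never flipped.
            score += EMOJI[token]
        negate = False  # negation scope is a single token in this sketch
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ["الوضع جيد", "الوضع ليس جيد", "الوضع 😢"]:
        print(sample, "->", classify(sample))
```

Switching `use_negation` or `use_emoji` off mirrors, in spirit only, the first experiment's comparison of the lexicon with and without each rule set. A real analyzer would also need Arabic-specific normalization and tokenization, which this sketch omits.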
Pages: 24