The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets

Cited by: 25
Author
Al-Twairesh, Nora [1]
Affiliation
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Informat Technol, Riyadh 11451, Saudi Arabia
Keywords
pretrained language models; BERT; emotion analysis; Arabic
DOI
10.3390/info12020084
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models, which are trained on massive textual data and then fine-tuned on downstream NLP tasks. In this paper, we study the evolution of language representation models by analyzing their effect on an under-researched NLP task, emotion analysis, for a low-resource language, Arabic. Most studies in the field of affect analysis have focused on sentiment analysis, i.e., classifying text by valence (positive, negative, neutral), while few go further to analyze finer-grained emotional states (happiness, sadness, anger, etc.). Emotion analysis is a text classification problem that is tackled using machine learning techniques, and different language representation models have been used as features for these machine learning models to learn from. We perform an empirical study on the evolution of language models, from the traditional term frequency-inverse document frequency (TF-IDF), to the more sophisticated word embedding word2vec, and finally to the recent state-of-the-art pretrained language model, bidirectional encoder representations from transformers (BERT). We observe and analyze how the performance increases as we change the language model. We also investigate different BERT models for Arabic. We find that the best performance is achieved with the ArabicBERT large model, a BERT model trained on a large dataset of Arabic text. The increase in F1-score was significant: +7 to 21%.
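As an illustration of the simplest representation named in the abstract, the following is a minimal sketch of TF-IDF weighting in plain Python. It assumes the textbook formulation (relative term frequency multiplied by log(N / document frequency)); the abstract does not say which TF-IDF variant or toolkit the paper used. The function name `tfidf` and the toy English sentences are hypothetical stand-ins for tokenized Arabic tweets.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulation (not taken from the paper):
    weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy corpus: hypothetical stand-ins for tokenized tweets.
docs = [
    "i am so happy today".split(),
    "i am so angry today".split(),
    "what a sad sad day".split(),
]
vectors = tfidf(docs)
```

In this sketch, terms shared by several documents (such as "i") receive lower weights than terms that occur in only one (such as "happy"), which is what makes the resulting vectors usable as features for a downstream emotion classifier.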
Pages: 1 / 15
Page count: 15