Identifying Sentiments in Algerian Code-switched User-generated Comments

Cited by: 0
Authors
Adouane, Wafia [1]
Touileb, Samia [2]
Bernardy, Jean-Philippe [1]
Affiliations
[1] Univ Gothenburg, Ctr Linguist Theory & Studies Probabil CLASP, Dept Philosophy Linguist & Theory Sci FLoV, Gothenburg, Sweden
[2] Univ Oslo, Dept Informat, Language Technol Grp, Oslo, Norway
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020
Funding
Swedish Research Council;
Keywords
Algerian Arabic; code-switching; user-generated data; sentiment analysis; under-resourced colloquial languages;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We present in this paper our work on the Algerian language, an under-resourced North African colloquial Arabic variety, for which we built a comparably large corpus of more than 36,000 code-switched user-generated comments annotated for sentiments. We opted for this data domain because Algerian is a colloquial language with no existing freely available corpora. Moreover, we compiled sentiment lexicons of positive and negative unigrams and bigrams reflecting the code-switches present in the language. We compare the performance of four models on the task of identifying sentiments, and the results indicate that a CNN model trained end-to-end better fits our unedited, code-switched, and unbalanced data across the predefined sentiment classes. Additionally, injecting the lexicons into the model as background knowledge boosts its performance on the minority class, with a gain of 10.54 points in F-score. The results of our experiments can be used as a baseline for future research on Algerian sentiment analysis.
Pages: 2698 - 2705
Page count: 8
Related papers
7 items in total
  • [1] An Algerian Arabic-French Code-Switched Corpus
    Cotterell, Ryan
    Renduchintala, Adithya
    Saphra, Naomi
    Callison-Burch, Chris
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014
  • [2] Studying vowel variation in French-Algerian Arabic code-switched speech
    Wottawa, Jane
    Amazouz, Djegdjiga
    Adda-Decker, Martine
    Lamel, Lori
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2753 - 2757
  • [3] DZDC12: a new multipurpose parallel Algerian Arabizi–French code-switched corpus
    Kheireddine Abainia
    Language Resources and Evaluation, 2020, 54 : 419 - 455
  • [4] DZDC12: a new multipurpose parallel Algerian Arabizi-French code-switched corpus
    Abainia, Kheireddine
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (02) : 419 - 455
  • [5] Brand Crisis-Sentiment Analysis of User-Generated Comments About @Maggi on Facebook
    Mridula S. Mishra
    Ruppal W. Sharma
    Corporate Reputation Review, 2019, 22 : 48 - 60
  • [6] Brand Crisis-Sentiment Analysis of User-Generated Comments About @Maggi on Facebook
    Mishra, Mridula S.
    Sharma, Ruppal W.
    CORPORATE REPUTATION REVIEW, 2019, 22 (02) : 48 - 60
  • [7] An ANN-based approach of interpreting user-generated comments from social media.
    Wong, T. C.
    Chan, Hing Kai
    Lacka, Ewelina
    APPLIED SOFT COMPUTING, 2017, 52 : 1169 - 1180