Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

Cited by: 16
Authors
Jamatia, Anupam [1]
Swamy, Steve Durairaj [1]
Gamback, Bjorn [2,4]
Das, Amitava [3]
Debbarma, Swapan [1]
Affiliations
[1] Natl Inst Technol, Comp Sci & Engn Dept, Agartala 799046, Tripura, India
[2] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7491 Trondheim, Norway
[3] Wipro AI Labs, Bengaluru 560100, Karnataka, India
[4] Res Inst Sweden AB, RISE, Digital Syst Div, S-16428 Kista, Sweden
Keywords
Code-switching; recurrent neural networks; convolutional neural networks
DOI
10.1142/S0218213020500141
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis is a contextual analysis of text that identifies the social sentiment expressed in it, in order to better understand the source material. This article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed textual corpus collected from social media. Code-mixing is an amalgamation of multiple languages that was previously associated mainly with spoken language. However, social media users also use it to communicate, typically in a fairly casual register. The coarse nature of social media text poses challenges for many language processing applications. Here, the focus is on the lower predictive performance of traditional machine learners compared to their deep learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM-CNN, a Double BiLSTM, and an attention-based model) attained accuracy 20-60% greater than traditional approaches on code-mixed data; for comparison, they were also tested on monolingual English data.
Pages: 27