Quality of Word Embeddings on Sentiment Analysis Tasks

Cited by: 9
Authors
Cano, Erion [1]
Morisio, Maurizio [1]
Affiliations
[1] Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Turin, Italy
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017 | 2017 / Vol. 10260
Keywords
Word embeddings; Lyrics mood analysis; Movie review polarity;
DOI
10.1007/978-3-319-59569-6_42
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Word embeddings, or distributed representations of words, are used in various applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of their applications depend on several factors, such as training method, corpus size, and corpus relevance. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, Twitter Tweets performs best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium or large text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.
Pages: 332-338
Number of pages: 7
Related papers
50 in total
  • [21] Persian Sentiment Analysis without Training Data Using Cross-Lingual Word Embeddings
    Aliramezani, Mohammad
    Doostmohammadi, Ehsan
    Bokaei, Mohammad Hadi
    Sameti, Hossien
    [J]. 2020 10TH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2020, : 78 - 82
  • [22] Improving Arabic Sentiment Analysis with Sentiment-Specific Embeddings
    Altowayan, A. Aziz
    Elnagar, Ashraf
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 4314 - 4320
  • [23] Unlock big Data Emotions: Weighted Word Embeddings for sentiment Classification
    Dai, Xiangfeng
    Prout, Bob
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3833 - 3838
  • [24] Pre-trained Word Embeddings for Arabic Aspect-Based Sentiment Analysis of Airline Tweets
    Ashi, Mohammed Matuq
    Siddiqui, Muazzam Ahmed
    Nadeem, Farrukh
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2018, 2019, 845 : 241 - 251
  • [25] Multi-domain sentiment analysis with mimicked and polarized word embeddings for human-robot interaction
    Atzeni, Mattia
    Recupero, Diego Reforgiato
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 110 : 984 - 999
  • [26] Sentiment Analysis of Code-Mixed Telugu-English Data Leveraging Syllable and Word Embeddings
    Rayala, Upendar Rao
    Seshadri, Karthick
    Sristy, Nagesh Bhattu
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (10)
  • [27] Evaluating Pre-trained Word Embeddings and Neural Network Architectures for Sentiment Analysis in Spanish Financial Tweets
    Antonio Garcia-Diaz, Jose
    Apolinario-Arzube, Oscar
    Valencia-Garcia, Rafael
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2020, PT II, 2020, 12469 : 167 - 178
  • [28] Sentiment Analysis with Contextual Embeddings and Self-attention
    Biesialska, Katarzyna
    Biesialska, Magdalena
    Rybinski, Henryk
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2020), 2020, 12117 : 32 - 41
  • [29] Improving semantic change analysis by combining word embeddings and word frequencies
    Englhardt, Adrian
    Willkomm, Jens
    Schaeler, Martin
    Boehm, Klemens
    [J]. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2020, 21 (03) : 247 - 264