Quality of Word Embeddings on Sentiment Analysis Tasks

Cited by: 9
Authors
Cano, Erion [1 ]
Morisio, Maurizio [1 ]
Affiliation
[1] Politecn Torino, Duca Abruzzi 24, I-10129 Turin, Italy
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017 | 2017 / Vol. 10260
Keywords
Word embeddings; Lyrics mood analysis; Movie review polarity;
DOI
10.1007/978-3-319-59569-6_42
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Word embeddings, or distributed representations of words, are used in various applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of their applications depend on several factors, such as training method, corpus size, and relevance. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, Twitter Tweets is the best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium or large-sized text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.
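As a rough illustration of the kind of comparison the abstract describes, the sketch below scores two embedding tables on a tiny polarity task. Everything in it is an assumption made for illustration: the two toy embedding tables, the four example texts, the averaged-word-vector document representation, and the nearest-centroid classifier are all hypothetical and are not taken from the paper, whose actual models, datasets, and classifiers are not given in this record.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]
Dataset = List[Tuple[str, str]]  # (text, label) pairs


def doc_vector(tokens: Sequence[str], emb: Embeddings, dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; all zeros if none are known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def centroid(vectors: List[Vector], dim: int) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def evaluate(emb: Embeddings, dim: int, train: Dataset, test: Dataset) -> float:
    """Fit a nearest-centroid classifier on train and return accuracy on test."""
    by_label: Dict[str, List[Vector]] = {}
    for text, label in train:
        by_label.setdefault(label, []).append(doc_vector(text.split(), emb, dim))
    centroids = {label: centroid(vs, dim) for label, vs in by_label.items()}
    correct = 0
    for text, label in test:
        v = doc_vector(text.split(), emb, dim)
        pred = min(centroids, key=lambda lab: sq_dist(v, centroids[lab]))
        correct += int(pred == label)
    return correct / len(test)


# Hypothetical 2-dimensional embedding tables (not real pretrained models).
MODEL_A: Embeddings = {  # covers the sentiment-bearing words
    "great": [1.0, 0.0], "fun": [0.9, 0.1],
    "boring": [-1.0, 0.0], "awful": [-0.9, -0.1],
    "film": [0.0, 1.0], "plot": [0.0, 0.9],
}
MODEL_B: Embeddings = {  # poor vocabulary coverage: topic words only
    "film": [0.0, 1.0], "plot": [0.0, 0.9],
}

# Hypothetical toy data standing in for a movie review polarity set.
TRAIN: Dataset = [("great fun film", "pos"), ("boring awful film", "neg")]
TEST: Dataset = [("fun film", "pos"), ("awful plot", "neg")]

scores = {
    "model_a": evaluate(MODEL_A, 2, TRAIN, TEST),
    "model_b": evaluate(MODEL_B, 2, TRAIN, TEST),
}
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

In this toy setup the table that lacks the sentiment-bearing words cannot separate the two classes, which is one simple way vocabulary coverage and topic relevance can show up in a downstream score. A real comparison would load actual pretrained vectors and use a proper classifier and held-out data.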
Pages: 332 - 338
Number of pages: 7
Related papers
50 in total
  • [1] Word Embeddings for Arabic Sentiment Analysis
    Altowayan, A. Aziz
    Tao, Lixin
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3820 - 3825
  • [2] Refining Word Embeddings with Sentiment Information for Sentiment Analysis
    Kasri M.
    Birjali M.
    Nabil M.
    Beni-Hssane A.
    El-Ansari A.
    El Fissaoui M.
    Journal of ICT Standardization, 2022, 10 (03): : 353 - 382
  • [3] Generating Word Embeddings from an Extreme Learning Machine for Sentiment Analysis and Sequence Labeling Tasks
    Lauren, Paula
    Qu, Guangzhi
    Yang, Jucheng
    Watta, Paul
    Huang, Guang-Bin
    Lendasse, Amaury
    COGNITIVE COMPUTATION, 2018, 10 (04) : 625 - 638
  • [4] Generating Word Embeddings from an Extreme Learning Machine for Sentiment Analysis and Sequence Labeling Tasks
    Paula Lauren
    Guangzhi Qu
    Jucheng Yang
    Paul Watta
    Guang-Bin Huang
    Amaury Lendasse
    Cognitive Computation, 2018, 10 : 625 - 638
  • [5] Refined Global Word Embeddings Based on Sentiment Concept for Sentiment Analysis
    Wang, Yabing
    Huang, Guimin
    Li, Jun
    Li, Hui
    Zhou, Ya
    Jiang, Hua
    IEEE ACCESS, 2021, 9 : 37075 - 37085
  • [6] Evaluating Quality of Word Embeddings with Sentiment Polarity Identification Task
    Indurthi, Vijayasaradhi
    Oota, Subba Reddy
    SEMANTIC WEB CHALLENGES, SEMWEBEVAL 2018, 2018, 927 : 232 - 237
  • [7] Sentiment Analysis in Turkish Based on Weighted Word Embeddings
    Onan, Aytug
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [8] Sentiment analysis with covariate-assisted word embeddings
    Xu, Shirong
    Dai, Ben
    Wang, Junhui
    ELECTRONIC JOURNAL OF STATISTICS, 2021, 15 (01): : 3015 - 3039
  • [9] Cross-domain sentiment aware word embeddings for review sentiment analysis
    Liu, Jun
    Zheng, Shuang
    Xu, Guangxia
    Lin, Mingwei
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (02) : 343 - 354
  • [10] Cross-domain sentiment aware word embeddings for review sentiment analysis
    Jun Liu
    Shuang Zheng
    Guangxia Xu
    Mingwei Lin
    International Journal of Machine Learning and Cybernetics, 2021, 12 : 343 - 354