Quality of Word Embeddings on Sentiment Analysis Tasks

Cited by: 9
Authors
Cano, Erion [1]
Morisio, Maurizio [1]
Affiliations
[1] Politecn Torino, Duca Abruzzi 24, I-10129 Turin, Italy
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017 | 2017 / Vol. 10260
Keywords
Word embeddings; Lyrics mood analysis; Movie review polarity
DOI
10.1007/978-3-319-59569-6_42
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Word embeddings, or distributed representations of words, are used in various applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of their applications depend on several factors, such as training method, corpus size, and corpus relevance. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, the model trained on Twitter tweets performs best on lyrics sentiment analysis, whereas the Google News and Common Crawl models are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium or large-sized text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.
Pages: 332-338
Number of pages: 7
Related papers
17 in total
[1]  
[Anonymous], TECHNICAL REP
[2]  
[Anonymous], 2014, DEEP LEARNING SENTIM
[3]  
[Anonymous], 2013, P 1 INT C LEARN REPR
[4]  
[Anonymous], 2009, 10 INT SOC MUSIC INF
[5]   A neural probabilistic language model [J].
Bengio, Y ;
Ducharme, R ;
Vincent, P ;
Jauvin, C .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (06) :1137-1155
[6]  
Cano E., 2017, 2017 INT C INT SYST
[7]  
Çano E, 2015, 2015 IEEE 1ST INTERNATIONAL FORUM ON RESEARCH AND TECHNOLOGIES FOR SOCIETY AND INDUSTRY (RTSI 2015) PROCEEDINGS
[8]  
Johnson R., 2015, P 2015 C N AM CHAPT, P103, DOI DOI 10.3115/V1/N15-1011
[9]  
Lee W.-S., 2009, 2013 IEEE INT S MULT, P24
[10]   Dependency-Based Word Embeddings [J].
Levy, Omer ;
Goldberg, Yoav .
PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2014, :302-308