A Survey and Comparative Study of Tweet Sentiment Analysis via Semi-Supervised Learning

Cited by: 58
Authors
Da Silva, Nadia Felix F. [1]
Coletta, Luiz F. S. [1]
Hruschka, Eduardo R. [1]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Co-training; self-training; semi-supervised learning; topic modeling; tweet sentiment analysis; classifier
DOI
10.1145/2932708
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Twitter is a microblogging platform in which users can post status messages, called "tweets," to their friends. It has provided an enormous dataset of the so-called sentiments, whose classification can take place through supervised learning. To build supervised learning models, classification algorithms require a set of representative labeled data. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses unlabeled data to complement the information provided by the labeled data in the training process; therefore, it is particularly useful in applications including tweet sentiment analysis, where a huge quantity of unlabeled data is accessible. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We provide a comprehensive survey of semi-supervised approaches applied to tweet classification. Such approaches consist of graph-based, wrapper-based, and topic-based methods. A comparative study of algorithms based on self-training, co-training, topic modeling, and distant supervision highlights their biases and sheds light on aspects that the practitioner should consider in real-world applications.
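The self-training idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny naive Bayes classifier, the `self_train` function, the `threshold` and `max_rounds` parameters, and the toy tweets are all assumptions introduced here for illustration. The loop trains on the labeled data, pseudo-labels the unlabeled tweets it is confident about, adds them to the training set, and retrains.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into word counts."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        """Return a dict mapping each class to its posterior probability."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word, n in bag_of_words(text).items():
                if word in self.vocab:  # words never seen in training are ignored
                    smoothed = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                    score += n * math.log(smoothed)
            log_scores[c] = score
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.6, max_rounds=10):
    """Self-training: repeatedly pseudo-label confident unlabeled texts and retrain.

    `threshold` and `max_rounds` are hypothetical knobs chosen for this sketch.
    """
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for text in pool:
            proba = model.predict_proba(text)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                confident.append((text, best))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        for text, label in confident:
            texts.append(text)
            labels.append(label)
        pool = remaining
        model = NaiveBayes().fit(texts, labels)
    return model


# Made-up toy data, for illustration only.
labeled = [("great love happy", "pos"), ("awful hate sad", "neg")]
unlabeled = ["love this wonderful phone", "hate this terrible phone"]

baseline = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
model = self_train(labeled, unlabeled)
```

In this toy setup the baseline has never seen the word "wonderful", whereas the self-trained model picks it up from a pseudo-labeled tweet. The same mechanism can also reinforce early mistakes, which is one reason the choice of confidence threshold matters in practice.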
Pages: 26