A Survey and Comparative Study of Tweet Sentiment Analysis via Semi-Supervised Learning

Cited by: 58
Authors
Da Silva, Nadia Felix F. [1]
Coletta, Luiz F. S. [1]
Hruschka, Eduardo R. [1]
Affiliation
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation, Brazil
Keywords
Co-training; self-training; semi-supervised learning; topic modeling; tweet sentiment analysis; classifier
DOI
10.1145/2932708
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Twitter is a microblogging platform on which users post status messages, called "tweets," to their friends. It has become an enormous source of sentiment-bearing data, which can be classified through supervised learning. To build supervised learning models, classification algorithms require a representative set of labeled data. However, labeled data are usually difficult and expensive to obtain, which motivates interest in semi-supervised learning. This type of learning uses unlabeled data to complement the information provided by the labeled data during training. It is therefore particularly useful in applications such as tweet sentiment analysis, where a huge quantity of unlabeled data is available. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We provide a comprehensive survey of semi-supervised approaches applied to tweet classification and group them into graph-based, wrapper-based, and topic-based methods. A comparative study of algorithms based on self-training, co-training, topic modeling, and distant supervision highlights their biases and sheds light on aspects that practitioners should consider in real-world applications.
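To illustrate how unlabeled tweets can complement a small labeled set, the following is a minimal sketch of self-training, one of the wrapper-style algorithms named in the comparative study. It is not the authors' implementation. The base classifier (a toy bag-of-words multinomial Naive Bayes), the confidence `threshold`, `max_iter`, the helper names `tokenize`, `NaiveBayes`, and `self_train`, and the example tweets are all hypothetical choices made only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical lowercase whitespace tokenizer; a real tweet pipeline would
    # also handle mentions, hashtags, URLs, and emoticons.
    return text.lower().split()


class NaiveBayes:
    """Toy multinomial Naive Bayes with Laplace smoothing (assumed base learner)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = {y: sum(self.word_counts[y].values()) for y in self.classes}
        return self

    def predict_proba(self, doc):
        # Compute log-space class scores, then normalize them into probabilities.
        v = len(self.vocab)
        scores = {}
        for y in self.classes:
            s = math.log(self.prior[y] / self.n)
            for w in tokenize(doc):
                s += math.log((self.word_counts[y][w] + 1) / (self.total[y] + v))
            scores[y] = s
        m = max(scores.values())
        exp = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_iter=10):
    """Self-training: repeatedly move confidently pseudo-labeled tweets into the training set."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_iter):
        confident, remaining = [], []
        for doc in pool:
            proba = model.predict_proba(doc)
            y, p = max(proba.items(), key=lambda kv: kv[1])
            (confident if p >= threshold else remaining).append((doc, y))
        if not confident:
            break  # no tweet reached the threshold, so stop early
        docs += [d for d, _ in confident]
        labels += [y for _, y in confident]
        pool = [d for d, _ in remaining]
        model = NaiveBayes().fit(docs, labels)  # retrain on the enlarged set
    return model, list(zip(docs, labels))


if __name__ == "__main__":
    labeled = [("i love this phone", "pos"), ("great game tonight", "pos"),
               ("i hate waiting", "neg"), ("terrible service today", "neg")]
    unlabeled = ["love the great weather", "hate this terrible traffic", "ok"]
    model, train = self_train(labeled, unlabeled, threshold=0.7)
    for doc, label in train:
        print(label, doc)
```

The threshold controls the trade-off the study's comparison touches on. A low value lets more pseudo-labeled tweets in, along with more labeling errors that the retrained model can reinforce. A high value keeps only confident predictions but may leave most unlabeled data unused.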
Pages: 26