A Survey and Comparative Study of Tweet Sentiment Analysis via Semi-Supervised Learning

Cited by: 58
Authors
Da Silva, Nadia Felix F. [1]
Coletta, Luiz F. S. [1]
Hruschka, Eduardo R. [1]
Affiliation
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
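Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Co-training; self-training; semi-supervised learning; topic modeling; tweet sentiment analysis; classifier
DOI
10.1145/2932708
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract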
Twitter is a microblogging platform in which users can post status messages, called "tweets," to their friends. It has provided an enormous dataset of the so-called sentiments, whose classification can take place through supervised learning. To build supervised learning models, classification algorithms require a set of representative labeled data. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses unlabeled data to complement the information provided by the labeled data in the training process; therefore, it is particularly useful in applications including tweet sentiment analysis, where a huge quantity of unlabeled data is accessible. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We provide a comprehensive survey of semi-supervised approaches applied to tweet classification. Such approaches consist of graph-based, wrapper-based, and topic-based methods. A comparative study of algorithms based on self-training, co-training, topic modeling, and distant supervision highlights their biases and sheds light on aspects that the practitioner should consider in real-world applications.
Pages: 26