Combining Classification and Clustering for Tweet Sentiment Analysis

Cited by: 40
Authors
Coletta, Luiz F. S. [1]
da Silva, Nadia F. F. [1]
Hruschka, Eduardo R. [1]
Hruschka, Estevam R., Jr. [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-13560 Sao Carlos, Brazil
[2] Fed Univ Sao Carlos UFSCAR, Dept Comp Sci, Sao Carlos, Brazil
Source
2014 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS) | 2014
Keywords
Tweet Sentiment Analysis; Classification; Support Vector Machines; Clustering; Cluster Ensemble;
DOI
10.1109/BRACIS.2014.46
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of sentiment analysis is to determine opinions, emotions, and attitudes presented in source material. In tweet sentiment analysis, opinions in messages can typically be categorized as positive or negative. To classify them, researchers have been using traditional classifiers like Naive Bayes, Maximum Entropy, and Support Vector Machines (SVM). In this paper, we show that an SVM classifier combined with a cluster ensemble can offer better classification accuracies than a stand-alone SVM. In our study, we employed an algorithm, named C3E-SL, capable of combining classifier and cluster ensembles. This algorithm can refine tweet classifications using additional information provided by clusterers, assuming that similar instances from the same clusters are more likely to share the same class label. The resulting classifier has been shown to be competitive with the best results found so far in the literature, thereby suggesting that the studied approach is promising for tweet sentiment classification.
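The refinement idea in the abstract (instances that share clusters should tend to share a class label) can be illustrated with a minimal sketch. This is not the authors' C3E-SL implementation: the update rule, the function name `refine_with_clusters`, and the parameters `alpha` and `n_iters` are assumptions chosen for illustration. The sketch takes per-tweet class probabilities (such as an SVM might produce) and a co-association matrix from a cluster ensemble, then repeatedly pulls each tweet's probabilities toward those of the tweets it is clustered with.

```python
def refine_with_clusters(class_probs, similarity, alpha=0.5, n_iters=10):
    """Smooth classifier probabilities with cluster co-association information.

    class_probs: list of per-instance class-probability lists (rows sum to 1).
    similarity:  symmetric matrix; similarity[i][j] is the fraction of
                 clusterings that place instances i and j in the same cluster.
    alpha:       hypothetical weight on the cluster information
                 (alpha = 0 leaves the classifier output unchanged).
    """
    n = len(class_probs)
    n_classes = len(class_probs[0])
    refined = [row[:] for row in class_probs]  # start from the classifier output
    for _ in range(n_iters):
        for i in range(n):
            others = [j for j in range(n) if j != i]
            # Total similarity mass of instance i to all other instances.
            weight = sum(similarity[i][j] for j in others)
            for c in range(n_classes):
                # Similarity-weighted vote of the other instances for class c.
                vote = sum(similarity[i][j] * refined[j][c] for j in others)
                refined[i][c] = (class_probs[i][c] + 2 * alpha * vote) / (
                    1 + 2 * alpha * weight
                )
    return refined


# Toy data (invented): columns are [positive, negative].
# Tweet 2 leans negative on its own but always co-clusters with tweets 0 and 1.
probs = [[0.9, 0.1], [0.9, 0.1], [0.45, 0.55]]
sim = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
refined = refine_with_clusters(probs, sim)
print(refined[2])  # the positive probability of tweet 2 is pulled above 0.5
```

Because each update is a weighted average of probability vectors, every row still sums to 1, so the refined values can be read as class probabilities.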
Pages: 210-215
Page count: 6