Semantic Twitter sentiment analysis based on a fuzzy thesaurus

Cited by: 19
Authors
Ismail, Heba M. [1]
Belkhouche, Boumediene [1]
Zaki, Nazar [1]
Affiliations
[1] United Arab Emirates Univ, Coll Informat Technol, Dept Comp Sci & Software Engn, Al Ain, U Arab Emirates
Keywords
Text mining; Fuzzy thesaurus; Semantic analysis; Text context; Twitter sentiment analysis;
DOI
10.1007/s00500-017-2994-8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We define a new, fully automated and domain-independent method for building feature vectors from a Twitter text corpus for machine learning sentiment analysis, based on a fuzzy thesaurus and sentiment replacement. The proposed method measures the semantic similarity of Tweets to features in the feature space instead of using term-presence or term-frequency feature vectors. Thus, we account for the sentiment of the context instead of just counting sentiment words. We use sentiment replacement to reduce the dimensionality of the feature space and a fuzzy thesaurus to incorporate semantics. Experimental results show that sentiment replacement yields up to a 35% reduction in the dimensionality of the feature space. Moreover, feature vectors built from a fuzzy thesaurus improve sentiment classification performance with multinomial naïve Bayes and support vector machine classifiers, which reach accuracies of 83% and 85%, respectively, on the Stanford testing dataset. Incorporating the fuzzy thesaurus gave the best accuracy compared to the baselines, with an increase greater than 3%. Comparable results were obtained with a larger dataset, the STS-Gold, indicating the robustness of the proposed method. Furthermore, comparison with previous work shows that the proposed method outperforms other methods reported in the literature on the same benchmark data.
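The two ideas named in the abstract, sentiment replacement and fuzzy-thesaurus feature vectors, can be illustrated with a small sketch. The abstract gives no formulas, so everything specific here is an assumption: the toy lexicon, the placeholder tokens `POS` and `NEG`, the co-occurrence similarity r(a, b) = n_ab / (n_a + n_b - n_ab), and the membership rule 1 - prod(1 - r) are common textbook choices for fuzzy thesauri, not necessarily what the authors used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon; the paper's actual sentiment resources are not stated here.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "awful", "sad"}


def replace_sentiment(tokens):
    """Collapse sentiment words into two placeholder tokens to shrink the vocabulary."""
    out = []
    for tok in tokens:
        if tok in POSITIVE:
            out.append("POS")
        elif tok in NEGATIVE:
            out.append("NEG")
        else:
            out.append(tok)
    return out


def build_fuzzy_thesaurus(docs):
    """Assumed similarity: r(a, b) = n_ab / (n_a + n_b - n_ab), counted over documents."""
    doc_freq = Counter()
    co_freq = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        doc_freq.update(terms)
        co_freq.update(combinations(terms, 2))
    sim = {}
    for (a, b), n_ab in co_freq.items():
        r = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        sim[(a, b)] = sim[(b, a)] = r
    return sim


def membership(doc, feature, sim):
    """Assumed fuzzy membership of a document in a feature: 1 - prod(1 - r(term, feature))."""
    prod = 1.0
    for term in set(doc):
        r = 1.0 if term == feature else sim.get((term, feature), 0.0)
        prod *= 1.0 - r
    return 1.0 - prod


def feature_vector(doc, features, sim):
    """One membership degree per feature, instead of a presence or frequency count."""
    return [membership(doc, f, sim) for f in features]


tweets = [
    "i love my new phone".split(),
    "i hate my old phone".split(),
    "great battery on this phone".split(),
]
docs = [replace_sentiment(t) for t in tweets]
sim = build_fuzzy_thesaurus(docs)
features = sorted({tok for doc in docs for tok in doc})
vectors = [feature_vector(doc, features, sim) for doc in docs]
```

In this toy corpus, replacement merges "love" and "great" into one feature, and the second tweet receives a nonzero degree for `POS` through terms that co-occur with it, even though it contains no positive word. Vectors of this kind could then be passed to a classifier such as multinomial naïve Bayes or an SVM.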
Pages: 6011-6024
Page count: 14