Semantic Feature Clustering for Sentiment Analysis of English Reviews

Cited by: 19
Authors
Agarwal, Basant [1]
Mittal, Namita [1]
Affiliation
[1] Malaviya Natl Inst Technol, Dept Comp Engn, Jaipur, Rajasthan, India
Keywords
Feature extraction methods; Machine learning; Clustering features; Sentiment analysis; Semantic features;
DOI
10.1080/03772063.2014.963172
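Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract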
Sentiment analysis research has grown rapidly in recent years because of its wide range of business and social applications. The motivation behind sentiment analysis is that it gives companies a way to gauge product acceptance and to identify ways to improve quality. It also helps users make purchasing decisions. Various parsing schemes and feature extraction methods have been proposed in the literature to process unstructured text and extract patterns that help a machine learning model learn. The main limitations of existing feature extraction techniques are the sparseness of the data and the inability to incorporate semantic information. In this paper, a new feature extraction method is proposed, namely clustering features. The proposed technique alleviates the data sparsity faced by supervised sentiment analysis by clustering semantic features, so the resulting features both carry semantic information and reduce data sparseness for the machine learning algorithm. In all experiments, support vector machine (SVM) and Boolean Multinomial Naive Bayes (BMNB) algorithms are used for classification. Experimental results show that the proposed clustering features significantly outperform other features for document-level sentiment classification. All experiments are performed on a standard movie review data-set and on product review data-sets, namely book, electronics, and kitchen appliances.
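The abstract does not say how the clusters are built, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a hypothetical, hand-written word-to-cluster table (`WORD_TO_CLUSTER`) and a small Boolean multinomial Naive Bayes classifier written from scratch; all names and the toy data are illustrative assumptions. It shows how replacing words with cluster IDs lets a word never seen in training ("superb") still contribute evidence, which is the sparsity problem the abstract describes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word -> cluster table (an assumption for illustration only).
# The abstract does not specify how semantic clusters are actually derived.
WORD_TO_CLUSTER = {
    "great": "POS_QUALITY", "excellent": "POS_QUALITY", "superb": "POS_QUALITY",
    "awful": "NEG_QUALITY", "terrible": "NEG_QUALITY", "dreadful": "NEG_QUALITY",
}

def cluster_features(text):
    """Return the set of features present in a document (Boolean presence).

    Words that belong to a cluster are replaced by the cluster ID;
    all other words are kept as their own feature.
    """
    tokens = text.lower().split()
    return {WORD_TO_CLUSTER.get(tok, tok) for tok in tokens}

class BooleanMultinomialNB:
    """Multinomial Naive Bayes over presence/absence features, Laplace-smoothed."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = cluster_features(doc)
            # Each feature counts at most once per document (Boolean variant).
            self.feature_counts[label].update(feats)
            self.vocab |= feats
        return self

    def predict(self, doc):
        # Features never seen in training are ignored.
        feats = cluster_features(doc) & self.vocab
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat in feats:
                score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (invented for the example, not from the paper's data-sets).
train_docs = ["great movie", "excellent plot", "awful movie", "terrible plot"]
train_labels = ["pos", "pos", "neg", "neg"]
model = BooleanMultinomialNB().fit(train_docs, train_labels)

# "superb" never occurs in the training documents, but it shares a cluster
# with "great" and "excellent", so it still influences the prediction.
prediction = model.predict("superb acting")
```

With plain word features, "superb acting" would contain no feature seen in training and the classifier would fall back on class priors alone; with the cluster mapping, the document is matched through `POS_QUALITY`.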
Pages: 414-422
Page count: 9