Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms

Cited by: 135
Authors
Liu, Yang [1]
Bi, Jian-Wu [1]
Fan, Zhi-Ping [1, 2]
Affiliations
[1] Northeastern Univ, Sch Business Adm, Dept Management Sci & Engn, Shenyang 110167, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Multi-class sentiment classification; Experimental comparison; Feature selection algorithms; Machine learning algorithms; STRENGTH DETECTION; INFORMATION; REVIEWS;
DOI
10.1016/j.eswa.2017.03.042
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-class sentiment classification has a wide range of applications, yet studies on it remain relatively scarce. This paper proposes a framework for multi-class sentiment classification with two parts: 1) selecting important text features with a feature selection algorithm, and 2) training a multi-class sentiment classifier with a machine learning algorithm. Experiments then compare the performance of four popular feature selection algorithms (document frequency, CHI statistics, information gain and gain ratio) and five popular machine learning algorithms (decision tree, naive Bayes, support vector machine, radial basis function neural network and K-nearest neighbor) in multi-class sentiment classification. The experiments use three public datasets comprising twelve data subsets, and 10-fold cross validation is used to obtain the classification accuracy for each combination of feature selection algorithm, machine learning algorithm, feature set size and data subset. From the resulting 3600 classification accuracies (4 feature selection algorithms x 5 machine learning algorithms x 15 feature set sizes x 12 data subsets), the average classification accuracy of each algorithm is calculated, and the Wilcoxon test is used to check whether the differences between algorithms are significant. The results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms. Similar comparisons are conducted in terms of execution time. The results should be valuable for improving existing multi-class sentiment classifiers and for developing new ones. (C) 2017 Elsevier Ltd. All rights reserved.
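The feature selection step and the size of the experimental grid described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names (`entropy`, `information_gain`, `select_features`) and the placeholder indices for feature set sizes and data subsets are all assumptions made for illustration. Only information gain, one of the four compared feature selection algorithms, is shown, using the standard entropy-based definition over a binary "term present" feature.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    # Conditional entropy: weight each non-empty partition by its share of documents.
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


# Hypothetical three-class toy corpus; each document is a set of tokens.
docs = [
    {"great", "phone"}, {"great", "battery"},
    {"okay", "phone"}, {"okay", "battery"},
    {"awful", "phone"}, {"awful", "battery"},
]
labels = ["pos", "pos", "neu", "neu", "neg", "neg"]

top_terms = select_features(docs, labels, 3)
print(top_terms)

# Experimental grid from the abstract. The algorithm names come from the text;
# the 15 feature set sizes and 12 data subsets are placeholder indices because
# their actual values are not given here.
selectors = ["document frequency", "CHI statistics", "information gain", "gain ratio"]
learners = ["decision tree", "naive Bayes", "support vector machine",
            "RBF neural network", "K-nearest neighbor"]
feature_set_sizes = range(15)
data_subsets = range(12)

grid = list(product(selectors, learners, feature_set_sizes, data_subsets))
print(len(grid))
```

In this toy corpus the sentiment-bearing words separate the classes while the product words do not, so the ranking favors the former. Each tuple in `grid` corresponds to one cross-validated accuracy in the study's design.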
Pages: 323-339
Number of pages: 17