Efficient feature selection techniques for sentiment analysis

Cited by: 0
Authors
Avinash Madasu
Sivasankar Elango
Affiliations
[1] Samsung R and D Institute India, Bagmane Constellation Business Park, Bengaluru
[2] Department of Computer Science, National Institute of Technology
Source
Multimedia Tools and Applications | 2020 / Vol. 79
Keywords
Feature selection; Ensemble techniques; Sentiment analysis; Machine learning
DOI
Not available
Abstract
Sentiment analysis is a field of study that focuses on identifying the ideas expressed in text and classifying them into positive, negative and neutral polarities. Feature selection is a crucial step in machine learning. In this paper, we study the performance of different feature selection techniques for sentiment analysis. Term Frequency-Inverse Document Frequency (TF-IDF) is used as the feature extraction technique for creating the feature vocabulary. We experiment with various Feature Selection (FS) techniques to select the best set of features from this vocabulary. Four machine learning classifiers, Logistic Regression (LR), Support Vector Machines (SVM), Decision Tree (DT) and Naive Bayes (NB), are trained on the selected features. The ensemble techniques Bagging and Random Subspace are applied to these classifiers to enhance their performance on sentiment analysis. We show that the best FS techniques, when combined with ensemble methods, achieve remarkable results on sentiment analysis. We also compare the performance of FS methods trained using Bagging and Random Subspace with that of varied neural network architectures. We show that FS techniques trained using ensemble classifiers outperform neural networks while requiring significantly less training time and fewer parameters, thereby eliminating the need for extensive hyper-parameter tuning.
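The pipeline outlined in the abstract (TF-IDF extraction, feature selection, then an ensemble over feature subsets) can be illustrated with a small standard-library sketch. It is not the authors' implementation: the abstract does not name the FS techniques, so a chi-square score on term presence is assumed here as one common choice, and a nearest-centroid classifier stands in for LR, SVM, DT and NB to keep the example short. Only Random Subspace is shown; Bagging would resample documents instead of features. All function names and the toy corpus are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows); rows[i][term] is the TF-IDF weight in doc i."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for very common terms.
        rows.append({t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return sorted(df), rows

def chi2_score(rows, labels, term):
    """Chi-square statistic of term presence versus a binary (0/1) label."""
    n = len(rows)
    a = sum(1 for r, y in zip(rows, labels) if term in r and y == 1)
    b = sum(1 for r, y in zip(rows, labels) if term in r and y == 0)
    c = sum(1 for r, y in zip(rows, labels) if term not in r and y == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(vocab, rows, labels, k):
    """Keep the k highest-scoring terms (assumed FS step; ties broken by name)."""
    ranked = sorted(vocab, key=lambda t: (-chi2_score(rows, labels, t), t))
    return ranked[:k]

def fit_centroids(rows, labels, features):
    """Mean TF-IDF vector per class, restricted to the given features."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = {f: sum(r.get(f, 0.0) for r in members) / len(members)
                          for f in features}
    return centroids

def predict_centroid(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((row.get(f, 0.0) - w) ** 2 for f, w in centroid.items())
    return min(sorted(centroids), key=lambda cls: dist(centroids[cls]))

def random_subspace_fit(rows, labels, features, n_models, subspace_size, seed=0):
    """Train each base model on a random subset of the selected features."""
    rng = random.Random(seed)
    return [fit_centroids(rows, labels, rng.sample(features, subspace_size))
            for _ in range(n_models)]

def ensemble_predict(models, row):
    """Majority vote over the base models."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy corpus (hypothetical): 1 = positive, 0 = negative.
docs = ["good great film", "great acting good plot",
        "bad boring film", "boring plot bad acting"]
labels = [1, 1, 0, 0]

vocab, rows = tfidf(docs)
selected = select_features(vocab, rows, labels, k=4)
models = random_subspace_fit(rows, labels, selected, n_models=5, subspace_size=2)
predictions = [ensemble_predict(models, r) for r in rows]
```

On this toy corpus the selection step discards terms such as "film" and "plot" that occur equally in both classes, which is the intuition behind reducing the TF-IDF vocabulary before training.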
Pages: 6313-6335
Page count: 22