A feature selection model based on genetic rank aggregation for text sentiment classification

Cited by: 312
Authors
Onan, Aytug [1]
Korukoglu, Serdar [2]
Affiliations
[1] Celal Bayar Univ, Manisa, Turkey
[2] Ege Univ, Izmir, Turkey
Keywords
Feature selection; rank aggregation; sentiment classification; traveling salesman problem; algorithms
DOI
10.1177/0165551515613226
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis is an important research direction of natural language processing, text mining and web mining which aims to extract subjective information in source materials. The main challenge encountered in machine learning method-based sentiment classification is the abundant amount of data available. This amount makes it difficult to train the learning algorithms in a feasible time and degrades the classification accuracy of the built model. Hence, feature selection becomes an essential task in developing robust and efficient classification models whilst reducing the training time. In text mining applications, individual filter-based feature selection methods have been widely utilized owing to their simplicity and relatively high performance. This paper presents an ensemble approach for feature selection, which aggregates the several individual feature lists obtained by the different feature selection methods so that a more robust and efficient feature subset can be obtained. In order to aggregate the individual feature lists, a genetic algorithm has been utilized. Experimental evaluations indicated that the proposed aggregation model is an efficient method and it outperforms individual filter-based feature selection methods on sentiment classification.
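The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the genetic operators, or the parameter values, so everything below is an assumption made for illustration: the Spearman footrule distance as the cost, order crossover and swap mutation (operators commonly used for permutation problems such as the traveling salesman problem), and the hypothetical names `footrule`, `cost`, `order_crossover`, `swap_mutation` and `ga_rank_aggregation`. It is not the authors' implementation.

```python
import random


def footrule(perm, ranking):
    """Spearman footrule distance between two orderings of the same features."""
    pos = {feature: i for i, feature in enumerate(ranking)}
    return sum(abs(i - pos[feature]) for i, feature in enumerate(perm))


def cost(perm, rankings):
    """Total distance from a candidate ordering to all input rankings (assumed objective)."""
    return sum(footrule(perm, r) for r in rankings)


def order_crossover(a, b, rng):
    """Copy a random slice from parent a, fill the remaining slots in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = a[i:j + 1]
    kept = set(child[i:j + 1])
    fill = iter(f for f in b if f not in kept)
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child


def swap_mutation(perm, rng, rate):
    """With probability `rate`, swap two randomly chosen positions."""
    perm = perm[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def ga_rank_aggregation(rankings, pop_size=40, generations=200,
                        mutation_rate=0.3, seed=0):
    """Search for one ordering that is close to all input rankings.

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    features = list(rankings[0])
    # Seed the population with the input rankings, then pad with random orderings.
    population = [list(r) for r in rankings]
    while len(population) < pop_size:
        population.append(rng.sample(features, len(features)))
    for _ in range(generations):
        population.sort(key=lambda p: cost(p, rankings))
        elite = population[:pop_size // 2]  # elitism: the better half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = order_crossover(a, b, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        population = elite + children
    return min(population, key=lambda p: cost(p, rankings))


# Three hypothetical ranked feature lists, e.g. from three different filter methods.
rankings = [
    ["good", "bad", "great", "poor", "movie"],
    ["bad", "good", "great", "movie", "poor"],
    ["good", "great", "bad", "poor", "movie"],
]
aggregated = ga_rank_aggregation(rankings)
print(aggregated, cost(aggregated, rankings))
```

Because the population is seeded with the input lists and the better half always survives, the returned ordering is never worse, under this assumed cost, than the best individual list. A feature subset would then be taken from the top of the aggregated ordering.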
Pages: 25-38
Number of pages: 14