Optimized Swarm Search-based Feature Selection for Text Mining in Sentiment Analysis

Cited by: 5
Authors
Fong, Simon [1]
Gao, Elisa [1]
Wong, Raymond [2]
Affiliations
[1] Univ Macau, Dept Comp & Informat Sci, Taipa, Macau, Peoples R China
[2] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
Source
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW) | 2015
Keywords
Feature Selection; Classification; Swarm Search; ALGORITHMS;
DOI
10.1109/ICDMW.2015.231
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment analysis has emerged as an important computational domain for gaining insights from snippets of text, as social media has gained popularity. Text mining has long been a fundamental data analytics technique for sentiment analysis. A popular preprocessing approach in text mining is to transform text strings into word vectors, which form a high-dimensional sparse matrix. This sparse matrix makes it difficult to induce an accurate sentiment classification model. Feature selection is usually applied to find a subset of the original features in the sparse matrix in order to improve the accuracy of the classification model. In this paper, a new feature selection method called Optimized Swarm Search-based Feature Selection (OS-FS) is proposed. OS-FS is a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. The swarm search in OS-FS is optimized by a new feature evaluation technique called Clustering-by-Coefficient-of-Variation (CCV). The proposed scheme is verified in a mood classification scenario in which 100 sample news articles are extracted from CNN.com. One of six human emotions (or sentiments) has to be recognized from the news content by computer using text mining. The results show the superiority of OS-FS over traditional feature selection methods.
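To make the preprocessing step in the abstract concrete, here is a minimal sketch of turning text strings into word vectors and then computing a coefficient of variation (standard deviation divided by mean) for each term column. It is not the authors' OS-FS or CCV procedure, which the abstract does not specify; the function names `word_vectors` and `coefficient_of_variation`, the whitespace tokenizer, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_vectors(docs):
    """Build a document-term count matrix (hypothetical helper, whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts.get(term, 0) for term in vocab])
    return vocab, rows


def coefficient_of_variation(column):
    """Population standard deviation divided by the mean (0.0 when the mean is 0)."""
    mean = sum(column) / len(column)
    if mean == 0:
        return 0.0
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    return math.sqrt(variance) / mean


# Toy documents, invented for this sketch only.
docs = [
    "markets rally on good news",
    "storm brings sad news",
    "good news for fans",
]
vocab, matrix = word_vectors(docs)

# One coefficient of variation per term (per column of the matrix).
cv = {
    term: coefficient_of_variation([row[j] for row in matrix])
    for j, term in enumerate(vocab)
}

# Fraction of zero cells, showing how sparse even a tiny matrix is.
sparsity = sum(v == 0 for row in matrix for v in row) / (len(matrix) * len(vocab))
```

In this toy matrix a term that appears once in every document, such as "news", has zero variation, while a term confined to one document varies strongly. How such scores would be clustered or fed into a swarm search is left open here, since the abstract gives no detail.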
Pages: 1153-1162
Page count: 10