Optimized Swarm Search-based Feature Selection for Text Mining in Sentiment Analysis

Cited by: 5
Authors
Fong, Simon [1]
Gao, Elisa [1]
Wong, Raymond [2]
Affiliations
[1] Univ Macau, Dept Comp & Informat Sci, Taipa, Macau, Peoples R China
[2] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
Source
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW) | 2015
Keywords
Feature Selection; Classification; Swarm Search; ALGORITHMS;
DOI
10.1109/ICDMW.2015.231
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment analysis has emerged as an important computational domain for gaining insights from snippets of text, as social media has gained popularity. Text mining has long been a fundamental data analytic technique for sentiment analysis. One popular preprocessing approach in text mining is transforming text strings into word vectors, which form a high-dimensional sparse matrix. This sparse matrix poses challenges to the induction of an accurate sentiment classification model. Feature selection is usually applied to find a subset of the original features in the sparse matrix in order to enhance the accuracy of the classification model. In this paper, a new feature selection method called Optimized Swarm Search-based Feature Selection (OS-FS) is proposed. OS-FS is a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. The swarm search in OS-FS is optimized by a new feature evaluation technique called Clustering-by-Coefficient-of-Variation (CCV). The proposed scheme is verified via a mood classification scenario in which 100 sample news articles are extracted from CNN.com. One of six human emotions (or sentiments) has to be recognized from the news content by computer using text mining. The results show the superiority of OS-FS over traditional feature selection methods.
Pages: 1153-1162
Page count: 10