A Survey on Filter Techniques for Feature Selection in Text Mining

Cited by: 15
Authors
Bharti, Kusum Kumari [1]
Singh, Pramod Kumar [1]
Affiliation
[1] ABV Indian Inst Informat Technol & Management Gwa, Computat Intelligence & Data Min Res Lab, Gwalior, Madhya Pradesh, India
Source
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2012) | 2014 / Vol. 236
Keywords
Text mining; Text categorization; Text clustering; Feature extraction; Feature selection; Filter methods; PRINCIPAL COMPONENT ANALYSIS; PARTICLE SWARM OPTIMIZATION; INFORMATION GAIN; ALGORITHM; CLASSIFICATION;
DOI
10.1007/978-81-322-1602-5_154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large portion of a document is usually made up of irrelevant features. Instead of identifying the actual context of the document, such features increase the dimensionality of the representation model and the computational complexity of the underlying algorithm, and hence adversely affect performance. This makes it necessary to select the relevant features from the given feature space. In this context, feature selection plays a key role in removing irrelevant features from the original feature space. Feature selection methods are broadly categorized into three groups: filter, wrapper, and embedded. Filter methods are widely used in text mining because of their simplicity, low computational complexity, and efficiency. In this article, we provide a brief survey of filter feature selection methods along with some of the recent developments in this area.
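To illustrate what a filter method does, the following is a minimal sketch, not taken from the surveyed paper: it scores each term by information gain (one of the keywords listed for this record) independently of any classifier and keeps the top-scoring terms. The function names, the set-of-tokens document representation, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs.

    `docs` is a list of token sets (an illustrative representation).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Filter step: rank terms by score alone and keep the best k."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Toy corpus (hypothetical): "the" appears everywhere and carries no class signal.
docs = [
    {"goal", "match", "the"},
    {"team", "goal", "the"},
    {"stock", "market", "the"},
    {"market", "price", "the"},
]
labels = ["sport", "sport", "finance", "finance"]

selected = select_top_k(docs, labels, 2)
```

In this toy corpus the term "the" occurs in every document and therefore receives zero information gain, while terms that occur only in one class score highest. Because the score never consults a learning algorithm, the selection is cheap to compute, which is the property the abstract attributes to filter methods.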
Pages: 1545-1559
Number of pages: 15