A new feature subset selection using bottom-up clustering

Cited by: 0
Authors
Zeinab Dehghan
Eghbal G. Mansoori
Affiliations
[1] Shiraz University, School of Electrical and Computer Engineering
Source
Pattern Analysis and Applications | 2018 / Vol. 21
Keywords
Dimensionality reduction; Feature selection; Hierarchical clustering; Feature clustering
DOI
Not available
Abstract
Feature subset selection and/or dimensionality reduction is an essential preprocessing step before any data mining task, especially when the problem space contains too many features. In this paper, a clustering-based feature subset selection (CFSS) algorithm is proposed for identifying the more relevant features. At each level of agglomeration, it uses a similarity measure among features to merge the two most similar clusters of features. By gathering similar features into clusters and then selecting representative features from each cluster, it tries to remove redundant features. To identify the representative features, a criterion based on mutual information is proposed. Since CFSS selects the representatives in a filter manner, it is noticeably fast. As an advantage of hierarchical clustering, the number of clusters does not need to be determined in advance. In CFSS, the clustering process is repeated until all features have been assigned to clusters. However, to distribute the features over a reasonable number of clusters, a recently proposed approach is used to find a suitable level at which to cut the clustering tree. To assess the performance of CFSS, we applied it to several established UCI datasets and compared it with some popular feature selection methods. The experimental results demonstrate the efficiency and speed of the proposed method.
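The pipeline described in the abstract (merge the most similar feature clusters bottom-up, then keep one representative per cluster) can be illustrated with a minimal Python sketch. This is not the authors' CFSS implementation: the abstract does not specify the similarity measure, the mutual-information criterion, or the tree-cutting approach. The sketch assumes absolute Pearson correlation as the feature similarity, average linkage for merging, mutual information with the class label as the representative criterion, and a fixed `n_clusters` argument in place of the paper's cut-level method. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def abs_correlation(a, b):
    """Absolute Pearson correlation (assumed stand-in for the similarity measure)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant feature is treated as similar to nothing
    return abs(cov) / math.sqrt(va * vb)


def cluster_features(columns, n_clusters):
    """Bottom-up clustering of feature indices with average linkage.

    n_clusters is an assumed fixed stopping point; the paper instead
    chooses a level at which to cut the clustering tree.
    """
    n = len(columns)
    sim = {(i, j): abs_correlation(columns[i], columns[j])
           for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[j] for j in range(n)]  # start with one cluster per feature
    while len(clusters) > max(1, n_clusters):
        # Merge the two most similar clusters at this level.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def select_representatives(columns, labels, n_clusters):
    """Keep, per cluster, the feature with the highest MI with the labels."""
    clusters = cluster_features(columns, n_clusters)
    return sorted(
        max(cluster, key=lambda j: mutual_information(columns[j], labels))
        for cluster in clusters
    )


# Toy data: each inner list is one discrete feature observed on 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: matches the labels
    [0, 0, 0, 1, 1, 1, 1, 1],  # f1: noisy copy of f0 (redundant)
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: unrelated to the labels
    [1, 0, 1, 0, 1, 0, 1, 1],  # f3: near-inverse of f2
]
selected = select_representatives(features, labels, n_clusters=2)
print(selected)
```

On the toy data, the redundant pair f0/f1 and the pair f2/f3 each end up in one cluster, so only one feature from each pair is kept. Because the representative is chosen without training a classifier, the selection step stays in the filter family, which is the property the abstract credits for the method's speed.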
Pages: 57-66
Number of pages: 9