A new unsupervised feature selection algorithm using similarity-based feature clustering

Cited by: 36
Authors
Zhu, Xiaoyan [1]
Wang, Yu [1]
Li, Yingbin [1]
Tan, Yonghui [1]
Wang, Guangtao [2]
Song, Qinbao [1]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Xian, Shaanxi, Peoples R China
[2] JD AI Res, Mountain View, CA USA
Funding
National Natural Science Foundation of China
Keywords
clustering; feature selection; feature similarity; classification
DOI
10.1111/coin.12192
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised feature selection is an important problem, especially for high-dimensional data. However, it has so far received little study, and existing algorithms do not perform satisfactorily. In this paper, we therefore propose a new unsupervised feature selection algorithm using similarity-based feature clustering, Feature Selection-based Feature Clustering (FSFC). FSFC removes redundant features according to the results of clustering the features by their similarity. First, it clusters the features according to their similarity, using a newly proposed feature clustering algorithm that overcomes the shortcomings of K-means. Second, it selects one representative feature from each cluster, namely the feature that carries most of the interesting information of the features in that cluster. The efficiency and effectiveness of FSFC are evaluated on real-world data sets and compared with two representative unsupervised feature selection algorithms, Feature Selection Using Similarity (FSUS) and Multi-Cluster-based Feature Selection (MCFS), in terms of runtime, feature compression ratio, and the clustering results of K-means. The results show that FSFC not only reduces the feature space in less time but also significantly improves the clustering performance of K-means.
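The two steps named in the abstract (cluster features by similarity, then keep one representative per cluster) can be illustrated with a minimal sketch. This is not the FSFC algorithm itself: the abstract does not specify the similarity measure, the clustering procedure, or the rule for choosing a representative, so everything below is an assumption made for illustration. The sketch assumes absolute Pearson correlation as the similarity, a greedy threshold-based grouping, and "highest total similarity to the other cluster members" as the representative rule. The names `pearson`, `select_features`, and `threshold` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(columns, threshold=0.8):
    """Return indices of one representative feature per similarity cluster.

    `columns` is a list of feature columns (each a list of numbers).
    Illustrative stand-in only; the clustering rule is an assumption.
    """
    m = len(columns)
    # Assumed similarity measure: absolute Pearson correlation.
    sim = [[abs(pearson(columns[i], columns[j])) for j in range(m)]
           for i in range(m)]
    unassigned = list(range(m))
    selected = []
    while unassigned:
        seed = unassigned[0]
        # Greedy cluster: the seed plus every unassigned feature similar enough to it.
        cluster = [seed] + [j for j in unassigned[1:] if sim[seed][j] >= threshold]
        # Assumed representative rule: highest total similarity within the cluster.
        rep = max(cluster, key=lambda i: sum(sim[i][j] for j in cluster))
        selected.append(rep)
        members = set(cluster)
        unassigned = [j for j in unassigned if j not in members]
    return sorted(selected)
```

Calling `select_features(columns)` on a list of feature columns returns the indices of the retained features. Unlike K-means, this greedy grouping does not need the number of clusters in advance, but its result depends on the feature order and on the chosen `threshold`.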
Pages: 2-22
Page count: 21