RELIEF-C: Efficient Feature Selection for Clustering over Noisy Data

Citations: 13
Authors
Dash, Manoranjan [1]
Ong, Yew-Soon [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Source
2011 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2011) | 2011
Keywords
Feature selection; clustering; RELIEF; high-dimensionality; noise
DOI
10.1109/ICTAI.2011.135
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
RELIEF is a highly effective and widely used feature selection algorithm, first introduced by Kira and Rendell in 1992. Since then it has been modified and extended in various ways to make it more efficient. However, the original RELIEF and all of its extensions perform feature selection over labeled data for classification. To the best of our knowledge, this paper is the first to apply RELIEF, as RELIEF-C, to unlabeled data in order to select relevant features for clustering. We modified RELIEF to overcome its inherent difficulties in the presence of a large number of irrelevant features and/or a significant number of noisy tuples. RELIEF-C has several advantages over existing wrapper and filter feature selection methods: (a) it works well in the presence of a large number of noisy tuples; (b) it is robust even when the underlying clustering algorithm fails to cluster properly; and (c) it accurately recognizes the relevant features even in the presence of a large number of irrelevant features. We compared RELIEF-C with two established feature selection methods for clustering. RELIEF-C significantly outperforms these methods on synthetic, benchmark, and real-world data sets, particularly when the data set contains a large number of noisy tuples and/or irrelevant features.
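For orientation, the supervised RELIEF weight update that the abstract takes as its starting point can be sketched roughly as below. This is not RELIEF-C, whose modifications the abstract does not describe; it is a minimal two-class illustration in the commonly presented form with absolute feature differences. The function name `relief_weights`, its parameters, the min-max scaling, and the Manhattan distance are assumptions made for this sketch, and it assumes every class has at least two instances.

```python
import numpy as np

def relief_weights(X, y, n_iterations=100, seed=0):
    """Illustrative two-class RELIEF weights (supervised; NOT RELIEF-C)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    # Assumption: min-max scale each feature so differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features would otherwise divide by zero
    Xs = (X - lo) / span

    weights = np.zeros(n_features)
    for _ in range(n_iterations):
        i = rng.integers(n_samples)            # sample one instance
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all rows
        dist[i] = np.inf                       # never pick the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class row
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class row
        # Reward features that differ on the miss, penalize those that differ on the hit.
        weights += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_iterations
    return weights
```

Features with larger weights are those that separate the classes among near neighbors. Because both the nearest hit and the nearest miss depend on class labels, this form cannot be applied directly to unlabeled data, which is the gap the abstract says RELIEF-C addresses.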
Pages: 869-872
Number of pages: 4