Feature selection for clustering - A filter solution

Times cited: 231
Authors
Dash, M [1]
Choi, K [1]
Scheuermann, P [1]
Liu, H [1]
Affiliations
[1] Northwestern Univ, Dept Elect & Comp Engn, Evanston, IL 60208 USA
DOI
10.1109/ICDM.2002.1183893
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Processing applications with a large number of dimensions has been a challenge to the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential pre-processing method to remove noisy features. In the literature, only a few methods have been proposed for feature selection for clustering, and almost all of them are 'wrapper' techniques that require a clustering algorithm to evaluate the candidate feature subsets. The wrapper approach is largely unsuitable in real-world applications because it relies heavily on clustering algorithms that require parameters such as the number of clusters, and because suitable criteria for evaluating clustering in different subspaces are lacking. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The proposed method is based on the observation that data with clusters has a very different point-to-point distance histogram from data without clusters. Using this observation, we propose an entropy measure that is low if the data has distinct clusters and high otherwise. The entropy measure is suitable for selecting the most important subset of features because it is invariant to the number of dimensions and is affected only by the quality of clustering. Extensive performance evaluation over synthetic, benchmark, and real datasets shows its effectiveness.
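The idea in the abstract can be illustrated with a small sketch. This is not the measure defined in the paper, whose exact formula does not appear in this record; it is an assumed stand-in that applies the binary entropy function to pairwise distances normalized to [0, 1]. Distances near 0 (within a cluster) or near 1 (between clusters) contribute little entropy, while mid-range distances, typical of data without clusters, contribute the most. The function name `distance_entropy` and the toy data are hypothetical.

```python
import numpy as np


def distance_entropy(X):
    """Mean binary entropy of normalized pairwise distances.

    Illustrative stand-in, NOT the paper's exact entropy measure.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as a single feature
    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once, excluding the diagonal.
    d = dist[np.triu_indices(len(X), k=1)]
    if d.max() == 0.0:
        return 0.0  # all points identical: no spread to measure
    d = d / d.max()  # normalize distances to [0, 1]
    d = np.clip(d, 1e-12, 1.0 - 1e-12)  # avoid log2(0)
    h = -(d * np.log2(d) + (1.0 - d) * np.log2(1.0 - d))
    return float(h.mean())


# Toy data (assumption): feature 0 holds two tight clusters,
# feature 1 is uniform noise with no cluster structure.
rng = np.random.default_rng(0)
n = 100
clustered = np.concatenate([rng.normal(0.0, 0.01, n),
                            rng.normal(1.0, 0.01, n)])
noise = rng.uniform(0.0, 1.0, 2 * n)
data = np.column_stack([clustered, noise])

# Score each feature alone; a filter method keeps low-entropy features.
scores = {j: distance_entropy(data[:, j]) for j in range(data.shape[1])}
best_feature = min(scores, key=scores.get)
print(scores, best_feature)
```

Because the score is computed from the data alone, no clustering algorithm or cluster count is needed, which is the property that distinguishes a filter method from a wrapper method.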
Pages: 115-122
Number of pages: 8
Related papers
50 in total
  • [31] Heuristic feature selection method for clustering
    School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
    J. Southeast Univ. Engl. Ed., 2006, 2 (169-175):
  • [32] Clustering-based feature selection
    School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, China
    Tien Tzu Hsueh Pao, 2008, SUPPL. (157-160):
  • [33] Feature selection via fuzzy clustering
    Sun, Hao-Jun
    Sun, Mei
    Mei, Zhen
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 1400 - +
  • [34] Feature Selection and Semisupervised Fuzzy Clustering
    Kong, Yi-qing
    Wang, Shi-tong
    FUZZY INFORMATION AND ENGINEERING, 2009, 1 (02) : 179 - 190
  • [35] Greedy Feature Selection for Subspace Clustering
    Dyer, Eva L.
    Sankaranarayanan, Aswin C.
    Baraniuk, Richard G.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2013, 14 : 2487 - 2517
  • [36] FEATURE-SELECTION BY INTERACTIVE CLUSTERING
    WISMATH, SK
    SOONG, HP
    AKL, SG
    PATTERN RECOGNITION, 1981, 14 (1-6) : 75 - 80
  • [37] Unsupervised feature selection for balanced clustering
    Zhou, Peng
    Chen, Jiangyong
    Fan, Mingyu
    Du, Liang
    Shen, Yi-Dong
    Li, Xuejun
    KNOWLEDGE-BASED SYSTEMS, 2020, 193
  • [38] A survey on feature selection approaches for clustering
    Hancer, Emrah
    Xue, Bing
    Zhang, Mengjie
    ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (06) : 4519 - 4545
  • [39] Filter pruning via feature map clustering
    Li, Wei
    He, Yongxing
    Zhang, Xiaoyu
    Tang, Yongchuan
    INTELLIGENT DATA ANALYSIS, 2023, 27 (04) : 911 - 933
  • [40] GLEE: A granularity filter for feature selection
    Ba, Jing
    Wang, Pingxin
    Yang, Xibei
    Yu, Hualong
    Yu, Dongjun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122