Feature selection for clustering - A filter solution

Citations: 231
Authors
Dash, M [1 ]
Choi, K [1 ]
Scheuermann, P [1 ]
Liu, H [1 ]
Affiliation
[1] Northwestern Univ, Dept Elect & Comp Engn, Evanston, IL 60208 USA
DOI
10.1109/ICDM.2002.1183893
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Processing applications with a large number of dimensions has been a challenge to the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential pre-processing method for removing noisy features. In the literature, only a few methods have been proposed for feature selection for clustering, and almost all of them are 'wrapper' techniques that require a clustering algorithm to evaluate the candidate feature subsets. The wrapper approach is largely unsuitable in real-world applications because it relies heavily on clustering algorithms that require parameters such as the number of clusters, and because suitable criteria for evaluating clusterings in different subspaces are lacking. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The proposed method is based on the observation that data with clusters has a very different point-to-point distance histogram from data without clusters. Using this observation, we propose an entropy measure that is low if the data has distinct clusters and high otherwise. The entropy measure is suitable for selecting the most important subset of features because it is invariant to the number of dimensions and is affected only by the quality of clustering. Extensive performance evaluation over synthetic, benchmark, and real datasets shows its effectiveness.
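The abstract does not state the formula for the entropy measure, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes that pairwise Euclidean distances are scaled to [0, 1] by the largest distance and that each scaled distance contributes its binary entropy; the function name `distance_entropy` and the toy datasets are hypothetical. Under these assumptions, distances that pile up near 0 (same cluster) or near 1 (different clusters) score low, while distances spread around the middle of the histogram score high.

```python
import math
import random
from itertools import combinations


def distance_entropy(points):
    """Mean binary entropy of max-scaled pairwise distances.

    Illustrative assumption only: this is not the formula from the paper.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    d_max = max(dists)
    if d_max == 0:
        return 0.0  # all points coincide; no spread to measure
    total = 0.0
    for d in dists:
        x = d / d_max  # scaled distance in [0, 1]
        if 0.0 < x < 1.0:  # the entropy term is 0 at exactly 0 or 1
            total -= x * math.log2(x) + (1 - x) * math.log2(1 - x)
    return total / len(dists)


rng = random.Random(0)

# Two tight, well-separated blobs: distances are either tiny or large.
clustered = [(rng.gauss(c, 0.02), rng.gauss(c, 0.02))
             for c in (0.0, 1.0) for _ in range(50)]

# Uniform points in the unit square: distances spread over the whole range.
uniform = [(rng.random(), rng.random()) for _ in range(100)]

print("clustered:", distance_entropy(clustered))
print("uniform:  ", distance_entropy(uniform))
```

A filter-style selection procedure could score candidate feature subsets with such a function and prefer the subsets with the lowest value, without running any clustering algorithm.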
Pages: 115-122
Number of pages: 8