A Framework for Feature Selection in Clustering

Cited by: 424
Authors
Witten, Daniela M. [1]
Tibshirani, Robert [1, 2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Hierarchical clustering; High-dimensional; K-means clustering; Lasso; Model selection; Sparsity; Unsupervised learning; VARIABLE SELECTION; PRINCIPAL-COMPONENTS; OBJECTS; NUMBER;
DOI
10.1198/jasa.2010.tm09415
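Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code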
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
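The abstract does not spell out the optimization, so the following is only a minimal sketch of how a lasso-type penalty can select features. It assumes the feature-weight step takes the form "maximize sum_j w_j * score_j subject to ||w||_2 <= 1, ||w||_1 <= l1_bound, w_j >= 0", where score_j is a per-feature between-cluster sum of squares computed for fixed cluster labels. The names `update_weights`, `scores`, and `l1_bound` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def soft_threshold(values, delta):
    """Shrink each non-negative value toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def update_weights(scores, l1_bound, tol=1e-8, max_iter=200):
    """Assumed weight step: soft-threshold the scores, then rescale to unit L2 norm.

    The threshold is found by bisection so that the L1 norm of the result
    does not exceed l1_bound (assumed to be at least 1).
    """
    a = [max(s, 0.0) for s in scores]
    w = l2_normalize(a)
    if sum(w) <= l1_bound:
        return w  # the L1 constraint is inactive, so no thresholding is needed
    lo, hi = 0.0, max(a)
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        w = l2_normalize(soft_threshold(a, mid))
        if sum(w) > l1_bound:
            lo = mid  # still too dense: raise the threshold
        else:
            hi = mid  # feasible: try a smaller threshold
        if hi - lo < tol:
            break
    return l2_normalize(soft_threshold(a, hi))


# Hypothetical per-feature scores: two informative features, three near-noise ones.
weights = update_weights([5.0, 4.0, 0.1, 0.05, 0.0], l1_bound=1.2)
selected = [j for j, w in enumerate(weights) if w > 0]
```

In a full procedure this step would alternate with a clustering step run on the weighted features. Features whose score falls below the threshold receive a weight of exactly zero, which is how the L1 bound performs selection; a smaller `l1_bound` yields a sparser weight vector.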
Pages: 713 - 726
Page count: 14
Related papers
50 in total
  • [21] Simultaneous supervised clustering and feature selection over a graph
    Shen, Xiaotong
    Huang, Hsin-Cheng
    Pan, Wei
    BIOMETRIKA, 2012, 99 (04) : 899 - 914
  • [22] Fault Line Selection Based on Feature Fusion and Clustering
    Hu, Linjing
    Hu, Wenchen
    Chen, Hongyu
    2024 4TH POWER SYSTEM AND GREEN ENERGY CONFERENCE, PSGEC 2024, 2024, : 148 - 152
  • [23] Deep Spectral Clustering With Projected Adaptive Feature Selection
    Zhao, Yang
    Bi, Zixuan
    Zhu, Peican
    Yuan, Aihong
    Li, Xuelong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [24] Simultaneous feature selection and clustering using mixture models
    Law, MHC
    Figueiredo, MAT
    Jain, AK
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (09) : 1154 - 1166
  • [25] Exploiting Feature Relationships Towards Stable Feature Selection
    Kamkar, Iman
    Gupta, Sunil Kumar
    Dinh Phung
    Venkatesh, Svetha
    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015, : 727 - 736
  • [26] Deterministic Feature Selection for k-Means Clustering
    Boutsidis, Christos
    Magdon-Ismail, Malik
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (09) : 6099 - 6110
  • [27] Unsupervised Hybrid Feature Extraction Selection for High-Dimensional Non-Gaussian Data Clustering with Variational Inference
    Fan, Wentao
    Bouguila, Nizar
    Ziou, Djemel
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (07) : 1670 - 1685
  • [28] A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints
    Li, Qiwei
    Guindani, Michele
    Reich, Brian J.
    Bondell, Howard D.
    Vannucci, Marina
    STATISTICAL ANALYSIS AND DATA MINING, 2017, 10 (06) : 393 - 409
  • [29] Feature evaluation and selection based on an entropy measure with data clustering
    Chi, ZR
    Yan, H
    OPTICAL ENGINEERING, 1995, 34 (12) : 3514 - 3519
  • [30] K-means Clustering with Feature Selection for Stream Data
    Wang, Xiao-dong
    Chen, Rung-Ching
    Yan, Fei
    Hendry
    2018 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL (IS3C 2018), 2018, : 453 - 456