A Framework for Feature Selection in Clustering

Cited by: 436
Authors
Witten, Daniela M. [1]
Tibshirani, Robert [1,2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
Hierarchical clustering; High-dimensional; K-means clustering; Lasso; Model selection; Sparsity; Unsupervised learning; Variable selection; Principal components; Objects; Number
DOI
10.1198/jasa.2010.tm09415
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
Pages: 713-726
Page count: 14
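As a rough illustration of the sparse K-means procedure the abstract describes, the sketch below alternates between K-means on weight-scaled features and a lasso-type (soft-thresholded) update of the feature weights, so a single criterion drives both the clustering and the feature selection. This is a minimal sketch, not the authors' reference implementation (they distribute one in the R package sparcl): the helper names (`sparse_kmeans`, `update_weights`, `bcss_per_feature`), the use of scikit-learn's KMeans for the clustering step, and the fixed L1 bound `s` in the demo are all illustrative assumptions.

```python
# Illustrative sketch of sparse K-means: alternate between clustering on
# weighted features and a soft-thresholded update of the feature weights.
# Not the authors' implementation; names and parameter choices are assumptions.
import numpy as np
from sklearn.cluster import KMeans


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def bcss_per_feature(X, labels):
    """Per-feature between-cluster sum of squares: total SS minus within-cluster SS."""
    total_ss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within_ss = np.zeros(X.shape[1])
    for k in np.unique(labels):
        Xk = X[labels == k]
        within_ss += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    return total_ss - within_ss


def update_weights(a, s):
    """Maximize w.a subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.
    The maximizer is a soft-thresholded, L2-normalized version of a; the
    threshold is found by binary search when the L1 constraint is active."""
    a = np.maximum(a, 0.0)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w
    lo, hi = 0.0, a.max()
    for _ in range(30):  # binary search for the threshold giving ||w||_1 ~ s
        delta = (lo + hi) / 2.0
        w = soft_threshold(a, delta)
        w = w / np.linalg.norm(w)
        if w.sum() > s:
            lo = delta
        else:
            hi = delta
    return w


def sparse_kmeans(X, n_clusters, s, n_iter=10, random_state=0):
    """Alternate: K-means on sqrt(w)-scaled features, then refresh the weights
    from the per-feature between-cluster sums of squares."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start with equal weights
    labels = None
    for _ in range(n_iter):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        labels = km.fit_predict(X * np.sqrt(w))
        w = update_weights(bcss_per_feature(X, labels), s)
    return labels, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 60 observations, 50 features; only the first 5 features carry the cluster signal.
    X = rng.normal(size=(60, 50))
    X[:30, :5] += 2.0
    labels, w = sparse_kmeans(X, n_clusters=2, s=2.0)
    print("features with nonzero weight:", np.flatnonzero(w > 0))
```

With a small L1 bound, the noise features receive weight exactly zero, so the clustering is effectively performed on the selected subset of features. In the paper the tuning parameter is chosen with a permutation-based gap statistic rather than fixed by hand as in this demo.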