A Framework for Feature Selection in Clustering

Cited by: 436
Authors
Witten, Daniela M. [1]
Tibshirani, Robert [1,2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
Hierarchical clustering; High-dimensional; K-means clustering; Lasso; Model selection; Sparsity; Unsupervised learning; Variable selection; Principal components; Objects; Number
DOI
10.1198/jasa.2010.tm09415
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
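The abstract describes the core idea: alternate between clustering on weighted features and updating nonnegative feature weights that maximize the weighted between-cluster sum of squares, subject to an L2 constraint and a lasso-type (L1) bound that zeroes out uninformative features. The sketch below illustrates that alternating scheme in NumPy; it is a simplified illustration, not the authors' reference implementation (which is the R package `sparcl`). The function names (`sparse_kmeans`, `update_weights`) and the tuning parameter `s` (the L1 bound, assumed here to lie between 1 and the square root of the feature count) are labels chosen for exposition.

```python
# Illustrative sketch of sparse K-means in the spirit of the paper's
# framework. Assumes: numpy only, a basic Lloyd's K-means step, and
# s in [1, sqrt(p)] for the lasso-type constraint.
import numpy as np

def bcss_per_feature(X, labels):
    """Between-cluster sum of squares, computed per feature
    (nonnegative by construction)."""
    grand = X.mean(axis=0)
    b = np.zeros(X.shape[1])
    for k in np.unique(labels):
        Xk = X[labels == k]
        b += len(Xk) * (Xk.mean(axis=0) - grand) ** 2
    return b

def update_weights(b, s):
    """Feature weights: soft-threshold b, rescale to unit L2 norm,
    with the threshold found by binary search so the L1 norm is <= s."""
    w = b / np.linalg.norm(b)
    if w.sum() <= s:               # unpenalized solution already feasible
        return w
    lo, hi = 0.0, b.max()
    for _ in range(50):
        delta = 0.5 * (lo + hi)
        w = np.maximum(b - delta, 0.0)
        norm = np.linalg.norm(w)
        if norm == 0 or (w / norm).sum() <= s:
            hi = delta             # threshold large enough: try smaller
        else:
            lo = delta             # L1 norm still too big: raise threshold
    w = np.maximum(b - hi, 0.0)
    return w / np.linalg.norm(w)

def kmeans(X, K, iters=20, seed=0):
    """Plain Lloyd's algorithm with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def sparse_kmeans(X, K, s, n_iter=10):
    """Alternate: cluster on weighted features, then refit weights
    from the per-feature between-cluster sum of squares."""
    w = np.ones(X.shape[1]) / np.sqrt(X.shape[1])
    for _ in range(n_iter):
        labels = kmeans(X * np.sqrt(w), K)
        w = update_weights(bcss_per_feature(X, labels), s)
    return labels, w
```

Smaller values of `s` force more feature weights to exactly zero, which is the sense in which a single criterion drives both the clustering and the feature selection.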
Pages: 713-726 (14 pages)