Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

Cited by: 18
Authors
Bouveyron, Charles [1]
Brunet-Saumard, Camille [2]
Affiliations
[1] Univ Paris 01, EA 4543, Lab SAMM, F-75231 Paris 05, France
[2] Univ Angers, UMR CNRS 6093, Lab LAREMA, Angers, France
Keywords
Model-based clustering; Variable selection; Discriminative subspace; Fisher-EM algorithm; ℓ1-type penalizations; HIGH-DIMENSIONAL DATA; FRAMEWORK; MIXTURES
DOI
10.1007/s00180-013-0433-6
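Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract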
The interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the importance of variable selection for clustering but turn out to be either very time-consuming or not sparse enough in high-dimensional spaces. This work proposes to select the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method was recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through ℓ1-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the interest of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented.
Pages: 489-513
Number of pages: 25