Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

Cited by: 18
Authors
Bouveyron, Charles [1 ]
Brunet-Saumard, Camille [2 ]
Affiliations
[1] Univ Paris 01, EA 4543, Lab SAMM, F-75231 Paris 05, France
[2] Univ Angers, UMR CNRS 6093, Lab LAREMA, Angers, France
Keywords
Model-based clustering; Variable selection; Discriminative subspace; Fisher-EM algorithm; ℓ1-type penalizations; High-dimensional data; Framework; Mixtures
DOI
10.1007/s00180-013-0433-6
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Interest in variable selection for clustering has grown recently with the increasing need to cluster high-dimensional data. In particular, variable selection eases both the clustering itself and the interpretation of the results. Existing approaches have demonstrated the importance of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to select the discriminative variables by introducing sparsity into the loading matrix of the Fisher-EM algorithm, a clustering method recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity into the orientation matrix of the discriminative subspace through ℓ1-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the benefits of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented.
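To illustrate the general idea of ℓ1-type sparsification of a loading matrix, the sketch below applies entry-wise soft-thresholding (the proximal operator of the ℓ1 penalty) to a small, made-up loading matrix; rows driven entirely to zero correspond to variables that no longer contribute to any discriminative axis. This is a minimal, hypothetical illustration of the mechanism, not the actual penalized estimation steps of the sparse Fisher-EM algorithm described in the paper.

```python
import numpy as np

def soft_threshold(U, lam):
    """Entry-wise soft-thresholding: the proximal operator of the
    l1 penalty, which shrinks loadings and zeroes small ones."""
    return np.sign(U) * np.maximum(np.abs(U) - lam, 0.0)

# Hypothetical loading matrix: p = 5 variables projected onto d = 2
# discriminative axes (illustrative values, not from the paper).
U = np.array([[0.90, 0.05],
              [0.02, 0.80],
              [0.01, 0.03],
              [0.70, 0.02],
              [0.04, 0.60]])

U_sparse = soft_threshold(U, lam=0.1)

# A variable whose entire row is zero loads on no discriminative
# axis and is effectively deselected (here, variable 2).
selected = np.where(np.any(U_sparse != 0.0, axis=1))[0]
print(selected)  # indices of the variables that remain selected
```

Larger values of the penalty level `lam` zero out more rows, trading clustering fit for a sparser, more interpretable set of selected variables.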
Pages: 489-513
Page count: 25