Model-based co-clustering for mixed type data

Cited by: 16
Authors
Selosse, Margot [1, 2]
Jacques, Julien [1, 2]
Biernacki, Christophe [3, 4]
Affiliations
[1] Lab ERIC, 5 Ave Pierre Mendes France, F-69500 Bron, France
[2] Univ Lumiere Lyon 2, 86 Rue Pasteur, F-69007 Lyon, France
[3] Univ Lille, UFR Math, Cite Sci, F-59655 Villeneuve d'Ascq, France
[4] INRIA, 40 Av Halley, Bat A, Pk Plaza, F-59650 Villeneuve d'Ascq, France
Keywords
Co-clustering; Mixed-type data; Latent block model
DOI
10.1016/j.csda.2019.106866
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features leads to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and features. By grouping the data set into blocks (the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, recently, contexts with features of different types (here called mixed type data sets) are becoming more common. The LBM is not directly applicable to this kind of data set. Here a natural extension of the usual LBM to the "Multiple Latent Block Model" (MLBM) is proposed in order to handle mixed type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler, and allows for missing data situations. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets. © 2019 Elsevier B.V. All rights reserved.
Pages: 18
Related papers
50 in total
  • [1] Model-based co-clustering for ordinal data
    Jacques, Julien
    Biernacki, Christophe
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 123 : 101 - 115
  • [2] Model-based co-clustering for functional data
    Ben Slimen, Yosra
    Allio, Sylvain
    Jacques, Julien
    NEUROCOMPUTING, 2018, 291 : 97 - 108
  • [3] Co-clustering contaminated data: a robust model-based approach
    Fibbi, Edoardo
    Perrotta, Domenico
    Torti, Francesca
    Van Aelst, Stefan
    Verdonck, Tim
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (01) : 121 - 161
  • [4] Co-clustering contaminated data: a robust model-based approach
    Edoardo Fibbi
    Domenico Perrotta
    Francesca Torti
    Stefan Van Aelst
    Tim Verdonck
    Advances in Data Analysis and Classification, 2024, 18 : 121 - 161
  • [5] Model-based co-clustering for the effective handling of sparse data
    Ailem, Melissa
    Role, Francois
    Nadif, Mohamed
    PATTERN RECOGNITION, 2017, 72 : 108 - 122
  • [6] Model-based Co-clustering for High Dimensional Sparse Data
    Salah, Aghiles
    Rogovschi, Nicoleta
    Nadif, Mohamed
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 866 - 874
  • [7] Model-based Poisson co-clustering for Attributed Networks
    Riverain, Paul
    Fossier, Simon
    Nadif, Mohamed
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021 : 703 - 710
  • [8] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    APPLIED COMPUTING 2008, VOLS 1-3, 2008 : 886 - 890
  • [9] blockcluster: An R Package for Model-Based Co-Clustering
    Bhatia, Parmeet Singh
    Iovleff, Serge
    Govaert, Gerard
    JOURNAL OF STATISTICAL SOFTWARE, 2017, 76 (09) : 1 - 24
  • [10] A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges
    Biernacki, C.
    Jacques, J.
    Keribin, C.
    JOURNAL OF CLASSIFICATION, 2023, 40 (02) : 332 - 381