A Survey on Model-Based Co-Clustering: High Dimension and Estimation Challenges

Cited by: 4
Authors
Biernacki, C. [1]
Jacques, J. [2]
Keribin, C. [3]
Affiliations
[1] Univ Lille, Inria, CNRS, Lab Math Painleve, F-59650 Villeneuve d'Ascq, France
[2] Univ Lyon, Lyon 2, ERIC UR 3083, 5 Ave Pierre Mendes France, F-69676 Bron, France
[3] Univ Paris Saclay, CNRS, Inria, Lab Math Orsay, F-91405 Orsay, France
Keywords
High-dimension clustering; Mixture models; EM-like algorithms; Model selection; Mixed data types; MAXIMUM-LIKELIHOOD-ESTIMATION; UNIVARIATE GAUSSIAN MIXTURES; LATENT BLOCK MODEL; VARIABLE SELECTION; ASYMPTOTIC NORMALITY; EM ALGORITHM; CONSISTENCY; DEGENERACY; DENSITY
DOI
10.1007/s00357-023-09441-3
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Model-based co-clustering can be seen as a particularly important extension of model-based clustering. It allows for a significant reduction of both the number of rows (individuals) and columns (variables) of a data set in a parsimonious manner, and also allows interpretability of the resulting reduced data set since the meaning of the initial individuals and features is preserved. Moreover, it benefits from the rich statistical theory for both estimation and model selection. Many works have produced new advances on this topic in recent years, and this paper offers a general update of the related literature. In addition, we advocate two main messages, supported by specific research material: (1) co-clustering requires further research to fix some well-identified estimation issues, and (2) co-clustering is one of the most promising approaches for clustering in the (very) high-dimensional setting, which corresponds to the global trend in modern data sets.
Pages: 332-381
Page count: 50