Model-Based Clustering with Nested Gaussian Clusters

Times cited: 0
Authors
Hou-Liu, Jason [1]
Browne, Ryan P. [1]
Affiliation
[1] 200 Univ Ave West, Waterloo, ON, Canada
Keywords
Intercluster structure; Model-based clustering; Hierarchical structure; Gaussian mixture model; Maximum likelihood; Identifiability; Multivariate; Recognition
DOI
10.1007/s00357-023-09453-z
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
A dataset may exhibit multiple class labels for each observation; sometimes, these class labels manifest in a hierarchical structure. A textbook analogy would be that a book can be labelled as statistics as well as the encompassing label of non-fiction. To capture this behaviour in a model-based clustering context, we describe a model formulation and estimation procedure for performing clustering with nested Gaussian clusters in orthogonal intrinsic variable subspaces. We elucidate a two-stage clustering model, whereby the observed manifest variables are assumed to be a rotation of intrinsic primary and secondary clustering subspaces with additional noise subspaces. In a hierarchical sense, secondary clusters are presumed to be subclusters of primary clusters and so share Gaussian cluster parameters in the primary cluster subspace. An estimation procedure using the expectation-maximization algorithm is provided, with model selection via Bayesian information criterion. Real-world datasets are evaluated under the proposed model.
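The abstract describes the model only in words. The following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes diagonal covariances within each subspace, a single Gaussian shared by all clusters in the noise subspace, and a fixed, known rotation Q. All names (`component_logdens`, `e_step`, `primary_membership`, `bic`, `PARAMS`) are hypothetical. Because Q is orthogonal, |det Q| = 1, so the density of an observation y equals the mixture density of its intrinsic coordinates z = Q^T y. Each nested cluster (g, h) contributes pi_gh times the product of three Gaussians: one in the primary subspace, shared by every subcluster of g; one of its own in the secondary subspace; and the common noise Gaussian. Only the E-step and BIC are shown. The M-step, including estimation of Q, is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical parameter layout for this sketch (not the paper's notation):
#   Q         : orthogonal p x p rotation, observed y = Q z
#   d1, d2    : dimensions of the primary and secondary subspaces (noise = the rest)
#   primary   : [(mu_g, var_g)]             one entry per primary cluster g
#   secondary : [[(pi_gh, nu_gh, psi_gh)]]  subclusters h of each g; all pi_gh sum to 1
#   noise     : (mean, var)                 one Gaussian shared by every cluster (assumed)
# Covariances are ASSUMED diagonal, so each "var" is a list of variances.


def diag_gauss_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(vals: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def rotate_back(Q: List[List[float]], y: Sequence[float]) -> List[float]:
    """Intrinsic coordinates z = Q^T y (Q orthogonal, so |det Q| = 1)."""
    p = len(y)
    return [sum(Q[i][j] * y[i] for i in range(p)) for j in range(p)]


def component_logdens(params: dict, y: Sequence[float]) -> Dict[Tuple[int, int], float]:
    """log(pi_gh) + log f_gh(y) for every nested cluster (g, h)."""
    z = rotate_back(params["Q"], y)
    d1, d2 = params["d1"], params["d2"]
    z1, z2, z3 = z[:d1], z[d1:d1 + d2], z[d1 + d2:]
    noise_term = diag_gauss_logpdf(z3, *params["noise"])
    out = {}
    for g, (mu_g, var_g) in enumerate(params["primary"]):
        # All subclusters of g share this primary-subspace Gaussian.
        primary_term = diag_gauss_logpdf(z1, mu_g, var_g)
        for h, (pi_gh, nu_gh, psi_gh) in enumerate(params["secondary"][g]):
            out[(g, h)] = (math.log(pi_gh) + primary_term
                           + diag_gauss_logpdf(z2, nu_gh, psi_gh) + noise_term)
    return out


def e_step(params: dict, data: Sequence[Sequence[float]]):
    """Responsibilities tau_i(g, h) and the observed-data log-likelihood."""
    taus, loglik = [], 0.0
    for y in data:
        comp = component_logdens(params, y)
        norm = logsumexp(list(comp.values()))
        loglik += norm
        taus.append({k: math.exp(v - norm) for k, v in comp.items()})
    return taus, loglik


def primary_membership(tau: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Primary-cluster probabilities: sum tau over the subclusters h of each g."""
    out: Dict[int, float] = {}
    for (g, _h), t in tau.items():
        out[g] = out.get(g, 0.0) + t
    return out


def bic(loglik: float, n_free_params: int, n_obs: int) -> float:
    """BIC in the 'larger is better' convention: 2 log L - k log n."""
    return 2.0 * loglik - n_free_params * math.log(n_obs)


# Toy example: p = 3, with one primary, one secondary and one noise dimension.
# Q is a 90-degree rotation in the first two coordinates.
PARAMS = {
    "Q": [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    "d1": 1,
    "d2": 1,
    "primary": [([0.0], [1.0]), ([5.0], [1.0])],
    "secondary": [[(0.25, [-2.0], [1.0]), (0.25, [2.0], [1.0])],
                  [(0.5, [0.0], [1.0])]],
    "noise": ([0.0], [1.0]),
}
DATA = [[2.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # intrinsic z = (0, -2, 0) and (5, 0, 0)

if __name__ == "__main__":
    taus, ll = e_step(PARAMS, DATA)
    for tau in taus:
        print(max(tau, key=tau.get), primary_membership(tau))
    # k = 10 is an arbitrary illustrative parameter count.
    print("log-likelihood:", ll, "BIC:", bic(ll, 10, len(DATA)))
```

Summing the responsibilities over h gives the primary-cluster membership. This mirrors the hierarchical labelling described in the abstract. The parameter count k passed to `bic` depends on the covariance structure and on how the rotation is parametrized, so the value used in the toy run is arbitrary.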
Pages: 39-64
Number of pages: 26
References (27 in total)
[1] [Anonymous]. Classification of olive oils from their fatty acid composition. 1983.
[2] Biernacki, C.; Celeux, G.; Govaert, G. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 2003, 41(3-4): 561-575.
[3] Bouveyron, Charles; Brunet, Camille. Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Statistics and Computing, 2012, 22(1): 301-324.
[4] Browne, Ryan P.; McNicholas, Paul D. Estimating common principal components in high dimensions. Advances in Data Analysis and Classification, 2014, 8(2): 217-226.
[5] Campbell, N. A.; Mahon, R. J. Multivariate study of variation in 2 species of rock crab of genus Leptograpsus. Australian Journal of Zoology, 1974, 22(3): 417-425.
[6] Celeux, G.; Govaert, G. Gaussian parsimonious clustering models. Pattern Recognition, 1995, 28(5): 781-793.
[7] Dempster, A. P.; Laird, N. M.; Rubin, D. B. Maximum likelihood from incomplete data via EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38.
[8] Dua, D. UCI Machine Learning. 2017.
[9] Galimberti, Giuliano; Soffritti, Gabriele. Model-based methods to identify multiple cluster structures in a data set. Computational Statistics & Data Analysis, 2007, 52(1): 520-536.
[10] Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele. Modelling the role of variables in model-based cluster analysis. Statistics and Computing, 2018, 28(1): 145-169.