A Model-Based Clustering Approach for Bounded Data Using Transformation-Based Gaussian Mixture Models

Cited by: 0
Authors
Scrucca, Luca [1]
Affiliations
[1] Univ Bologna, Dept Stat Sci, Via Belle Arti 41, I-40126 Bologna, Italy
Keywords
Model-based clustering; Bounded data; Gaussian mixture models; Data transformation; Expectation-Maximization algorithm; Clustering uncertainty; Normalized classification entropy; GAMMA-DISTRIBUTIONS; MAXIMUM-LIKELIHOOD; NUMBER; POPULATION; ALGORITHM; CRITERION; FAMILY;
DOI
10.1007/s00357-025-09511-8
Chinese Library Classification (CLC) number
O1 [Mathematics];
Subject classification code
0701 ; 070101 ;
Abstract
The clustering of bounded data presents unique challenges in statistical analysis due to the constraints imposed on the data values. This paper introduces a novel method for model-based clustering specifically designed for bounded data. Building on the transformation-based approach to Gaussian mixture density estimation introduced by Scrucca (Biometrical Journal, 61(4), 873-888, 2019), we extend this framework to develop a probabilistic clustering algorithm for data with bounded support that allows for accurate clustering while respecting the natural bounds of the variables. In our proposal, a flexible range-power transformation is employed to map the data from its bounded domain to the unrestricted real space, hence enabling the estimation of Gaussian mixture models in the transformed space. Despite the close connection to density estimation, the behavior of this approach has not been previously investigated in the literature. Furthermore, we introduce a novel measure of clustering uncertainty, the normalized classification entropy (NCE), which provides a general and interpretable measure of classification uncertainty. The performance of the proposed method is evaluated through real-world data applications involving both fully and partially bounded data, in both univariate and multivariate settings, showing improved cluster recovery and interpretability. Overall, the empirical results demonstrate the effectiveness and advantages of our approach over traditional and advanced model-based clustering techniques that rely on distributions with bounded support.
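The two ingredients named in the abstract can be illustrated with a minimal sketch. This record does not give the formulas, so everything below is an assumption rather than the paper's definition: the range step is taken to be x - l for a lower bound and (x - l)/(u - x) for a two-sided bound, followed by a Box-Cox power step, and the NCE is taken to be the classification entropy divided by its maximum, n log K. The function names `range_power` and `normalized_classification_entropy` are hypothetical.

```python
import numpy as np


def range_power(x, lower=-np.inf, upper=np.inf, lam=1.0):
    """Map bounded data to the real line (assumed form, not the paper's code).

    Range step: x - lower (lower bound only) or
    (x - lower) / (upper - x) (both bounds finite).
    Power step: Box-Cox with parameter lam (log when lam == 0).
    Data with no finite lower bound are returned unchanged in this sketch.
    """
    x = np.asarray(x, dtype=float)
    if np.isfinite(lower) and np.isfinite(upper):
        r = (x - lower) / (upper - x)
    elif np.isfinite(lower):
        r = x - lower
    else:
        return x.copy()
    if lam == 0:
        return np.log(r)
    return (r**lam - 1.0) / lam


def normalized_classification_entropy(z):
    """Entropy of an (n, K) posterior matrix z, scaled to [0, 1].

    Assumed definition: -sum_ik z_ik * log(z_ik) / (n * log(K)),
    with 0 * log(0) treated as 0. Requires K >= 2.
    """
    z = np.asarray(z, dtype=float)
    n, k = z.shape
    safe = np.where(z > 0, z, 1.0)  # log(1) = 0 handles the z == 0 cells
    return -(z * np.log(safe)).sum() / (n * np.log(k))
```

Under these assumptions, a hard assignment gives an entropy of 0 and a uniform posterior gives 1, so values near 0 indicate well-separated clusters. A Gaussian mixture model would then be fitted to the transformed values, with the power parameter estimated alongside the mixture parameters; that step is not shown here.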
Pages: 19