Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering

Cited by: 3

Authors:

Casa, Alessandro ^{[1]}

Cappozzo, Andrea ^{[2]}

Fop, Michael ^{[3]}

Affiliations:

[1] Free Univ Bozen Bolzano, Fac Econ & Management, Piazza Univ 1, I-39100 Bolzano, Italy

[2] Politecn Milan, MOX Lab Modeling & Sci Comp, Milan, Italy

[3] Univ Coll Dublin, Sch Math & Stat, Dublin, Ireland

Source:

JOURNAL OF CLASSIFICATION | 2022 / Vol. 39 / Issue 03

Keywords:

Model-based clustering; Penalized likelihood; Sparse precision matrices; Gaussian graphical models; Graphical lasso; EM algorithm; HIGH-DIMENSIONAL DATA; INVERSE COVARIANCE ESTIMATION; MAXIMUM-LIKELIHOOD-ESTIMATION; VARIABLE SELECTION; GRAPHICAL LASSO; ADAPTIVE LASSO; MATRICES;

DOI:

10.1007/s00357-022-09421-z

Chinese Library Classification:

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Notwithstanding, current methodologies implicitly assume similar levels of sparsity across the classes, not accounting for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyper-parameter specification. Analyses on synthetic and real data showcase the validity of our proposal.

Citation

Pages: 648-674

Page count: 27

