Clustering Mixed-Type Data via Dirichlet Process Mixture Model with Cluster-Specific Covariance Matrices

Cited by: 0

Authors:

Burhanuddin, Nurul Afiqah ^{[1,2]}

Ibrahim, Kamarulzaman ^{[1]}

Zulkafli, Hani Syahida ^{[3]}

Mustapha, Norwati ^{[4]}

Affiliations:

[1] Univ Kebangsaan Malaysia, Fac Sci & Technol, Dept Math Sci, Bangi 43600, Selangor, Malaysia

[2] Univ Putra Malaysia, Inst Math Res, Serdang 43400, Selangor, Malaysia

[3] Univ Putra Malaysia, Fac Sci, Dept Math & Stat, Serdang 43400, Selangor, Malaysia

[4] Univ Putra Malaysia, Fac Comp Sci & Informat Technol, Dept Comp Sci, Serdang 43400, Selangor, Malaysia

Source:

SYMMETRY-BASEL | 2024 / Vol. 16 / Issue 06

Keywords:

Dirichlet process mixture model; Bayesian nonparametric; model-based clustering; mixed-type data; latent variables; BAYESIAN-ANALYSIS; INFERENCE; VARIABLES; BINARY;

DOI:

10.3390/sym16060712

Chinese Library Classification (CLC) number:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Many studies have shown successful applications of the Dirichlet process mixture model (DPMM) for clustering continuous data. Beyond continuous data, in practice, one can expect to see different data types, including ordinal and nominal data. Existing DPMMs for clustering mixed-type data assume a strict covariance matrix structure, resulting in an overfit model. This article explores a DPMM for mixed-type data that allows the covariance matrix to differ from one cluster to another. We assume an underlying latent variable framework for ordinal and nominal data, which is then modeled jointly with the continuous data. The identifiability issue on the covariance matrix poses computational challenges, thus requiring a nonstandard inferential algorithm. The applicability and flexibility of the proposed model are illustrated through simulation examples and real data applications.


Pages: 19

7 related items in total

[1] Clustering bivariate mixed-type data via the cluster-weighted model
Punzo, Antonio
Ingrassia, Salvatore
COMPUTATIONAL STATISTICS, 2016, 31 (03) : 989 - 1013
[2] Clustering bivariate mixed-type data via the cluster-weighted model
Antonio Punzo
Salvatore Ingrassia
Computational Statistics, 2016, 31 : 989 - 1013
[3] A Spatial Dirichlet Process Mixture Model for Clustering Population Genetics Data
Reich, Brian J.
Bondell, Howard D.
BIOMETRICS, 2011, 67 (02) : 381 - 390
[4] Mixtures of general location model with factor analyzer covariance structure for clustering mixed type data
Amiri, Leila
Khazaei, Mojtaba
Ganjali, Mojtaba
JOURNAL OF APPLIED STATISTICS, 2019, 46 (11) : 2075 - 2100
[5] Composite likelihood methods for parsimonious model-based clustering of mixed-type data
Ranalli, Monia
Rocci, Roberto
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (02) : 381 - 407
[6] Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model
Dahl, David B.
Mo, Qianxing
Vannucci, Marina
STATISTICAL MODELLING, 2008, 8 (01) : 23 - 39
[7] The complexity of financial wellness: examining survey patterns via kernel metric learning and clustering of mixed-type data
Ghashti, Jesse S.
Thompson, John R. J.
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023, 2023, : 314 - 322
