Clustering Mixed-Type Data via Dirichlet Process Mixture Model with Cluster-Specific Covariance Matrices

Cited by: 0
Authors
Burhanuddin, Nurul Afiqah [1, 2]
Ibrahim, Kamarulzaman [1]
Zulkafli, Hani Syahida [3]
Mustapha, Norwati [4]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Sci & Technol, Dept Math Sci, Bangi 43600, Selangor, Malaysia
[2] Univ Putra Malaysia, Inst Math Res, Serdang 43400, Selangor, Malaysia
[3] Univ Putra Malaysia, Fac Sci, Dept Math & Stat, Serdang 43400, Selangor, Malaysia
[4] Univ Putra Malaysia, Fac Comp Sci & Informat Technol, Dept Comp Sci, Serdang 43400, Selangor, Malaysia
Source
SYMMETRY-BASEL | 2024 / Vol. 16 / Issue 6
Keywords
Dirichlet process mixture model; Bayesian nonparametric; model-based clustering; mixed-type data; latent variables; BAYESIAN-ANALYSIS; INFERENCE; VARIABLES; BINARY;
DOI
10.3390/sym16060712
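Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract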
Many studies have shown successful applications of the Dirichlet process mixture model (DPMM) for clustering continuous data. Beyond continuous data, in practice, one can expect to see different data types, including ordinal and nominal data. Existing DPMMs for clustering mixed-type data assume a strict covariance matrix structure, resulting in an overfit model. This article explores a DPMM for mixed-type data that allows the covariance matrix to differ from one cluster to another. We assume an underlying latent variable framework for ordinal and nominal data, which is then modeled jointly with the continuous data. The identifiability issue on the covariance matrix poses computational challenges, thus requiring a nonstandard inferential algorithm. The applicability and flexibility of the proposed model are illustrated through simulation examples and real data applications.
Pages: 19