Balanced longitudinal data clustering with a copula kernel mixture model

Cited by: 1
Authors
Zhang, Xi [1]
Murphy, Orla A. [2]
McNicholas, Paul D. [1]
Affiliations
[1] McMaster Univ, Dept Math & Stat, Hamilton, ON, Canada
[2] Dalhousie Univ, Dept Math & Stat, Halifax, NS, Canada
Source
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE | 2025 / Vol. 53 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Clustering; copula; finite mixture model; longitudinal data; LIKELIHOOD;
DOI
10.1002/cjs.11838
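Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification code
020208; 070103; 0714;
Abstract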
Many common clustering methods cannot be used for clustering balanced multivariate longitudinal data in cases where the covariance of variables is a function of the time points. In this article, a copula kernel mixture model (CKMM) is proposed for clustering data of this type. The CKMM is a finite mixture model that decomposes each mixture component's joint density function into a copula and marginal distribution functions. In this decomposition, the Gaussian copula is used due to its mathematical tractability and Gaussian kernel functions are used to estimate the marginal distributions. A generalized expectation-maximization algorithm is used to estimate the model parameters. The performance of the proposed model is assessed in a simulation study and on two real datasets. The proposed model is shown to have effective performance in comparison with standard methods, such as K-means with dynamic time warping clustering, latent growth models and functional high-dimensional data clustering.
Les méthodes de regroupement classiques présentent des limitations pour l'analyse des données longitudinales multivariées équilibrées dont la covariance varie dans le temps. Les auteurs de cet article développent un modèle de mélange basé sur des copules et des noyaux (CKMM) pour répondre à cette problématique. Le CKMM est un modèle à mélange fini qui décompose la densité conjointe de chaque composante en une copule et des lois marginales. La copule gaussienne est retenue pour sa simplicité mathématique. Des noyaux gaussiens sont utilisés pour l'estimation des lois marginales. Un algorithme généralisé d'espérance-maximisation permet d'estimer les paramètres du modèle. Une étude de simulation et l'analyse de deux jeux de données réels démontrent la performance du modèle. Le CKMM surpasse les méthodes classiques comme le K-means avec alignement dynamique, les modèles de croissance latente et les méthodes de regroupement de données fonctionnelles en haute dimension.
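The decomposition described in the abstract (a Gaussian copula combined with Gaussian-kernel estimates of the marginals) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the code assumes the simplest case of one mixture component over two variables, fixed bandwidths, and a single correlation parameter `rho`. All function names (`kde_pdf`, `kde_cdf`, `gaussian_copula_density`, `component_density`) are hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf, cdf and inv_cdf


def kde_pdf(x, sample, h):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    return sum(STD_NORMAL.pdf((x - xi) / h) for xi in sample) / (len(sample) * h)


def kde_cdf(x, sample, h):
    """Distribution function implied by the same Gaussian-kernel estimate."""
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in sample) / len(sample)


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density at (u, v) with correlation rho."""
    eps = 1e-12  # keep u, v strictly inside (0, 1) so inv_cdf is defined
    a = STD_NORMAL.inv_cdf(min(max(u, eps), 1.0 - eps))
    b = STD_NORMAL.inv_cdf(min(max(v, eps), 1.0 - eps))
    det = 1.0 - rho * rho
    exponent = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * det)
    return math.exp(exponent) / math.sqrt(det)


def component_density(x, samples, bandwidths, rho):
    """Joint density of one component: copula density times the two marginals.

    x          -- pair (x1, x2) at which to evaluate the density
    samples    -- pair of lists, the observations used for each marginal
    bandwidths -- pair of kernel bandwidths (assumed fixed here)
    rho        -- copula correlation (assumed known here)
    """
    u = kde_cdf(x[0], samples[0], bandwidths[0])
    v = kde_cdf(x[1], samples[1], bandwidths[1])
    marginals = (kde_pdf(x[0], samples[0], bandwidths[0])
                 * kde_pdf(x[1], samples[1], bandwidths[1]))
    return gaussian_copula_density(u, v, rho) * marginals
```

With `rho = 0` the copula density equals 1, so the component density reduces to the product of the two kernel marginals, which is a quick sanity check on the decomposition. A full mixture would weight several such components and update the weights, correlations and bandwidths iteratively; the abstract states that a generalized expectation-maximization algorithm is used for that step, which this sketch does not attempt.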
Pages: 28