Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components

Cited by: 4
Authors
de Chaumaray, Marie du Roy [1]
Marbac, Matthieu [1]
Affiliations
[1] Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France
Keywords
Clustering; Mixture model; Non-ignorable missingness; Smoothed likelihood; Nonparametric estimation; Multivariate; Distributions; Likelihood
DOI
10.1007/s11634-023-00534-w
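Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract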
We propose a semi-parametric clustering model assuming conditional independence given the component. One advantage is that this model can handle non-ignorable missingness. The model defines each component as a product of univariate probability distributions but makes no assumption on the form of each univariate density. Note that the mixture model is used for clustering but not for estimating the density of the full variables (observed and unobserved). Estimation is performed by maximizing an extension of the smoothed likelihood allowing missingness. This optimization is achieved by a Majorization-Minorization algorithm. We illustrate the relevance of our approach by numerical experiments conducted on simulated data. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data that we illustrate on a real data set. The proposed method is implemented in the R package MNARclust available on CRAN.
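The conditional-independence structure described in the abstract can be written as a short formula. The notation below ($K$ components, $d$ variables, proportions $\pi_k$, univariate densities $f_{kj}$) is assumed here for illustration and is not taken from the paper; the sketch covers only the complete-data mixture and leaves out the missingness mechanism, which the abstract does not specify.

```latex
% Mixture with independence within components (illustrative notation):
% each component density is a product of unspecified univariate densities.
f(x_1, \dots, x_d) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{d} f_{kj}(x_j),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

In this form the model is semi-parametric in the sense the abstract states: the proportions $\pi_k$ are finite-dimensional parameters, while no parametric family is imposed on any $f_{kj}$.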
Pages: 1081-1122
Page count: 42