Robust clustering via mixtures of t factor analyzers with incomplete data

Cited by: 0
Authors
Wan-Lun Wang
Tsung-I Lin
Affiliations
[1] Feng Chia University,Department of Statistics, Graduate Institute of Statistics and Actuarial Science
[2] National Chung Hsing University,Institute of Statistics
[3] China Medical University,Department of Public Health
Source
Advances in Data Analysis and Classification | 2022 / Volume 16
Keywords
Data reduction; Factor analyzer; Information matrix; Mixture models; Multivariate t distribution; Missing data; 62H25; 62H30
DOI
Not available
Abstract
Mixtures of t factor analyzers (MtFA) are powerful and widely used tools for robust clustering of high-dimensional data in the presence of outliers. However, the occurrence of missing values may cause analytical intractability and computational complexity when fitting the MtFA model. We explicitly derive the score vector and Hessian matrix of the MtFA model with incomplete data to approximate the information matrix. In this regard, some asymptotic properties can be established under certain regularity conditions. Three expectation-maximization-based algorithms are developed for maximum likelihood estimation of the MtFA model when values are possibly missing at random. Practical issues related to the recovery of missing values and the clustering of partially observed samples are also investigated. The utility of our methodology is exemplified through the analysis of simulated and real data sets.
Pages: 659-690
Page count: 31