Flexible clustering via extended mixtures of common t-factor analyzers

Cited by: 0
Authors
Wan-Lun Wang
Tsung-I Lin
Affiliations
[1] Feng Chia University, Department of Statistics, Graduate Institute of Statistics and Actuarial Science
[2] National Chung Hsing University, Institute of Statistics
[3] China Medical University, Department of Public Health
Source
AStA Advances in Statistical Analysis | 2017 / Volume 101
Keywords
Clustering; Classification; Factor loadings; Mixture models; Outlier detection; 62H25; 62H30
DOI
Not available
Abstract
Mixtures of t-factor analyzers have been broadly used for model-based density estimation and clustering of high-dimensional data from a heterogeneous population with longer-than-normal tails or atypical observations. To reduce the number of parameters in the component covariance matrices, the mixtures of common t-factor analyzers (MCtFA) have been recently proposed by assuming a common factor loading across different components. In this paper, we present an extended version of MCtFA using distinct covariance matrices for component errors. The modified mixture model offers a more appropriate way to represent the data in a graphical fashion. Two flexible EM-type algorithms are developed for iteratively computing maximum likelihood estimates of parameters. Practical considerations for the specification of starting values, model-based clustering, classification of new subjects and identification of potential outliers are also provided. We demonstrate the superiority of the proposed methodology through an analysis of the Italian wine data and a simulation study.
Pages: 227-252
Page count: 25