A robust model-based clustering based on the geometric median and the median covariation matrix

Cited by: 0
Authors
Antoine Godichon-Baggioni
Stéphane Robin
Affiliations
[1] Sorbonne Université, Laboratoire de Probabilités, Statistique et Modélisation
[2] CNRS
Source
Statistics and Computing | 2024 / Vol. 34
Keywords
EM algorithm; Geometric median; Median covariation matrix; Mixture models; Robust statistics
DOI
Not available
Abstract
Grouping observations into homogeneous groups is a recurrent task in statistical data analysis. We consider Gaussian mixture models, the best-known parametric approach to model-based clustering. We propose a new robust approach to model-based clustering that modifies the EM algorithm, specifically the M-step, by replacing the estimates of the mean and the variance with robust versions based on the geometric median and the median covariation matrix. All the proposed methods are available in the R package RGMM on CRAN.
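To make the idea of a robust location estimate concrete, here is a minimal Python sketch of a weighted geometric median, the kind of quantity that could stand in for the weighted mean in an M-step. It is an illustration under stated assumptions, not the authors' implementation: it uses plain Weiszfeld iterations, which may differ from the algorithms in the RGMM package, and the function name `weighted_geometric_median`, its parameters, and the reading of `weights` as posterior membership probabilities from an E-step are all hypothetical choices made for this example.

```python
import math


def weighted_geometric_median(points, weights, n_iter=200, tol=1e-9):
    """Approximate argmin_m sum_i w_i * ||x_i - m|| with Weiszfeld iterations.

    Hypothetical helper for illustration only; `weights` could be the
    posterior membership probabilities of one mixture component (assumption).
    """
    dim = len(points[0])
    total_w = sum(weights)
    # Start from the weighted mean, the non-robust estimate being replaced.
    m = [sum(w * p[d] for p, w in zip(points, weights)) / total_w
         for d in range(dim)]
    for _ in range(n_iter):
        num = [0.0] * dim
        den = 0.0
        for p, w in zip(points, weights):
            # Each point is reweighted by the inverse of its distance to the
            # current estimate; the floor avoids division by zero.
            coef = w / max(math.dist(p, m), 1e-12)
            den += coef
            for d in range(dim):
                num[d] += coef * p[d]
        new_m = [v / den for v in num]
        if math.dist(new_m, m) < tol:
            return new_m
        m = new_m
    return m


# Four points on the unit square plus one gross outlier.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
print(weighted_geometric_median(data, [1.0] * len(data)))
```

On this toy data the weighted mean is (20.4, 20.4), dragged far from the bulk of the points by the single outlier, whereas the geometric median stays inside the unit square. That insensitivity to a few extreme observations is what motivates using median-based estimates in place of the mean.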