A new iterative initialization of EM algorithm for Gaussian mixture models

Cited by: 7
Authors
You, Jie [1]
Li, Zhaoxuan [1]
Du, Junli [1]
Affiliations
[1] Northwest A&F Univ, Coll Sci, Yangling, Shaanxi, Peoples R China
Keywords
MAXIMUM-LIKELIHOOD; CLASSIFICATION
DOI
10.1371/journal.pone.0284114
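Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract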
Background: The expectation-maximization (EM) algorithm is a common tool for estimating the parameters of Gaussian mixture models (GMMs). However, it is highly sensitive to its initial values and easily gets trapped in a local optimum. Method: To address these problems, this paper proposes a new iterative method of EM initialization (MRIPEM). It combines the ideas of multiple restarts, iteration, and clustering. Specifically, the mean vector and covariance matrix of the sample are calculated as the initial values of the iteration. Then the optimal feature vector, the candidate with the maximum Mahalanobis distance, is selected from the candidate feature vectors as a new partition vector for clustering. The parameter values are updated continuously according to the clustering results. Results: To verify the applicability of MRIPEM, we compared it with two other popular initialization methods on simulated and real datasets. The comparison of the three stochastic algorithms indicates that MRIPEM is comparable to the other methods in relatively high dimensions with high overlap, and significantly better in low dimensions with low overlap.
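The following is a minimal Python sketch of two ingredients named in the abstract: computing the sample mean vector and covariance matrix, and picking the candidate vector with the maximum Mahalanobis distance. It is not the authors' MRIPEM procedure, which the abstract does not specify in detail. The function names (`mahalanobis_sq`, `farthest_candidate`, `split_by_nearest`), the choice of the data points themselves as candidates, and the Euclidean nearest-center partition step are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_sq(points, mean, cov):
    """Squared Mahalanobis distance of each row of `points` from `mean`."""
    diff = points - mean
    # Solve cov @ x = diff.T rather than forming the explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)


def farthest_candidate(data, candidates):
    """Index and Mahalanobis distance of the candidate farthest from the sample mean.

    The sample mean vector and covariance matrix of `data` play the role of
    the initial values mentioned in the abstract (hypothetical helper).
    """
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # rows are observations
    d2 = mahalanobis_sq(candidates, mean, cov)
    idx = int(np.argmax(d2))
    return idx, float(np.sqrt(d2[idx]))


def split_by_nearest(data, centers):
    """Assign each point to its nearest center (Euclidean; an assumption)."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])
    # Assumption: every data point is a candidate partition vector.
    idx, dist = farthest_candidate(data, data)
    centers = np.vstack([data.mean(axis=0), data[idx]])
    labels = split_by_nearest(data, centers)
    print(idx, round(dist, 3), labels)
```

In this sketch the selected vector and the sample mean serve as two centers for a single partition step; an iterative scheme would re-estimate the mean and covariance within each resulting cluster and repeat.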
Pages: 17