A new random approach for initialization of the multiple restart EM algorithm for Gaussian model-based clustering

被引:13
作者
Kwedlo, Wojciech [1 ]
机构
[1] Bialystok Tech Univ, Fac Comp Sci, PL-15351 Bialystok, Poland
关键词
Gaussian mixture models; EM algorithm initialization; Model-based clustering; Multiple restart EM; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; SIMULATING DATA; MIXTURE-MODELS; PERFORMANCE;
D O I
10.1007/s10044-014-0441-3
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The paper proposes a new method for initialization of the multiple restart EM algorithm for Gaussian mixture model-based clustering. The method initializes randomly both the mean vector and covariance matrix of a mixture component. In particular, the mean vector is initialized by a feature vector selected deterministically from a random subset of candidate feature vectors. The selection criterion is the maximum Mahalanobis distance from the already initialized mixture component centers. The covariance matrix of a component is initialized by randomly generating its eigenvalues and eigenvectors. In computational experiments, the used approach was compared with three other random EM initialization methods. The experiments were performed on synthetic datasets generated from the Gaussian mixtures with the different overlap characteristics, as well as on four real-life datasets. The results on synthetic data indicate that, for well separated clusters, for which the maximum pairwise overlap is not excessively high, the described method yields clusterings which correspond better to the original partitions of data, as indicated by the adjusted Rand index. The experiments on real data indicate that the performance of the method is comparable to other three methods for two smaller datasets and significantly better for two larger datasets.
引用
收藏
页码:757 / 770
页数:14
相关论文
共 37 条
[1]   Using evolutionary algorithms for model-based clustering [J].
Andrews, Jeffrey L. ;
McNicholas, Paul D. .
PATTERN RECOGNITION LETTERS, 2013, 34 (09) :987-992
[2]  
[Anonymous], 2008, EM ALGORITHM EXTENSI
[3]  
[Anonymous], 2007, SOC IND APPL MATH
[4]  
[Anonymous], 1966, Textures: a photographic album for artists and designers
[5]   Maximum likelihood estimation of Gaussian mixture models using stochastic search [J].
Ari, Caglar ;
Aksoy, Selim ;
Arikan, Orhan .
PATTERN RECOGNITION, 2012, 45 (07) :2804-2816
[6]  
Bache K., 2013, UCI Machine Learning Repository
[7]   Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models [J].
Biernacki, C ;
Celeux, G ;
Govaert, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2003, 41 (3-4) :561-575
[8]  
Bishop C., 2006, PATTERN RECOGN, DOI DOI 10.1117/1.2819119
[9]  
Conover W.J, 1999, Practical Nonparametric Statistics, V3rd
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38