Clustering based on a multilayer mixture model

Cited by: 45
Authors
Li, J [1]
Affiliation
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
Keywords
CEM; classification maximum likelihood (CML); clustering; EM; multilayer mixture of normals
DOI
10.1198/106186005X59586
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
In model-based clustering, the density of each cluster is usually assumed to be a certain basic parametric distribution, for example, the normal distribution. In practice, it is often difficult to decide which parametric distribution is suitable to characterize a cluster, especially for multivariate data. Moreover, the densities of individual clusters may be multimodal themselves, and therefore cannot be accurately modeled by basic parametric distributions. This article explores a clustering approach that models each cluster by a mixture of normals. The resulting overall model is a multilayer mixture of normals. Algorithms to estimate the model and perform clustering are developed based on the classification maximum likelihood (CML) and mixture maximum likelihood (MML) criteria. BIC and ICL-BIC are examined for choosing the number of normal components per cluster. Experiments on both simulated and real data are presented.
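As a rough illustration of what "each cluster modeled by a mixture of normals" can mean, the sketch below evaluates a two-layer mixture in one dimension and assigns points to the cluster with the largest weighted density. It is not the article's estimation algorithm: the function names, the tuple layout, and all parameter values are hypothetical, and the parameters are fixed by hand rather than estimated under the CML or MML criteria.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def cluster_density(x, components):
    """Inner layer: one cluster's density as a mixture of normals.

    components is a list of (weight, mean, std) tuples whose weights sum to 1.
    """
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)

def assign_cluster(x, clusters):
    """Outer layer: pick the cluster with the largest prior-weighted density.

    clusters is a list of (prior, components) pairs whose priors sum to 1.
    """
    scores = [prior * cluster_density(x, comps) for prior, comps in clusters]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical parameters, chosen only for illustration.
clusters = [
    (0.5, [(0.5, -3.0, 1.0), (0.5, 3.0, 1.0)]),  # one bimodal cluster
    (0.5, [(1.0, 0.0, 0.5)]),                    # one unimodal cluster
]

labels = [assign_cluster(x, clusters) for x in (-3.0, 0.0, 3.0)]
print(labels)
```

With these made-up parameters, the points near -3 and 3 fall in the same bimodal cluster even though they sit under different normal components, which is the situation a single normal per cluster cannot represent.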
Pages: 547-568
Page count: 22