Bayesian estimation of membership uncertainty in model-based clustering

Times cited: 6
Authors
Chen, Liyuan [1]
Brown, Steven D. [1]
Affiliation
[1] Univ Delaware, Dept Chem & Biochem, Brown Lab, Newark, DE 19716 USA
Keywords
model-based clustering; Markov chain Monte Carlo simulation; Gibbs sampling; discriminant analysis; principal component; mass spectrometry; EM algorithm; classification; strategy
DOI
10.1002/cem.2511
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We report the use of model-based clustering, a cluster analysis method based on a multivariate mixture model, to overcome the limitations of hierarchical clustering and relocation clustering. Unlike traditional clustering methods, which form clusters on the basis of intercluster distances, model-based clustering classifies observations on the basis of probabilities estimated from a Gaussian mixture model, and its statistical basis allows for inference. In three examples, we demonstrate that model-based clustering gives much better performance for overlapping clusters, a more reliable determination of the number of clusters in the data, and better identification of clusters in the presence of outliers than agglomerative hierarchical clustering or iterative relocation clustering using a K-means criterion. We also show that Markov chain Monte Carlo simulation, implemented via Gibbs sampling coupled with model-based clustering, can be used to assess the uncertainty of group memberships. Copyright (c) 2013 John Wiley & Sons, Ltd.
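The idea of assessing membership uncertainty with Gibbs sampling can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes univariate data, a k-component Gaussian mixture with conjugate priors (Dirichlet weights, normal means, inverse-gamma variances, with arbitrary hyperparameter values), and a crude sort-by-mean relabelling to limit label switching; the helper name gibbs_mixture is hypothetical. Each iteration draws every observation's group label from its full conditional, p(z_i = j | rest) proportional to pi_j N(x_i | mu_j, var_j), then redraws the weights, means, and variances. The fraction of post-burn-in draws that place observation i in group j estimates its posterior membership probability, and 1 minus the largest of these gives a simple per-observation uncertainty.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))


def gibbs_mixture(x, k=2, n_iter=2000, burn_in=500, seed=0,
                  alpha=1.0, m0=0.0, tau2=100.0, a0=2.0, b0=1.0):
    """Illustrative Gibbs sampler (hypothetical helper) for a univariate
    k-component Gaussian mixture.

    Assumed priors: weights ~ Dirichlet(alpha), mu_j ~ N(m0, tau2),
    var_j ~ InvGamma(a0, b0). Returns an n-by-k list whose entry [i][j] is the
    fraction of post-burn-in draws that assigned observation i to group j.
    """
    rng = random.Random(seed)
    n = len(x)
    xs = sorted(x)
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # spread-out starting means
    var = [1.0] * k
    pi = [1.0 / k] * k
    counts = [[0] * k for _ in range(n)]

    for it in range(n_iter):
        # 1) Labels: p(z_i = j | rest) is proportional to pi_j * N(x_i | mu_j, var_j).
        z = []
        for xi in x:
            logw = [math.log(pi[j]) + log_normal_pdf(xi, mu[j], var[j]) for j in range(k)]
            top = max(logw)  # subtract the max before exponentiating to avoid underflow
            w = [math.exp(lw - top) for lw in logw]
            z.append(rng.choices(range(k), weights=w)[0])
        nk = [z.count(j) for j in range(k)]

        # 2) Weights: Dirichlet(alpha + n_j), drawn as normalised gamma variates.
        g = [rng.gammavariate(alpha + nk[j], 1.0) for j in range(k)]
        pi = [gj / sum(g) for gj in g]

        # 3) Means and variances from their conjugate full conditionals.
        for j in range(k):
            members = [xi for xi, zi in zip(x, z) if zi == j]
            post_var = 1.0 / (1.0 / tau2 + nk[j] / var[j])
            post_mean = post_var * (m0 / tau2 + sum(members) / var[j])
            mu[j] = rng.gauss(post_mean, math.sqrt(post_var))
            ss = sum((xi - mu[j]) ** 2 for xi in members)
            # An InvGamma(a, b) draw is 1 / Gamma(shape=a, scale=1/b).
            var[j] = 1.0 / rng.gammavariate(a0 + nk[j] / 2.0, 1.0 / (b0 + ss / 2.0))

        # 4) Crude relabelling: sort components by mean to limit label switching.
        order = sorted(range(k), key=lambda j: mu[j])
        mu, var, pi = [mu[j] for j in order], [var[j] for j in order], [pi[j] for j in order]
        new_label = {old: new for new, old in enumerate(order)}
        z = [new_label[zi] for zi in z]

        # 5) After burn-in, tally each observation's sampled group.
        if it >= burn_in:
            for i, zi in enumerate(z):
                counts[i][zi] += 1

    kept = n_iter - burn_in
    return [[c / kept for c in row] for row in counts]


# Usage: two well-separated simulated groups; 1 - max_j p_ij serves as a simple
# per-observation measure of membership uncertainty.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(50)] + [data_rng.gauss(6.0, 1.0) for _ in range(50)]
probs = gibbs_mixture(data, k=2)
uncertainty = [1.0 - max(p) for p in probs]
```

Observations lying between the two groups receive membership probabilities away from 0 and 1, which is the kind of uncertainty a single hard assignment from hierarchical or K-means clustering does not report. The sort-by-mean constraint is a simple choice made for brevity; it can misbehave when component means are close.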
Pages: 358-369
Number of pages: 12