Bayesian estimation of membership uncertainty in model-based clustering

Cited by: 6
Authors
Chen, Liyuan [1]
Brown, Steven D. [1]
Affiliations
[1] Univ Delaware, Dept Chem & Biochem, Brown Lab, Newark, DE 19716 USA
Keywords
model-based clustering; Markov chain Monte Carlo simulation; Gibbs sampling; discriminant analysis; principal component; mass spectrometry; EM algorithm; classification; strategy
DOI
10.1002/cem.2511
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We report the use of a cluster analysis method based on a multivariate mixture model, known as model-based clustering, for overcoming the limitations of hierarchical clustering and relocation clustering. Unlike traditional clustering methods, in which clusters are formed on the basis of intercluster distances, model-based clustering classifies observations on the basis of probabilities estimated from Gaussian mixture modeling, and its statistical basis allows for inference. Three examples are given in which we demonstrate that, compared with agglomerative hierarchical clustering or iterative relocation clustering using a K-means criterion, model-based clustering gives much better performance for overlapping clusters, a more reliable determination of the number of clusters in data, and better identification of clustering in the presence of outliers. We also show that Markov chain Monte Carlo simulation, as implemented via Gibbs sampling coupled with model-based clustering, may be used to assess the uncertainty of cluster group memberships. Copyright (c) 2013 John Wiley & Sons, Ltd.
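The idea of assessing membership uncertainty by Gibbs sampling can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional Gaussian mixture with a known, shared standard deviation, a normal prior on the component means, and a flat Dirichlet prior on the mixing weights. The function name gibbs_membership and all of its parameters are hypothetical, and label switching is ignored, which is only reasonable when the clusters are well separated.

```python
import numpy as np

def gibbs_membership(x, k=2, n_iter=2000, burn_in=500,
                     sigma=1.0, prior_sd=10.0, seed=0):
    """Gibbs sampler for a 1-D Gaussian mixture (illustrative assumptions:
    known shared sigma, N(0, prior_sd^2) prior on means, Dirichlet(1) weights).
    Returns an (n, k) array of posterior membership frequencies."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Start the component means at evenly spread quantiles of the data.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    weights = np.full(k, 1.0 / k)
    counts = np.zeros((n, k))
    for it in range(n_iter):
        # Step 1: draw each label from its conditional membership probability.
        log_p = np.log(weights) - 0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(log_p)
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(n)
        z = (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
        z = np.minimum(z, k - 1)  # guard against round-off in the cumsum
        # Step 2: draw the mixing weights given the labels.
        n_j = np.bincount(z, minlength=k)
        weights = rng.dirichlet(1.0 + n_j)
        # Step 3: draw each mean from its conjugate normal conditional.
        for j in range(k):
            prec = 1.0 / prior_sd**2 + n_j[j] / sigma**2
            mean = (x[z == j].sum() / sigma**2) / prec
            mu[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        # After burn-in, record which cluster each observation was assigned to.
        if it >= burn_in:
            counts[np.arange(n), z] += 1
    return counts / (n_iter - burn_in)

# Two separated groups plus one observation halfway between them.
x = np.concatenate([np.linspace(-5, -3, 20), np.linspace(3, 5, 20), [0.0]])
probs = gibbs_membership(x)
print(probs[[0, 39, 40]])  # rows: a clear member of each group, then the ambiguous point
```

Each row of the returned array is the fraction of retained sweeps in which an observation was assigned to each cluster. Rows close to 0 or 1 indicate confident assignments, while rows with intermediate values flag observations whose membership is uncertain.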
Pages: 358-369
Number of pages: 12