Overfitting Bayesian Mixture Models with an Unknown Number of Components

Cited by: 31
Authors
van Havre, Zoe [1 ,2 ]
White, Nicole [1 ]
Rousseau, Judith [2 ]
Mengersen, Kerrie [2 ]
Affiliations
[1] Queensland Univ Technol, Sch Math Sci, Brisbane, Qld 4001, Australia
[2] Univ Paris 09, CEREMADE, F-75775 Paris, France
Funding
Australian Research Council
Keywords
CHAIN-MONTE-CARLO; DENSITY-ESTIMATION; MARGINAL LIKELIHOOD; INFERENCE
DOI
10.1371/journal.pone.0131739
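Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract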
This paper proposes solutions to three issues in estimating finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the limited mixing of standard Markov chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via the Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test-based estimation methods: priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by prior parallel tempering, an extension of parallel tempering. Given a sufficiently large sample size, Zmix can accurately estimate the number of components, the posterior parameter estimates and the allocation probabilities. The results reflect uncertainty in the final model, reporting the range of possible candidate models and their respective estimated probabilities from a single run. Label switching is resolved with Zswitch, a computationally lightweight method developed for overfitted mixtures that exploits the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies and three case studies from the literature illustrate Zmix and Zswitch. All methods are available in the R package Zmix, which currently applies to univariate Gaussian mixture models.
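The core overfitting idea can be illustrated with a minimal Python sketch, assuming a plain single-chain Gibbs sampler: it does not implement prior parallel tempering or Zswitch, and it is not the Zmix R package's API. The function `overfitted_gmm_gibbs`, its priors (symmetric Dirichlet(alpha) weights, normal means, gamma precisions) and every default value are hypothetical illustrative choices. The sketch fits `k_max` components to univariate data and, after burn-in, records how many components are non-empty. Because that count does not depend on how components are labelled, its frequencies give a rough posterior over the number of components without first resolving label switching.

```python
import math
import random
from collections import Counter


def normal_logpdf(x, mu, tau):
    # log N(x | mu, 1/tau), parameterised by the precision tau
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (x - mu) ** 2


def overfitted_gmm_gibbs(y, k_max=5, alpha=0.01, n_iter=600, burn_in=200,
                         m0=0.0, s0=10.0, a0=2.0, b0=1.0, seed=1):
    """Hypothetical sketch: Gibbs sampler for an overfitted univariate Gaussian mixture.

    Priors (illustrative assumptions): w ~ Dirichlet(alpha, ..., alpha) with small alpha,
    mu_k ~ N(m0, s0^2), precision tau_k ~ Gamma(shape=a0, rate=b0).
    Returns a Counter: number of non-empty components -> post-burn-in iterations.
    """
    rng = random.Random(seed)
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    w = [1.0 / k_max] * k_max               # start from equal weights
    mu = [rng.choice(y) for _ in range(k_max)]  # start means at random data points
    tau = [1.0 / var_y] * k_max             # start with the overall data precision
    occupied = Counter()
    for it in range(n_iter):
        # 1. Allocations: P(z_i = k) proportional to w_k * N(y_i | mu_k, 1/tau_k).
        groups = [[] for _ in range(k_max)]
        for yi in y:
            logp = [math.log(wk) + normal_logpdf(yi, mk, tk) if wk > 0 else -math.inf
                    for wk, mk, tk in zip(w, mu, tau)]
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]
            k = rng.choices(range(k_max), weights=probs)[0]
            groups[k].append(yi)
        # 2. Weights: w | z ~ Dirichlet(alpha + n_1, ..., alpha + n_K) via normalised Gamma draws.
        g = [rng.gammavariate(alpha + len(gk), 1.0) for gk in groups]
        total = sum(g)
        w = [gk / total for gk in g]
        # 3. Means: conjugate normal update given the current precisions.
        for k, gk in enumerate(groups):
            prec = 1.0 / s0 ** 2 + tau[k] * len(gk)
            mean = (m0 / s0 ** 2 + tau[k] * sum(gk)) / prec
            mu[k] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        # 4. Precisions: Gamma(a0 + n_k/2, rate = b0 + SS_k/2); gammavariate takes a scale.
        for k, gk in enumerate(groups):
            ss = sum((v - mu[k]) ** 2 for v in gk)
            tau[k] = rng.gammavariate(a0 + 0.5 * len(gk), 1.0 / (b0 + 0.5 * ss))
        # Record the label-invariant number of non-empty components.
        if it >= burn_in:
            occupied[sum(1 for gk in groups if gk)] += 1
    return occupied


if __name__ == "__main__":
    data_rng = random.Random(0)
    data = ([data_rng.gauss(-5.0, 1.0) for _ in range(100)]
            + [data_rng.gauss(5.0, 1.0) for _ in range(100)])
    print(overfitted_gmm_gibbs(data, k_max=5).most_common())
```

The small concentration `alpha` is what pushes extra groups toward zero weight: once a component is empty, its conditional Dirichlet parameter is just `alpha`, so its weight is drawn close to zero and it is rarely repopulated. A single chain like this can still get stuck, which is the mixing limitation the paper addresses with prior parallel tempering.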
Pages: 27