Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination

Cited by: 26
Authors
Yau, Christopher [1]
Holmes, Chris [1, 2]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Univ Oxford, Oxford Man Inst, Oxford, England
Source
BAYESIAN ANALYSIS | 2011 / Vol. 6 / No. 2
Funding
UK Medical Research Council
Keywords
Bayesian mixture models; Bayesian nonparametric priors; variable selection; unsupervised learning; MONTE-CARLO METHODS; FEATURE-SELECTION; SAMPLING METHODS; UNKNOWN NUMBER; DIRICHLET;
DOI
10.1214/11-BA612
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that, by defining a hierarchical population-based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood, we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest, including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster-specific variable selection. We demonstrate improved inference on a number of canonical problems.
Pages: 329-351
Number of pages: 23