Bayesian mixed membership models for soft clustering and classification

Times cited: 24
Authors
Erosheva, EA [1]
Fienberg, SE [2]
Affiliations
[1] Univ Washington, Dept Stat, Sch Social Work, Ctr Stat & Social Sci, Seattle, WA 98195 USA
[2] Carnegie Mellon Univ, Dept Stat, Ctr Automated Learning & Discovery, Ctr Comp & Commun Secur, Pittsburgh, PA 15213 USA
Source
CLASSIFICATION - THE UBIQUITOUS CHALLENGE | 2005
DOI
10.1007/3-540-28084-7_2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The paper describes and applies a fully Bayesian approach to soft clustering and classification using mixed membership models. Our model structure has assumptions on four levels: population, subject, latent variable, and sampling scheme. Population level assumptions describe the general structure of the population that is common to all subjects. Subject level assumptions specify the distribution of observable responses given individual membership scores. Membership scores are usually unknown and hence we can also view them as latent variables, treating them as either fixed or random in the model. Finally, the last level of assumptions specifies the number of distinct observed characteristics and the number of replications for each characteristic. We illustrate the flexibility and utility of the general model through two applications using data from: (i) the National Long Term Care Survey where we explore types of disability; (ii) abstracts and bibliographies from articles published in The Proceedings of the National Academy of Sciences. In the first application we use a Monte Carlo Markov chain implementation for sampling from the posterior distribution. In the second application, because of the size and complexity of the data base, we use a variational approximation to the posterior. We also include a guide to other applications of mixed membership modeling.
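The four levels of assumptions described in the abstract can be illustrated with a small generative sketch. This is not the authors' implementation. It assumes binary responses, a Dirichlet distribution on membership scores, and made-up parameter values; the names `sample_dirichlet`, `sample_subject`, `alpha`, and `profiles` are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    # Normalizing independent Gamma(alpha_k, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_subject(alpha, profiles, replications, rng):
    """Draw one subject's membership scores and binary responses.

    alpha        -- assumed Dirichlet parameters, one per profile
    profiles     -- profiles[k][j] = P(response to characteristic j is 1 | profile k)
                    (population level: shared by all subjects)
    replications -- replications per characteristic (sampling scheme level)
    """
    n_profiles = len(alpha)
    n_items = len(profiles[0])
    # Latent variable level: membership scores treated as random.
    membership = sample_dirichlet(alpha, rng)
    responses = []
    for j in range(n_items):
        item_responses = []
        for _ in range(replications):
            # Subject level: each response is generated by one profile,
            # chosen with probability equal to the membership score.
            k = rng.choices(range(n_profiles), weights=membership)[0]
            item_responses.append(1 if rng.random() < profiles[k][j] else 0)
        responses.append(item_responses)
    return membership, responses


rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]                  # illustrative values only
profiles = [
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.2, 0.9, 0.8],
    [0.5, 0.5, 0.5, 0.5],
]
membership, responses = sample_subject(alpha, profiles, replications=1, rng=rng)
```

With one replication per characteristic, this resembles a survey-style setting with a single answer per item; raising `replications` moves the sketch toward settings such as text, where the same characteristic (for example, a word slot) is observed many times.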
Pages: 11-26
Page count: 16