Probabilistic clustering of time-evolving distance data

Cited by: 0
Authors
Julia E. Vogt
Marius Kloft
Stefan Stark
Sudhir S. Raman
Sandhya Prabhakaran
Volker Roth
Gunnar Rätsch
Affiliations
[1] Memorial Sloan-Kettering Cancer Center, Computational Biology
[2] Humboldt University of Berlin, Department of Computer Science
[3] University of Zurich and ETH Zurich, Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering
[4] University of Basel, Department of Mathematics and Computer Science
Source
Machine Learning | 2015 / Volume 100
Keywords
Cluster Model; Pairwise Distance; Dirichlet Process; Memorial Sloan Kettering Cancer Center; Wishart Distribution
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method uses the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identification of the objects across time points is needed. Further, the model does not require the number of clusters to be specified in advance; it is instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data, showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.
Pages: 635-654
Number of pages: 19
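
The abstract states that the number of clusters is not fixed in advance but follows from a Dirichlet process prior. As a rough illustration of that one idea only, the sketch below samples a partition from a Chinese restaurant process, the partition distribution induced by a Dirichlet process. It is not the authors' model, which operates on pairwise distance matrices observed over time; the function name `sample_crp` and the concentration parameter `alpha` are assumptions introduced here for illustration.

```python
import random


def sample_crp(n_objects, alpha, seed=0):
    """Draw one partition of n_objects from a Chinese restaurant process.

    Illustrative sketch only: `alpha` (> 0) is a hypothetical concentration
    parameter; larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    assignments = []  # cluster index assigned to each object, in order
    sizes = []        # current number of objects in each cluster

    for _ in range(n_objects):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        assignments.append(k)

    return assignments, sizes


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 10.0):
        _, sizes = sample_crp(n_objects=100, alpha=alpha, seed=1)
        print(f"alpha={alpha}: {len(sizes)} clusters, sizes={sizes}")
```

The number of clusters is an outcome of the sampling rather than an input, which is the property the abstract relies on. In a full model this prior would be combined with a likelihood over the observed distances, which is not attempted here.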