Bayesian infinite mixture model based clustering of gene expression profiles

Cited by: 187
Authors
Medvedovic, M
Sivaganesan, S
Affiliations
[1] Univ Cincinnati, Med Ctr, Dept Environm Hlth, Ctr Genome Informat, Cincinnati, OH 45267 USA
[2] Univ Cincinnati, Dept Math Sci, Cincinnati, OH 45241 USA
Keywords
DOI
10.1093/bioinformatics/18.9.1194
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The biological significance of results obtained through cluster analyses of gene expression data generated in microarray experiments has been demonstrated in many studies. In this article we focus on the development of a clustering procedure based on the concept of Bayesian model-averaging and a precise statistical model of expression data. Results: We developed a clustering procedure based on the Bayesian infinite mixture model and applied it to clustering gene expression profiles. Clusters of genes with similar expression patterns are identified from the posterior distribution of clusterings defined implicitly by the stochastic data-generation model. The posterior distribution of clusterings is estimated by a Gibbs sampler. We summarized the posterior distribution of clusterings by calculating posterior pairwise probabilities of co-expression and used the complete linkage principle to create clusters. This approach has several advantages over the usual clustering procedures. The analysis allows for the incorporation of a reasonable probabilistic model for generating data. The method does not require specifying the number of clusters, and the resulting optimal clustering is obtained by averaging over models with all possible numbers of clusters. Expression profiles that are not similar to any other profile are automatically detected, the method incorporates experimental replicates, and it can be extended to accommodate missing data. This approach represents a qualitative shift in the model-based cluster analysis of expression data because it allows the uncertainties involved in model selection to be incorporated in the final assessment of confidence in similarities of expression profiles. We also demonstrated the importance of incorporating information on experimental variability into the clustering model.
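
The summarization step described in the abstract (pairwise posterior probabilities of co-expression followed by complete linkage) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gibbs sampler itself is not shown, the `samples` list is hypothetical sampler output, and the function names and the 0.5 distance threshold are assumptions chosen only for illustration.

```python
from itertools import combinations

def coclustering_probabilities(samples):
    """Fraction of sampled clusterings in which each pair of genes shares a cluster."""
    n = len(samples[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(samples)
    return [[c / m for c in row] for row in counts]

def complete_linkage(dist, threshold):
    """Agglomerate clusters while the largest between-cluster distance stays <= threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters is the maximum pairwise distance.
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical sampler output: 4 posterior draws of cluster labels for 5 genes.
# Label values are arbitrary within a draw; only "same label or not" matters.
samples = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 1, 2],
]
prob = coclustering_probabilities(samples)
# Distance is one minus the estimated posterior probability of co-expression.
dist = [[1.0 - p for p in row] for row in prob]
clusters = complete_linkage(dist, threshold=0.5)
print(clusters)
```

Because the probabilities are averaged over draws that may contain different numbers of clusters, the number of clusters never has to be fixed in advance. A gene that rarely shares a cluster with any other gene, such as the last one in this toy input, ends up as a singleton.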
Pages: 1194-1206
Page count: 13
Related papers
44 in total
[1]  
[Anonymous], 1998, INTRO BOOTSTRAP
[2]   Identifying differentially expressed genes in cDNA microarray experiments [J].
Baggerly, KA ;
Coombes, KR ;
Hess, KR ;
Stivers, DN ;
Abruzzo, LV ;
Zhang, W .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2001, 8 (06) :639-659
[3]   A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes [J].
Baldi, P ;
Long, AD .
BIOINFORMATICS, 2001, 17 (06) :509-519
[4]  
Baldi P., 1998, Bioinformatics: The machine learning approach
[5]   Choosing models in model-based clustering and discriminant analysis [J].
Biernacki, C ;
Govaert, G .
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 1999, 64 (01) :49-71
[6]   A genome-wide transcriptional analysis of the mitotic cell cycle [J].
Cho, RJ ;
Campbell, MJ ;
Winzeler, EA ;
Steinmetz, L ;
Conway, A ;
Wodicka, L ;
Wolfsberg, TG ;
Gabrielian, AE ;
Landsman, D ;
Lockhart, DJ ;
Davis, RW .
MOLECULAR CELL, 1998, 2 (01) :65-73
[7]  
Cowell R.G., 1999, PROBABILISTIC NETWOR
[8]   Genetic network inference: from co-expression clustering to reverse engineering [J].
D'haeseleer, P ;
Liang, SD ;
Somogyi, R .
BIOINFORMATICS, 2000, 16 (08) :707-726
[9]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[10]   BAYESIAN DENSITY-ESTIMATION AND INFERENCE USING MIXTURES [J].
ESCOBAR, MD ;
WEST, M .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1995, 90 (430) :577-588