Validating clustering for gene expression data

被引:443
作者
Yeung, KY [1 ]
Haynor, DR [1 ]
Ruzzo, WL [1 ]
机构
[1] Univ Washington, Seattle, WA 98195 USA
基金
美国国家科学基金会;
关键词
D O I
10.1093/bioinformatics/17.4.309
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters-meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance. Results: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
引用
收藏
页码:309 / 318
页数:10
相关论文
共 22 条
[1]  
Anderberg M.R., 1973, Probability and Mathematical Statistics
[2]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[3]  
BARRETT MT, 2000, UWCSE001101 DEP COMP
[4]  
BENDOR A, 1999, RECOMB99
[5]   A genome-wide transcriptional analysis of the mitotic cell cycle [J].
Cho, RJ ;
Campbell, MJ ;
Winzeler, EA ;
Steinmetz, L ;
Conway, A ;
Wodicka, L ;
Wolfsberg, TG ;
Gabrielian, AE ;
Landsman, D ;
Lockhart, DJ ;
Davis, RW .
MOLECULAR CELL, 1998, 2 (01) :65-73
[6]   Exploring the metabolic and genetic control of gene expression on a genomic scale [J].
DeRisi, JL ;
Iyer, VR ;
Brown, PO .
SCIENCE, 1997, 278 (5338) :680-686
[7]  
Efron B., 1982, SOC IND APPL MATH CB, V38, DOI [10.1137/1.9781611970319, DOI 10.1137/1.9781611970319]
[8]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[9]   COMPARING PARTITIONS [J].
HUBERT, L ;
ARABIE, P .
JOURNAL OF CLASSIFICATION, 1985, 2 (2-3) :193-218
[10]   BOOTSTRAP TECHNIQUE IN CLUSTER-ANALYSIS [J].
JAIN, AK ;
MOREAU, JV .
PATTERN RECOGNITION, 1987, 20 (05) :547-568