Evaluation and comparison of gene clustering methods in microarray analysis

Cited by: 200

Authors:

Thalamuthu, Anbupalam

Mukhopadhyay, Indranil

Zheng, Xiaojing

Tseng, George C. ^{[1]}

Affiliations:

[1] Univ Pittsburgh, Dept Human Genet, Pittsburgh, PA 15260 USA

[2] Univ Pittsburgh, Dept Biostat, Pittsburgh, PA 15261 USA

Source:

BIOINFORMATICS | 2006 / Vol. 22 / Issue 19

Funding:

US National Institutes of Health;

DOI:

10.1093/bioinformatics/btl406

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: Microarray technology has been widely applied in biological and clinical studies to monitor the expression of thousands of genes simultaneously. Gene clustering analysis is useful for discovering groups of correlated genes that may be co-regulated or associated with the disease or conditions under investigation. Many clustering methods, including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering, have been widely used in the literature, yet no comprehensive comparative study has evaluated their effectiveness.

Results: In this paper, six gene clustering methods are evaluated on simulated data from a hierarchical log-normal model with various degrees of perturbation, as well as on four real datasets. A weighted Rand index is proposed for measuring the similarity of two clustering results that may contain scattered genes (i.e. a set of noise genes that are not clustered). Performance of the methods on the real data is assessed by a predictive accuracy analysis using verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform the other clustering methods on both simulated and real data, while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight into the complicated problem of clustering gene expression profiles and serves as a practical guideline for routine microarray cluster analysis.
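The abstract does not define the weighted Rand index, so it is not reproduced here. As a point of reference only, the following is a minimal sketch of the plain (unweighted) Rand index on which such measures are based: the fraction of item pairs that two clusterings treat the same way (both together or both apart). The function name `rand_index` and the label encoding are illustrative assumptions, and the sketch does not handle scattered genes.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain (unweighted) Rand index between two clusterings of the same items.

    NOTE: this is the classical Rand index, not the weighted variant
    proposed in the paper; scattered (unclustered) genes are not treated
    specially here.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")

    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair co-clustered in A?
        same_b = labels_b[i] == labels_b[j]  # pair co-clustered in B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Identical partitions up to a relabeling of the clusters agree on every pair.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A value of 1 means the two partitions are identical up to cluster names; lower values mean more pairs are grouped differently.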

Pages: 2405-2412

Page count: 8
