Evaluation and comparison of gene clustering methods in microarray analysis

Cited by: 200
Authors
Thalamuthu, Anbupalam
Mukhopadhyay, Indranil
Zheng, Xiaojing
Tseng, George C. [1]
Affiliations
[1] Univ Pittsburgh, Dept Human Genet, Pittsburgh, PA 15260 USA
[2] Univ Pittsburgh, Dept Biostat, Pittsburgh, PA 15261 USA
Funding
US National Institutes of Health;
DOI
10.1093/bioinformatics/btl406
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis is useful for discovering groups of correlated genes that are potentially co-regulated or associated with the disease or conditions under investigation. Many clustering methods, including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering, have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods.
Results: In this paper, six gene clustering methods are evaluated on simulated data from a hierarchical log-normal model with various degrees of perturbation, as well as on four real datasets. A weighted Rand index is proposed for measuring the similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods on the real data is assessed by a predictive accuracy analysis using verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform the other clustering methods on both simulated and real data, while hierarchical clustering and SOM perform among the worst. Our analysis provides insight into the complicated problem of clustering gene expression profiles and serves as a practical guideline for routine microarray cluster analysis.
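The abstract refers to a weighted Rand index for comparing two clustering results. The record does not give its definition, so the sketch below shows only the classical, unweighted Rand index as background: the fraction of gene pairs that two clusterings treat the same way (both together or both apart). The function name `rand_index` and the label-list input format are assumptions made for illustration; the weighting and the handling of scattered genes proposed in the paper are not reproduced here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Classical (unweighted) Rand index between two clusterings.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings assign to item i. This is NOT the weighted variant
    proposed in the paper; scattered genes get no special treatment.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair co-clustered in A?
        same_b = labels_b[i] == labels_b[j]  # pair co-clustered in B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)
```

A value of 1.0 means the two clusterings agree on every pair, regardless of how the cluster labels themselves are named.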
Pages: 2405-2412
Page count: 8