Evaluation of clustering algorithms for gene expression data using gene ontology annotations

Cited by: 3
Authors
Ma Ning [1]
Zhang Zheng-guo [1]
Affiliation
[1] Chinese Acad Med Sci, Peking Union Med Coll, Inst Basic Med Sci, Dept Biomed Engn, Sch Basic Med, Beijing 100005, Peoples R China
Keywords
microarray; gene expression; clustering; gene ontology; TOOL
DOI
10.3760/cma.j.issn.0366-6999.2012.17.015
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Background: Clustering is a useful exploratory technique for interpreting gene expression data, revealing groups of genes that share common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible, and biologically oriented criterion for evaluating the clustering of expression data. Methods: We propose an external criterion that uses annotation-based similarities between genes, with gene ontology information as the annotation source. Using this criterion, we compared six widely used clustering algorithms over various types of gene expression data sets. Results: The ranking of these algorithms given by the criterion is consistent with common knowledge. Single-linkage performed significantly worse, even worse than the random algorithm. Ward's method achieved the best performance in most cases. Conclusions: The proposed criterion has a strong ability to distinguish among clustering algorithms with different distance measurements. Analyzing the main contributors to the criterion may also offer some guidance in finding locally compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis. Chin Med J 2012;125(17):3048-3052
Pages: 3048-3052
Page count: 5
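
The abstract describes an external criterion that scores a clustering by annotation-based similarity between genes, and compares algorithms against a random one. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: similarity is taken to be the Jaccard index over each gene's set of annotation terms, the score is the mean similarity over all within-cluster gene pairs, and the random baseline reshuffles genes while keeping cluster sizes. The function names, gene names, and term labels are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
import random


def jaccard(a, b):
    """Jaccard similarity of two annotation-term sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotation_score(clusters, annotations):
    """Mean annotation similarity over all within-cluster gene pairs."""
    total, pairs = 0.0, 0
    for genes in clusters:
        for g1, g2 in combinations(genes, 2):
            total += jaccard(annotations[g1], annotations[g2])
            pairs += 1
    return total / pairs if pairs else 0.0


def random_baseline(clusters, annotations, trials=200, seed=0):
    """Average score of random partitions with the same cluster sizes."""
    rng = random.Random(seed)
    genes = [g for cluster in clusters for g in cluster]
    sizes = [len(cluster) for cluster in clusters]
    scores = []
    for _ in range(trials):
        rng.shuffle(genes)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(genes[start:start + size])
            start += size
        scores.append(annotation_score(shuffled, annotations))
    return sum(scores) / len(scores)


# Toy data: placeholder gene names and term labels, not real GO annotations.
annotations = {
    "g1": {"t1", "t2"},
    "g2": {"t1", "t2"},
    "g3": {"t1"},
    "g4": {"t3"},
    "g5": {"t3", "t4"},
    "g6": {"t4"},
}
coherent = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]  # groups share terms
mixed = [["g1", "g4", "g6"], ["g2", "g3", "g5"]]     # groups mix unrelated genes

if __name__ == "__main__":
    print("coherent:", annotation_score(coherent, annotations))
    print("mixed:   ", annotation_score(mixed, annotations))
    print("random:  ", random_baseline(coherent, annotations))
```

Under these assumptions, a clustering that groups genes with shared annotations scores higher than one that mixes unrelated genes, and either can be compared with the random baseline. That is the kind of comparison the abstract reports for single-linkage and Ward's method, though the paper's own criterion may differ.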