Fuzzy C-means method for clustering microarray data

Cited by: 344
Authors
Dembélé, D [1]
Kastner, P [1]
Affiliations
[1] ULP, CNRS, INSERM, Inst Genet & Biol Mol & Cellulaire, F-67404 Illkirch Graffenstaden, France
DOI
10.1093/bioinformatics/btg119
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene on the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes.
Results: A major problem in applying the FCM method to clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes that are tightly associated with a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster.
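The abstract refers to three ingredients: FCM membership values, the fuzziness parameter m, and a membership threshold for selecting tightly associated genes. The sketch below illustrates them with the textbook FCM update rules in NumPy. It is not the authors' implementation and does not reproduce their empirical method for choosing m. The function names `fuzzy_c_means` and `select_core_genes`, the threshold of 0.7, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM on X of shape (n_genes, n_conditions).

    Returns (centers, U), where U[i, k] is the membership of gene i
    in cluster k and every row of U sums to 1.
    """
    if m <= 1.0:
        raise ValueError("the fuzziness parameter m must be greater than 1")
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means weighted by memberships raised to the power m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every gene to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def select_core_genes(U, threshold=0.7):
    """Hypothetical helper: hard label per gene plus a mask of genes
    whose highest membership reaches the threshold."""
    labels = U.argmax(axis=1)
    keep = U.max(axis=1) >= threshold
    return labels, keep


# Synthetic stand-in for expression profiles: two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(3.0, 0.3, (20, 5))])
centers, U = fuzzy_c_means(X, c=2, m=2.0)
labels, keep = select_core_genes(U, threshold=0.7)
```

Rerunning `fuzzy_c_means` with different values of m on the same data is one simple way to see how the memberships flatten toward 1/c as m grows, which is the sensitivity the abstract highlights.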
Pages: 973-980
Number of pages: 8