Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering

Cited by: 28

Authors:

Arima, Chinatsu ^{[1]}

Hakamada, Kazumi ^{[1]}

Okamoto, Masahiro ^{[1]}

Hanai, Taizo ^{[1]}

Affiliation:

[1] Kyushu Univ, Grad Sch Syst Life Sci, Higashi Ku, Fukuoka 8128581, Japan

Source:

JOURNAL OF BIOSCIENCE AND BIOENGINEERING | 2008 / Vol. 105 / Issue 03

Keywords:

clustering; validity index; fuzzy k-means; microarray data analysis; estimation of number of clusters;

DOI:

10.1263/jbb.105.273

Chinese Library Classification:

Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];

Subject classification codes:

071005 ; 0836 ; 090102 ; 100705 ;

Abstract:

In clustering methods, the estimation of the optimal number of clusters is significant for subsequent analysis. Without detailed biological information on the genes involved, the evaluation of the number of clusters becomes difficult, and we have to rely on an internal measure that is based on the distribution of the data of the clustering result. The Gap statistic has been proposed as a superior method for estimating the number of clusters in crisp clustering. In this study, we proposed a modified Fuzzy Gap statistic (MFGS) and applied it to fuzzy k-means clustering. For estimating the number of clusters, fuzzy k-means clustering with the MFGS was applied to two artificial data sets with noise and to two experimentally observed gene expression data sets. For the artificial data sets, compared with other internal measures, the MFGS showed a higher performance in terms of robustness against noise for estimating the optimal number of clusters. Moreover, it could be used to estimate the optimal number of clusters in experimental data sets. It was confirmed that the proposed MFGS is a useful method for estimating the number of clusters for microarray data sets.

Pages: 273-281

Page count: 9
