Efficiently mining gene expression data via a novel parameterless clustering method

被引:89
作者
Tseng, VS [1 ]
Kao, CP [1 ]
机构
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
关键词
machine learning; data mining; clustering; mining methods and algorithms;
D O I
10.1109/TCBB.2005.56
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Clustering analysis has been an important research topic in the machine learning field due to the wide applications. In recent years, it has even become a valuable and useful tool for in-silico analysis of microarray or gene expression data. Although a number of clustering methods have been proposed, they are confronted with difficulties in meeting the requirements of automation, high quality, and high efficiency at the same time. In this paper, we propose a novel, parameterless and efficient clustering algorithm, namely, Correlation Search Technique (CST), which fits for analysis of gene expression data. The unique feature of CST is it incorporates the validation techniques into the clustering process so that high quality clustering results can be produced on the fly. Through experimental evaluation, CST is shown to outperform other clustering methods greatly in terms of clustering quality, efficiency, and automation on both of synthetic and real data sets.
引用
收藏
页码:355 / 365
页数:11
相关论文
共 21 条
[1]  
Aldenderfer M., 1984, Cluster Analysis, DOI DOI 10.4135/9781412983648
[2]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]   Clustering gene expression patterns [J].
Ben-Dor, A ;
Shamir, R ;
Yakhini, Z .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1999, 6 (3-4) :281-297
[4]   A MASSIVELY PARALLEL ARCHITECTURE FOR A SELF-ORGANIZING NEURAL PATTERN-RECOGNITION MACHINE [J].
CARPENTER, GA ;
GROSSBERG, S .
COMPUTER VISION GRAPHICS AND IMAGE PROCESSING, 1987, 37 (01) :54-115
[5]   Data mining: An overview from a database perspective [J].
Chen, MS ;
Han, JW ;
Yu, PS .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :866-883
[6]   CLUSTER SEPARATION MEASURE [J].
DAVIES, DL ;
BOULDIN, DW .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (02) :224-227
[7]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[8]  
Guha S., 1998, SIGMOD Record, V27, P73, DOI 10.1145/276305.276312
[9]   ROCK: A robust clustering algorithm for categorical attributes [J].
Guha, S ;
Rastogi, R ;
Shim, K .
15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, :512-521
[10]   Visual cluster validity for prototype generator clustering models [J].
Hathaway, RJ ;
Bezdek, JC .
PATTERN RECOGNITION LETTERS, 2003, 24 (9-10) :1563-1569