Comparison of Clustering Approaches for Gene Expression Data

Cited by: 8
Authors
Borg, Anton [1 ]
Lavesson, Niklas [1 ]
Boeva, Veselka [2 ]
Affiliations
[1] Blekinge Inst Technol, Sch Comp, SE-37179 Karlskrona, Sweden
[2] Tech Univ, Comp Syst & Technol Dept, Sofia, Bulgaria
Source
TWELFTH SCANDINAVIAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (SCAI 2013) | 2013 / Vol. 257
Keywords
gene expression data; graph-based clustering algorithm; minimum cut clustering; partitioning algorithm; dynamic time warping; VALIDATION;
DOI
10.3233/978-1-61499-330-8-55
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.
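To make the similarity measure named in the abstract concrete, here is a minimal sketch of the textbook Dynamic Time Warping distance between two expression profiles. The abstract does not state which DTW variant the authors used (local cost, window constraint, normalization), so the absolute-difference local cost and the unconstrained warping path are assumptions, and the function name `dtw_distance` is purely illustrative.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW via dynamic programming.

    Assumptions (not taken from the paper): absolute difference as the
    local cost, no warping-window constraint, no path-length normalization.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two hypothetical profiles with the same shape but a delayed response.
profile_a = [1.0, 2.0, 3.0]
profile_b = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(profile_a, profile_b)
```

Because the warping path may repeat time points, two profiles that differ only by a local stretch in time receive a distance of zero here, which is the property that makes DTW attractive for time series expression data. A pairwise matrix of such distances could then be passed to any clustering method that accepts precomputed dissimilarities.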
Pages: 55-64
Number of pages: 10