K-walks: clustering gene-expression data using a K-means clustering algorithm optimised by random walks

Cited by: 3

Authors:

Yao, Min [1]

Wu, Qinghua [2]

Li, Juan [3]

Huang, Tinghua [1]

Affiliations:

[1] Yangtze Univ, Coll Anim Sci, Jingzhou 434025, Hubei, Peoples R China

[2] Yangtze Univ, Coll Life Sci, Jingzhou 434025, Hubei, Peoples R China

[3] Xiangtan Univ, Coll Chem, Xiangtan 411105, Hunan, Peoples R China

Source:

INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS | 2016 / Vol. 16 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

gene expression; K-means; random walks; DISCOVERY; TISSUES;

DOI:

10.1504/IJDMB.2016.080039

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Subject classification code:

07 ; 0710 ; 09 ;

Abstract:

Gene-expression data obtained from the biological experiments always have thousands of dimensions, which can be very confusing and perplexing to biologists when viewed as a whole. Clustering analysis is an explorative data-mining technique for statistical data analysis that is widely used in gene-expression data analysis. Practical approaches employed for solving the clustering problem use iterative procedures such as K-means, which typically converge to one of many local minima. Here, we propose a simulated annealing approximation algorithm that is optimised using random walks to solve the K-means clustering problem. The algorithm is verified with synthetic and real-world data sets and compared with other well-known K-means variants. The new algorithm is less sensitive to initial cluster centres, and the primary strength of our algorithm is its ability to produce high-quality clustering results for thousands of high-dimensional data. However, the algorithm is computationally intensive.
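
The abstract describes the approach only at a high level, so the following is a minimal Python sketch of the general idea rather than the authors' K-walks algorithm: cluster centres take small random-walk steps, and each step is accepted or rejected with a simulated-annealing (Metropolis) rule on the K-means objective. The names `sse` and `anneal_kmeans`, the Gaussian step, the geometric cooling schedule, and all parameter values are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def sse(points, centres):
    """K-means objective: squared distance from each point to its nearest centre."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centres)
        for pt in points
    )


def anneal_kmeans(points, k, steps=2000, t0=1.0, cooling=0.995,
                  step_size=0.5, seed=0):
    """Hypothetical sketch: random-walk moves on centres with Metropolis acceptance."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # random initial centres
    cost = sse(points, centres)
    best, best_cost = [c[:] for c in centres], cost
    temp = t0
    for _ in range(steps):
        # Random-walk proposal: nudge one randomly chosen centre by Gaussian noise.
        cand = [c[:] for c in centres]
        i = rng.randrange(k)
        cand[i] = [x + rng.gauss(0.0, step_size) for x in cand[i]]
        cand_cost = sse(points, cand)
        delta = cand_cost - cost
        # Accept improvements always; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            centres, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = [c[:] for c in centres], cost
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


# Toy usage on two well-separated 2-D groups (made-up data, not from the paper).
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centres, cost = anneal_kmeans(data, k=2)
```

Accepting occasional uphill moves at high temperature is what lets a search of this kind leave a poor local minimum, which plain K-means cannot do once it has converged. The price is one full evaluation of the objective per proposal, in line with the abstract's remark that the algorithm is computationally intensive.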


Pages: 121-140

Page count: 20

37 references in total

