K-walks: clustering gene-expression data using a K-means clustering algorithm optimised by random walks

Times cited: 3
Authors
Yao, Min [1]
Wu, Qinghua [2]
Li, Juan [3]
Huang, Tinghua [1]
Affiliations
[1] Yangtze Univ, Coll Anim Sci, Jingzhou 434025, Hubei, Peoples R China
[2] Yangtze Univ, Coll Life Sci, Jingzhou 434025, Hubei, Peoples R China
[3] Xiangtan Univ, Coll Chem, Xiangtan 411105, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
gene expression; K-means; random walks; DISCOVERY; TISSUES
DOI
10.1504/IJDMB.2016.080039
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Gene-expression data obtained from biological experiments often span thousands of dimensions, which makes them difficult for biologists to interpret as a whole. Cluster analysis is an exploratory data-mining technique for statistical data analysis and is widely used in gene-expression studies. Practical approaches to the clustering problem rely on iterative procedures such as K-means, which typically converge to one of many local minima. Here, we propose a simulated annealing approximation algorithm, optimised using random walks, to solve the K-means clustering problem. The algorithm is validated on synthetic and real-world data sets and compared with other well-known K-means variants. The new algorithm is less sensitive to the choice of initial cluster centres, and its primary strength is its ability to produce high-quality clustering results for thousands of high-dimensional data points. However, the algorithm is computationally intensive.
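The abstract does not give the K-walks update rules, so the following is only a minimal, generic sketch of the idea it describes, not the authors' algorithm: simulated annealing over the positions of the K centres, where each move is a random-walk (Gaussian) step of one centre accepted by the Metropolis rule, followed by ordinary K-means (Lloyd) refinement of the best state found. The function names (`sse`, `lloyd`, `anneal_kmeans`), the Gaussian step, the geometric cooling schedule and all parameter values are hypothetical choices made for illustration. Because every move re-evaluates the full objective over all points, the sketch also shows why such an approach is computationally heavier than plain K-means.

```python
import math
import random


def sse(points, centres):
    """K-means objective: sum of squared distances from each point to its nearest centre."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centres)
        for x in points
    )


def lloyd(points, centres, iters=20):
    """Plain K-means (Lloyd) refinement for a fixed number of iterations.

    On its own this converges to whichever local minimum the starting
    centres lead to.
    """
    centres = [list(c) for c in centres]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for x in points:
            nearest = min(
                range(len(centres)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centres[j])),
            )
            groups[nearest].append(x)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(group) for col in zip(*group)]
    return centres


def anneal_kmeans(points, k, steps=2000, step_size=0.5, t0=1.0,
                  cooling=0.995, seed=0):
    """Hypothetical simulated-annealing K-means with random-walk moves."""
    rng = random.Random(seed)
    current = [list(x) for x in rng.sample(points, k)]  # initial centres
    cur_cost = sse(points, current)
    best, best_cost = [c[:] for c in current], cur_cost
    t = t0
    for _ in range(steps):
        # Random-walk move: shift one randomly chosen centre by a Gaussian step.
        cand = [c[:] for c in current]
        j = rng.randrange(k)
        cand[j] = [v + rng.gauss(0.0, step_size) for v in cand[j]]
        cand_cost = sse(points, cand)
        delta = cand_cost - cur_cost
        # Metropolis rule: always accept improvements; accept a worse
        # state with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = [c[:] for c in current], cur_cost
        t *= cooling  # geometric cooling schedule
    best = lloyd(points, best)  # polish the best state found
    return best, sse(points, best)


if __name__ == "__main__":
    # Two well-separated toy blobs (illustrative data, not from the paper).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    centres, cost = anneal_kmeans(pts, k=2)
    print(centres, cost)
```

Early in the run, when the temperature is high, uphill moves are occasionally accepted, which is what lets the search leave the local minimum a single K-means run would settle into; as the temperature falls the procedure becomes essentially greedy.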
Pages: 121-140
Number of pages: 20