Finding reproducible cluster partitions for the k-means algorithm

Cited by: 8
Authors
Lisboa, Paulo J. G. [1]
Etchells, Terence A. [1]
Jarman, Ian H. [1]
Chambers, Simon J. [1]
Affiliations
[1] Liverpool John Moores Univ, Sch Comp & Math Sci, Liverpool L3 3AF, Merseyside, England
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Keywords
STABILITY;
DOI
10.1186/1471-2105-14-S1-S8
Chinese Library Classification
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well known, it is common practice to assume that the partition with the lowest sum-of-squares (SSQ) total, i.e. the within-cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real-world breast cancer dataset with varying data density, and a large bioinformatics dataset.
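The comparison the abstract describes, between the SSQ of a run and how closely its partition matches the best-SSQ partition, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy Gaussian data, the helper names `kmeans`, `rand_index` and `repeated_runs`, and in particular the use of the plain Rand index as the agreement score, which stands in for the paper's stability index rather than reproducing it.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's algorithm with random initialisation; returns (labels, ssq)."""
    centres = rng.sample(points, k)  # pick k distinct data points as initial centres
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centre (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local minimum was reached
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    ssq = sum(
        sum((a - b) ** 2 for a, b in zip(p, centres[lab]))
        for p, lab in zip(points, labels)
    )
    return labels, ssq


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def repeated_runs(points, k, n_runs=20, seed=0):
    """Run k-means n_runs times; return (ssq, agreement with best-SSQ run), sorted by ssq."""
    rng = random.Random(seed)
    runs = sorted((kmeans(points, k, rng) for _ in range(n_runs)), key=lambda r: r[1])
    best_labels = runs[0][0]
    return [(ssq, rand_index(best_labels, labels)) for labels, ssq in runs]


# Hypothetical toy data: three 2-d Gaussian blobs of 30 points each.
data_rng = random.Random(1)
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (4, 0), (2, 3)]
    for _ in range(30)
]
results = repeated_runs(points, k=3)
for ssq, agreement in results[:5]:
    print(f"SSQ={ssq:.2f}  agreement with best run={agreement:.3f}")
```

Plotting the two values of each pair against each other gives a rough two-dimensional view of the local minima: runs with similar SSQ but low agreement are the case the abstract warns about.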
Pages: 19
Related papers
21 in total
  • [1] CNAK: Cluster number assisted K-means
    Saha, Jayasree
    Mukherjee, Jayanta
    PATTERN RECOGNITION, 2021, 110
  • [2] HOW THE INITIALIZATION AFFECTS THE STABILITY OF THE k-MEANS ALGORITHM
    Bubeck, Sebastien
    Meila, Marina
    von Luxburg, Ulrike
    ESAIM-PROBABILITY AND STATISTICS, 2012, 16 : 436 - 452
  • [3] Feature selection for k-means clustering stability: theoretical analysis and an algorithm
    Mavroeidis, Dimitrios
    Marchiori, Elena
    DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 28 (04) : 918 - 960
  • [4] Feature selection for k-means clustering stability: theoretical analysis and an algorithm
    Dimitrios Mavroeidis
    Elena Marchiori
    Data Mining and Knowledge Discovery, 2014, 28 : 918 - 960
  • [5] Hybridization of Chaos and Flower Pollination Algorithm over K-Means for data clustering
    Kaur, Arvinder
    Pal, Saibal Kumar
    Singh, Amrit Pal
    APPLIED SOFT COMPUTING, 2020, 97 (97)
  • [6] t-k-means: A ROBUST AND STABLE k-means VARIANT
    Li, Yiming
    Zhang, Yang
    Tang, Qingtao
    Huang, Weipeng
    Jiang, Yong
    Xia, Shu-Tao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 3120 - 3124
  • [7] A notion of stability for k-means clustering
    Le Gouic, T.
    Paris, Q.
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02) : 4239 - 4263
  • [8] Stability and model selection in k-means clustering
    Shamir, Ohad
    Tishby, Naftali
    MACHINE LEARNING, 2010, 80 (2-3) : 213 - 243
  • [9] Stability and model selection in k-means clustering
    Ohad Shamir
    Naftali Tishby
    Machine Learning, 2010, 80 : 213 - 243
  • [10] Initializing FWSA K-Means With Feature Level Constraints
    He, Zhenfeng
    IEEE ACCESS, 2022, 10 : 132976 - 132987