Finding reproducible cluster partitions for the k-means algorithm

Authors
Paulo JG Lisboa
Terence A Etchells
Ian H Jarman
Simon J Chambers
Affiliation
[1] Liverpool John Moores University, School of Computing and Mathematical Sciences
Source
BMC Bioinformatics, Vol. 14
Keywords
Cluster Solution; Adjusted Rand Index; Cluster Partition; Point Dataset; Dual Measure
Abstract
K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well known, it is common practice to assume that the partition with the lowest sum-of-squares (SSQ) total, i.e. within-cluster variance, is both reproducible under repeated initialisations and, when applied to synthetic data, the closest that k-means can provide to the true structure. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions.
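The common practice the abstract questions can be illustrated with a minimal sketch. It is not the authors' procedure. It runs k-means many times, ranks the runs by SSQ, and uses the adjusted Rand index (ARI) to check whether the lowest-SSQ runs actually agree. The sketch assumes plain Lloyd's algorithm seeded from k randomly chosen data points and synthetic isotropic Gaussian blobs. The function names `kmeans`, `adjusted_rand_index`, `ranked_restarts` and `make_blobs` are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm seeded with k randomly chosen data points.

    Returns (labels, ssq), where ssq is the within-cluster sum of squares.
    """
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    ssq = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, ssq


def adjusted_rand_index(a, b):
    """Adjusted Rand index of two labelings (1.0 = identical up to relabelling)."""
    def pairs(n):
        return n * (n - 1) // 2

    sum_ij = sum(pairs(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(pairs(c) for c in Counter(a).values())
    sum_b = sum(pairs(c) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


def ranked_restarts(points, k, n_init, seed=0):
    """Run k-means n_init times and return the runs sorted by SSQ (lowest first)."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_init)]
    return sorted(runs, key=lambda run: run[1])


def make_blobs(centres, n_per, sd, seed=1):
    """Hypothetical synthetic data: isotropic Gaussian blobs, n_per points each."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(c, sd) for c in centre)
            for centre in centres for _ in range(n_per)]


if __name__ == "__main__":
    data = make_blobs([(0, 0), (4, 0), (0, 4), (4, 4)], n_per=30, sd=1.0)
    runs = ranked_restarts(data, k=6, n_init=20)
    (best, best_ssq), (runner_up, runner_ssq) = runs[0], runs[1]
    print(f"lowest SSQ {best_ssq:.2f}, runner-up {runner_ssq:.2f}")
    print(f"ARI between them: {adjusted_rand_index(best, runner_up):.3f}")
```

An ARI of 1 means two partitions are identical up to relabelling. Two runs can have nearly equal SSQ yet an ARI well below 1. That gap is the kind of non-reproducibility the abstract reports for larger k, and SSQ ranking alone cannot reveal it.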