Finding reproducible cluster partitions for the k-means algorithm

Authors:

Paulo JG Lisboa

Terence A Etchells

Ian H Jarman

Simon J Chambers

Affiliation:

[1] Liverpool John Moores University, School of Computing and Mathematical Sciences

Source:

BMC Bioinformatics, Volume 14

Keywords:

Cluster Solution; Adjusted Rand Index; Cluster Partition; Point Dataset; Dual Measure

Abstract:

K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well known, it is common practice to assume that the partition with the lowest total sum of squares (SSQ), i.e. the within-cluster variance, is both reproducible under repeated initialisations and the closest that k-means can provide to the true structure when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions.
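
The kind of check the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not detail. It assumes a plain Lloyd's-algorithm k-means run from several random initialisations, with each resulting partition compared to the lowest-SSQ partition using the adjusted Rand index. The function names (`kmeans`, `adjusted_rand_index`, `compare_runs`) and the synthetic data are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter
from math import comb


def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, seed, iters=100):
    """Lloyd's algorithm from one random initialisation; returns (labels, ssq)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    ssq = sum(sqdist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, ssq


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def compare_runs(points, k, n_runs=20):
    """Run k-means n_runs times; return (ssq, ARI vs lowest-SSQ run), sorted by ssq."""
    runs = sorted((kmeans(points, k, seed) for seed in range(n_runs)),
                  key=lambda run: run[1])
    best_labels = runs[0][0]
    return [(ssq, adjusted_rand_index(best_labels, labels))
            for labels, ssq in runs]


if __name__ == "__main__":
    # Illustrative synthetic data: 8 Gaussian blobs of 25 points each.
    rng = random.Random(0)
    true_centres = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(8)]
    data = [(rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
            for cx, cy in true_centres for _ in range(25)]
    for ssq, ari in compare_runs(data, k=8):
        print(f"SSQ = {ssq:10.2f}   ARI vs lowest-SSQ run = {ari:.3f}")
```

Runs whose SSQ is close to the minimum but whose adjusted Rand index against the lowest-SSQ run is well below 1 would be examples of the situation the abstract describes: similar SSQ, different partitions.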