Subspace K-means clustering

Authors
Marieke E. Timmerman
Eva Ceulemans
Kim De Roover
Karla Van Leeuwen
Affiliations
[1] University of Groningen, Heymans Institute for Psychology, Psychometrics & Statistics
[2] K.U. Leuven, Educational Sciences
[3] K.U. Leuven, Parenting and Special Education
Source
Behavior Research Methods | 2013 / Vol. 45
Keywords
Cluster analysis; Cluster recovery; Multivariate data; Reduced K-means; K-means; Factorial K-means; Mixtures of factor analyzers; MCLUST
Abstract
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
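The abstract notes that the algorithm is sensitive to local minima and that trying several starting partitions helps. The sketch below illustrates only that general multi-start idea, using plain K-means (Lloyd's algorithm) with random starts; it is not the authors' subspace K-means, and it does not use partitions from other cluster procedures as starts. The function names `kmeans` and `best_of_starts`, the toy data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's K-means from one random start; returns (labels, loss)."""
    # Start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every row to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment and within-cluster sum of squares for these centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    loss = d2[np.arange(len(X)), labels].sum()
    return labels, loss

def best_of_starts(X, k, n_starts=20, seed=0):
    """Run K-means from several random starts and keep the lowest-loss run."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])

# Toy data (hypothetical): three well-separated clusters of 30 points each.
data_rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + data_rng.normal(scale=0.5, size=(30, 2)) for c in true_centers])

labels, loss = best_of_starts(X, k=3, n_starts=50)
```

Keeping the run with the lowest loss guards against a single unlucky start, at the cost of repeating the fit `n_starts` times.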
Pages: 1011-1023
Page count: 12