In search of deterministic methods for initializing K-means and Gaussian mixture clustering

Cited by: 84
Authors
Su, Ting [1 ]
Dy, Jennifer G. [1 ]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
K-means; Gaussian mixture; initialization; PCA; clustering;
DOI
10.3233/IDA-2007-11402
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized by random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (Variance Partitioning). K-means clustering tries to minimize the sum-squared-error criterion. The eigenvector with the largest eigenvalue is the direction that contributes most to the sum-squared-error. Hence, a good candidate direction along which to project a cluster for splitting is the cluster's largest eigenvector, which is the basis for PCA partitioning. Similarly, GMM clustering maximizes the likelihood; minimizing the determinant of each cluster's covariance matrix helps to increase the likelihood. The largest eigenvector contributes most to this determinant and is thus a good candidate direction for splitting. However, PCA is computationally expensive. We therefore introduce Var-Part, which is computationally less complex (with complexity equal to one K-means iteration) and approximates PCA partitioning by assuming a diagonal covariance matrix. Experiments reveal that Var-Part performs similarly to PCA partitioning, sometimes better, and leads K-means (and GMM) to yield sum-squared-error (and maximum-likelihood) values close to the optimum values obtained by several random-start runs, often at faster convergence rates.
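The recursive splitting the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: the function name, signature, and the split-at-the-mean rule are my own assumptions; both schemes split the highest-SSE cluster, differing only in the projection direction (principal eigenvector for PCA partitioning, the single highest-variance axis for Var-Part).

```python
import numpy as np

def deterministic_init(X, k, method="var-part"):
    """Deterministic K-means initialization by recursive cluster splitting.

    Sketch of the two schemes discussed in the abstract (details assumed):
    - "pca":      split the worst cluster along its principal eigenvector
    - "var-part": split along the coordinate axis of largest variance,
                  a diagonal-covariance approximation of PCA partitioning
    Returns k initial cluster centers.
    """
    clusters = [np.arange(len(X))]  # start with one cluster holding all points
    while len(clusters) < k:
        # pick the cluster with the largest sum-squared-error to split next
        sse = [((X[idx] - X[idx].mean(0)) ** 2).sum() for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        pts = X[idx]
        mu = pts.mean(0)
        if method == "pca":
            # direction = eigenvector of the covariance with largest eigenvalue
            w, v = np.linalg.eigh(np.cov(pts, rowvar=False))
            proj = (pts - mu) @ v[:, -1]
        else:
            # Var-Part: axis of largest variance; cost ~ one K-means iteration
            axis = int(np.argmax(pts.var(0)))
            proj = pts[:, axis] - mu[axis]
        # split at the cluster mean along the chosen direction
        clusters.append(idx[proj <= 0])
        clusters.append(idx[proj > 0])
    return np.array([X[idx].mean(0) for idx in clusters])
```

Because no step involves randomness, repeated runs on the same data always yield the same initial centers, which is the point of a deterministic initializer.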
Pages: 319-338
Page count: 20
Related Papers
(50 records total)
  • [21] Initializing Cluster Center for K-Means Using Biogeography Based Optimization
    Kumar, Vijay
    Chhabra, Jitender Kumar
    Kumar, Dinesh
    ADVANCES IN COMPUTING, COMMUNICATION AND CONTROL, 2011, 125 : 448 - +
  • [22] Improved fuzzy art method for initializing K-means
    Ilhan S.
    Duru N.
    Adali E.
    International Journal of Computational Intelligence Systems, 2010, 3 (3) : 274 - 279
  • [23] Initializing FWSA K-Means With Feature Level Constraints
    He, Zhenfeng
    IEEE ACCESS, 2022, 10 : 132976 - 132987
  • [24] K*-Means: An Effective and Efficient K-means Clustering Algorithm
    Qi, Jianpeng
    Yu, Yanwei
    Wang, Lihong
    Liu, Jinglei
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 242 - 249
  • [25] Unsupervised K-Means Clustering Algorithm
    Sinaga, Kristina P.
    Yang, Miin-Shen
    IEEE ACCESS, 2020, 8 : 80716 - 80727
  • [26] APPLICATION OF METAHEURISTICS TO K-MEANS CLUSTERING
    Lisin, A. V.
    Faizullin, R. T.
    COMPUTER OPTICS, 2015, 39 (03) : 406 - 412
  • [27] Modified k-Means Clustering Algorithm
    Patel, Vaishali R.
    Mehta, Rupa G.
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 307 - +
  • [28] The MinMax k-Means clustering algorithm
    Tzortzis, Grigorios
    Likas, Aristidis
    PATTERN RECOGNITION, 2014, 47 (07) : 2505 - 2516
  • [29] A notion of stability for k-means clustering
    Le Gouic, T.
    Paris, Q.
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02): : 4239 - 4263
  • [30] Importance of Initialization in K-Means Clustering
    Gupta, Anubhav
    Tomer, Antriksh
    Dahiya, Sonika
    2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022,