In search of deterministic methods for initializing K-means and Gaussian mixture clustering

Cited by: 84
Authors
Su, Ting [1 ]
Dy, Jennifer G. [1 ]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
K-means; Gaussian mixture; initialization; PCA; clustering;
DOI
10.3233/IDA-2007-11402
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized by random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (Variance Partitioning). K-means clustering tries to minimize the sum-squared-error criterion. The eigenvector with the largest eigenvalue is the component that contributes most to the sum-squared error. Hence, a good candidate direction along which to project a cluster for splitting is the direction of the cluster's largest eigenvector, which is the basis for PCA partitioning. Similarly, GMM clustering maximizes the likelihood; minimizing the determinant of each cluster's covariance matrix helps increase the likelihood. The largest eigenvector contributes most to the determinant and is thus a good candidate direction for splitting. However, PCA is computationally expensive. We therefore introduce Var-Part, which is computationally less complex (with complexity equal to one K-means iteration) and approximates PCA partitioning under a diagonal covariance assumption. Experiments reveal that Var-Part performs comparably to PCA partitioning, sometimes better, and leads K-means (and GMM) to yield sum-squared-error (and maximum-likelihood) values close to the optimum values obtained by several random-start runs, often at faster convergence rates.
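The Var-Part idea described in the abstract can be sketched as follows: starting from a single cluster containing all points, repeatedly split the cluster with the largest sum-squared error along the coordinate axis of greatest variance, cutting at that axis's mean, until K clusters remain; the cluster means then serve as deterministic initial centroids. This is a minimal illustrative sketch, not the authors' implementation; the function name `var_part_init` and the tie-breaking details are assumptions. PCA partitioning would differ only in projecting onto the cluster's principal eigenvector instead of a single coordinate axis.

```python
import numpy as np

def var_part_init(X, k):
    """Deterministic K-means initialization via Var-Part (variance partitioning).

    Recursively splits the cluster with the largest sum-squared error
    along the coordinate axis of maximum variance, at the cluster mean
    (a diagonal-covariance approximation of PCA partitioning).
    Returns an array of k initial centroids.
    """
    clusters = [X]  # start with all points in one cluster
    while len(clusters) < k:
        # pick the cluster with the largest sum-squared error
        sse = [np.sum((c - c.mean(axis=0)) ** 2) for c in clusters]
        c = clusters.pop(int(np.argmax(sse)))
        # split along the axis of greatest variance, at that axis's mean
        # (PCA partitioning would use the largest eigenvector instead)
        axis = int(np.argmax(c.var(axis=0)))
        cut = c[:, axis].mean()
        left, right = c[c[:, axis] <= cut], c[c[:, axis] > cut]
        clusters.extend([left, right])
    return np.array([c.mean(axis=0) for c in clusters])
```

The resulting centroids can then be passed to any standard K-means (or GMM) routine in place of random starts; because the procedure is deterministic, repeated runs give identical initializations.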
Pages: 319-338
Page count: 20
Related papers (50 records)
  • [1] Initializing K-means Clustering Using Affinity Propagation
    Zhu, Yan
    Yu, Jian
    Jia, Caiyan
    HIS 2009: 2009 NINTH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS, VOL 1, PROCEEDINGS, 2009, : 338 - 343
  • [2] K-means and gaussian mixture modeling with a separation constraint
    Jiang, He
    Arias-Castro, Ery
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024,
  • [3] A local search approximation algorithm for k-means clustering
    Kanungo, T
    Mount, DM
    Netanyahu, NS
    Piatko, CD
    Silverman, R
    Wu, AY
    COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 2004, 28 (2-3): : 89 - 112
  • [4] Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
    de Amorim, Renato Cordeiro
    Mirkin, Boris
    PATTERN RECOGNITION, 2012, 45 (03) : 1061 - 1075
  • [5] Random Projection for k-means Clustering
    Sieranoja, Sami
    Franti, Pasi
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2018, PT I, 2018, 10841 : 680 - 689
  • [6] In Search of a New Initialization of K-Means Clustering for Color Quantization
    Frackiewicz, Mariusz
    Palus, Henryk
    EIGHTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2015), 2015, 9875
  • [7] Faster Mahalanobis K-Means Clustering for Gaussian Distributions
    Chokniwal, Ankita
    Singh, Manoj
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 947 - 952
  • [8] Green Clustering Analyzing Logistics Performance and Carbon Emissions with K-Means and Gaussian Mixture Models
    Chomjinda, Jiratchaya
    Piladaeng, Janjira
    Kulthon, Tanayot
    Thongnim, Pattharaporn
    2024 INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS, AND COMMUNICATIONS, ITC-CSCC 2024, 2024,
  • [9] DETERMINISTIC INITIALIZATION OF THE K-MEANS ALGORITHM USING HIERARCHICAL CLUSTERING
    Celebi, M. Emre
    Kingravi, Hassan A.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2012, 26 (07)
  • [10] Initializing k-Means Efficiently: Benefits for Exploratory Cluster Analysis
    Fritz, Manuel
    Schwarz, Holger
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2019 CONFERENCES, 2019, 11877 : 146 - 163