In search of deterministic methods for initializing K-means and Gaussian mixture clustering

Cited by: 84
Authors
Su, Ting [1 ]
Dy, Jennifer G. [1 ]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
K-means; Gaussian mixture; initialization; PCA; clustering;
DOI
10.3233/IDA-2007-11402
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized by random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (Variance Partitioning). K-means clustering tries to minimize the sum-squared-error criterion. The eigenvector with the largest eigenvalue is the direction that contributes most to the sum-squared-error. Hence, a good candidate direction along which to project a cluster for splitting is the cluster's largest eigenvector, which is the basis for PCA partitioning. Similarly, GMM clustering maximizes the likelihood; minimizing the determinant of each cluster's covariance matrix helps to increase the likelihood. The largest eigenvector contributes most to this determinant and is thus a good candidate direction for splitting. However, PCA is computationally expensive. We therefore introduce Var-Part, which is computationally less complex (with complexity equal to one K-means iteration) and approximates PCA partitioning by assuming a diagonal covariance matrix. Experiments reveal that Var-Part performs similarly to PCA partitioning, sometimes better, and leads K-means (and GMM) to yield sum-squared-error (and maximum-likelihood) values close to the optimum values obtained by several random-start runs, often at faster convergence rates.
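The recursive splitting the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: the function name, signature, and the split-at-the-mean rule are my own assumptions; both schemes split the highest-SSE cluster, differing only in the projection direction (principal eigenvector for PCA partitioning, the single highest-variance axis for Var-Part).

```python
import numpy as np

def deterministic_init(X, k, method="var-part"):
    """Deterministic K-means initialization by recursive cluster splitting.

    Sketch of the two schemes discussed in the abstract (details assumed):
    - "pca":      split the worst cluster along its principal eigenvector
    - "var-part": split along the coordinate axis of largest variance,
                  a diagonal-covariance approximation of PCA partitioning
    Returns k initial cluster centers.
    """
    clusters = [np.arange(len(X))]  # start with one cluster holding all points
    while len(clusters) < k:
        # pick the cluster with the largest sum-squared-error to split next
        sse = [((X[idx] - X[idx].mean(0)) ** 2).sum() for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        pts = X[idx]
        mu = pts.mean(0)
        if method == "pca":
            # direction = eigenvector of the covariance with largest eigenvalue
            w, v = np.linalg.eigh(np.cov(pts, rowvar=False))
            proj = (pts - mu) @ v[:, -1]
        else:
            # Var-Part: axis of largest variance; cost ~ one K-means iteration
            axis = int(np.argmax(pts.var(0)))
            proj = pts[:, axis] - mu[axis]
        # split at the cluster mean along the chosen direction
        clusters.append(idx[proj <= 0])
        clusters.append(idx[proj > 0])
    return np.array([X[idx].mean(0) for idx in clusters])
```

Because no step involves randomness, repeated runs on the same data always yield the same initial centers, which is the point of a deterministic initializer.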
Pages: 319-338
Page count: 20
Related Papers
(50 records total)
  • [21] Initializing Cluster Center for K-Means Using Biogeography Based Optimization
    Kumar, Vijay
    Chhabra, Jitender Kumar
    Kumar, Dinesh
    ADVANCES IN COMPUTING, COMMUNICATION AND CONTROL, 2011, 125 : 448 - +
  • [22] Improved fuzzy art method for initializing K-means
    Ilhan S.
    Duru N.
    Adali E.
    International Journal of Computational Intelligence Systems, 2010, 3 (3) : 274 - 279
  • [23] Initializing FWSA K-Means With Feature Level Constraints
    He, Zhenfeng
    IEEE ACCESS, 2022, 10 : 132976 - 132987
  • [24] K*-Means: An Effective and Efficient K-means Clustering Algorithm
    Qi, Jianpeng
    Yu, Yanwei
    Wang, Lihong
    Liu, Jinglei
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 242 - 249
  • [25] Unsupervised K-Means Clustering Algorithm
    Sinaga, Kristina P.
    Yang, Miin-Shen
    IEEE ACCESS, 2020, 8 : 80716 - 80727
  • [26] APPLICATION OF METAHEURISTICS TO K-MEANS CLUSTERING
    Lisin, A. V.
    Faizullin, R. T.
    COMPUTER OPTICS, 2015, 39 (03) : 406 - 412
  • [27] Modified k-Means Clustering Algorithm
    Patel, Vaishali R.
    Mehta, Rupa G.
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 307 - +
  • [28] The MinMax k-Means clustering algorithm
    Tzortzis, Grigorios
    Likas, Aristidis
    PATTERN RECOGNITION, 2014, 47 (07) : 2505 - 2516
  • [29] A notion of stability for k-means clustering
    Le Gouic, T.
    Paris, Q.
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02): : 4239 - 4263
  • [30] Importance of Initialization in K-Means Clustering
    Gupta, Anubhav
    Tomer, Antriksh
    Dahiya, Sonika
    2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022,