In search of deterministic methods for initializing K-means and Gaussian mixture clustering

Cited by: 84
Authors
Su, Ting [1 ]
Dy, Jennifer G. [1 ]
Affiliation
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
K-means; Gaussian mixture; initialization; PCA; clustering
DOI
10.3233/IDA-2007-11402
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized by random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (variance partitioning). K-means clustering minimizes the sum-squared-error criterion, and the eigenvector with the largest eigenvalue (the leading principal component) is the direction that contributes most to the sum-squared-error. A good candidate direction along which to project a cluster for splitting is therefore the cluster's leading eigenvector; this is the basis of PCA partitioning. Similarly, GMM clustering maximizes the likelihood, and reducing the determinant of each cluster's covariance matrix increases the likelihood; the leading eigenvector contributes most to that determinant and is thus again a good candidate direction for splitting. However, PCA is computationally expensive. We therefore introduce Var-Part, which is computationally cheaper (its cost is comparable to one K-means iteration) and approximates PCA partitioning under a diagonal-covariance assumption. Experiments show that Var-Part performs similarly to PCA partitioning, and sometimes better, and leads K-means (and GMM) to sum-squared-error (and likelihood) values close to the best values obtained over several random-start runs, often with faster convergence.
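To make the Var-Part procedure described in the abstract concrete, the sketch below shows the core idea in Python: repeatedly take the cluster with the largest sum-squared-error, split it at its mean along the coordinate axis of largest variance, and use the resulting cluster means as deterministic initial centers for K-means. The function name `var_part_init` and all implementation details are illustrative assumptions based on the abstract, not the authors' reference implementation; PCA partitioning differs only in splitting along the cluster's leading eigenvector instead of a single coordinate axis.

```python
import numpy as np

def var_part_init(X, k):
    """Deterministic K-means initialization via variance partitioning (Var-Part).

    Sketch based on the abstract above: starting from one cluster holding all
    points, repeatedly split the cluster with the largest sum-squared-error
    along the coordinate axis of largest variance, cutting at the cluster mean,
    until k clusters exist. The cluster means are returned as initial centers.
    Degenerate data (e.g. many duplicate points) is not handled in this sketch.
    """
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]                    # index sets, one per cluster
    while len(clusters) < k:
        # pick the cluster that currently contributes the largest SSE
        sse = [np.sum((X[idx] - X[idx].mean(axis=0)) ** 2) for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        sub = X[idx]
        # diagonal-covariance approximation of PCA partitioning:
        # split along the axis of largest variance, at that axis's mean
        axis = int(np.argmax(sub.var(axis=0)))
        left = sub[:, axis] <= sub[:, axis].mean()
        clusters.extend([idx[left], idx[~left]])
    return np.array([X[idx].mean(axis=0) for idx in clusters])
```

The returned centers can be handed to any K-means routine as a single fixed initialization (for example, scikit-learn's `KMeans(n_clusters=k, init=centers, n_init=1)`), which makes the whole clustering run deterministic.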
Pages: 319-338
Number of pages: 20