Unsupervised K-Means Clustering Algorithm

Cited by: 1059

Authors:

Sinaga, Kristina P. [1]

Yang, Miin-Shen [1]

Affiliations:

[1] Chung Yuan Christian Univ, Dept Appl Math, Taoyuan 32023, Taiwan

Source:

IEEE ACCESS | 2020 / Vol. 8

Keywords:

Clustering algorithms; Indexes; Linear programming; Entropy; Clustering methods; Unsupervised learning; Machine learning algorithms; Clustering; K-means; number of clusters; initializations; unsupervised learning schema; Unsupervised k-means (U-k-means); VALIDATION; INFORMATION; SELECTION; NUMBER; EM;

DOI:

10.1109/ACCESS.2020.2988796

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The k-means algorithm is generally the best-known and most widely used clustering method, and various extensions of k-means have been proposed in the literature. Although k-means is regarded as an unsupervised learning approach to clustering in pattern recognition and machine learning, the algorithm and its extensions are always influenced by initializations and require the number of clusters to be specified a priori. In that sense, the k-means algorithm is not exactly an unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations, requires no parameter selection, and can simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed. Comparisons between the proposed U-k-means and other existing methods are made. Experimental results and comparisons demonstrate these advantages of the proposed U-k-means clustering algorithm.
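For context, the dependence on initialization and on a preset number of clusters can be seen in a minimal sketch of standard k-means (Lloyd's algorithm). This is not the U-k-means algorithm proposed in the paper, whose details the abstract does not give. The function names `kmeans`, `sq_dist`, and `sse`, the `seed` parameter, and the toy data are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Standard k-means: both k and the initialization (via seed) are inputs."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialization: k distinct data points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                n = len(members)
                new_centers.append(tuple(sum(c) / n for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


def sse(points, centers, labels):
    """Within-cluster sum of squared errors, the k-means objective."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Toy data (assumed for illustration): three Gaussian blobs in the plane.
rng = random.Random(42)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in blobs
    for _ in range(30)
]

# The caller must try several k values and several seeds by hand.
for k in (2, 3, 4):
    best = min(sse(data, *kmeans(data, k, seed=s)) for s in range(5))
    print(k, round(best, 2))
```

In this sketch the caller has to supply `k` and rerun with different seeds, and the objective typically keeps shrinking as `k` grows, so it cannot pick the number of clusters by itself. This is the limitation the abstract says U-k-means is designed to remove.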


Pages: 80716-80727

Page count: 12

