Unsupervised K-Means Clustering Algorithm

Cited by: 1059
Authors
Sinaga, Kristina P. [1]
Yang, Miin-Shen [1]
Affiliations
[1] Chung Yuan Christian Univ, Dept Appl Math, Taoyuan 32023, Taiwan
Keywords
Clustering algorithms; Indexes; Linear programming; Entropy; Clustering methods; Unsupervised learning; Machine learning algorithms; Clustering; K-means; number of clusters; initializations; unsupervised learning schema; Unsupervised k-means (U-k-means); VALIDATION; INFORMATION; SELECTION; NUMBER; EM;
DOI
10.1109/ACCESS.2020.2988796
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The k-means algorithm is generally the best-known and most widely used clustering method, and various extensions of it have been proposed in the literature. Although k-means is regarded as an unsupervised learning approach to clustering in pattern recognition and machine learning, the algorithm and its extensions are always influenced by initialization and require the number of clusters to be given a priori. In that sense, the k-means algorithm is not exactly an unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initialization and parameter selection and can simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed, and the algorithm is compared with other existing methods. Experimental results and comparisons demonstrate these advantages of the proposed U-k-means clustering algorithm.
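The dependence on initialization that the abstract describes can be illustrated with a minimal sketch of the standard (Lloyd's) k-means iteration. This is not the U-k-means algorithm of the paper; the function name `kmeans`, the toy data in `points`, and the two sets of starting centers are all hypothetical and chosen only for illustration. Note that the caller has to supply both the number of clusters (implicitly, through the number of starting centers) and the starting centers themselves.

```python
def kmeans(points, centers, max_iter=100):
    """Standard Lloyd's k-means (illustrative sketch, not U-k-means).

    The number of clusters is fixed by len(centers), and the result
    depends on the starting centers passed in by the caller.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    # Within-cluster sum of squared distances for the final centers.
    sse = sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )
    return centers, sse


# Hypothetical toy data: the four corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Same data, same k = 2, two different initializations.
centers_a, sse_a = kmeans(points, [(0, 0.5), (4, 0.5)])
centers_b, sse_b = kmeans(points, [(2, 0), (2, 1)])

print("init A:", centers_a, "SSE =", sse_a)
print("init B:", centers_b, "SSE =", sse_b)
```

On this toy data the two runs converge to different partitions: the first splits the rectangle into left and right pairs, while the second stays stuck on a top and bottom split with a much larger within-cluster sum of squares. Removing this sensitivity, together with the need to fix the number of clusters in advance, is the motivation the abstract gives for U-k-means.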
Pages: 80716-80727
Number of pages: 12