DISCERN: diversity-based selection of centroids for k-estimation and rapid non-stochastic clustering

Cited by: 8
Authors
Hassani, Ali [1]
Iranmanesh, Amir [1]
Eftekhari, Mahdi [4]
Salemi, Abbas [2,3]
Affiliations
[1] Shahid Bahonar Univ Kerman, Dept Comp Sci, Pajoohesh Sq, Kerman 7616914111, Iran
[2] Shahid Bahonar Univ Kerman, Dept Appl Math, Pajoohesh Sq, Kerman 7616914111, Iran
[3] Shahid Bahonar Univ Kerman, Mahani Math Res Ctr, Pajoohesh Sq, Kerman 7616914111, Iran
[4] Shahid Bahonar Univ Kerman, Dept Comp Engn, Pajoohesh Sq, Kerman 7616914111, Iran
Keywords
Clustering; K-means initialization; Estimating the number of clusters; Unsupervised learning; Deterministic K-means
DOI
10.1007/s13042-020-01193-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One application of center-based clustering algorithms such as K-means is partitioning data points into K clusters. In some cases the feature space is tied to the underlying problem being solved, and in others a suitable feature space can be obtained. Nevertheless, while K-means is one of the most efficient offline clustering algorithms, it cannot estimate the number of clusters, which is useful in some practical cases. Practical methods that can estimate it are computationally demanding, as they require at least one run of K-means for each possible K. To address this issue, we propose a K-means initialization similar to K-means++ that estimates K from the feature space while deterministically finding suitable initial centroids for K-means. We compare the proposed method, DISCERN, with several of the most practical K-estimation methods, and we also compare the clustering results of K-means initialized randomly, with K-means++, and with DISCERN. The results show improvements in both K estimation and final clustering performance.
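The abstract describes the method only at a high level. As a rough illustration of what a deterministic, diversity-based seeding with a built-in estimate of K can look like, the sketch below uses plain farthest-first (maximin) selection with Euclidean distance, followed by ordinary Lloyd iterations. It is not the DISCERN selection rule from the paper: the functions `maximin_init`, `estimate_k`, and `kmeans`, the "first centroid is farthest from the mean" choice, and the "sharpest drop in selection distance" rule for K are all hypothetical, chosen only to show the general idea in stdlib Python.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximin_init(data: List[Point], k_max: int) -> Tuple[List[int], List[float]]:
    """Hypothetical deterministic farthest-first seeding (not DISCERN itself).

    Returns up to k_max centroid indices and, for every pick after the first,
    the distance from the picked point to its nearest earlier centroid.
    """
    n, d = len(data), len(data[0])
    mean = [sum(p[j] for p in data) / n for j in range(d)]
    # First centroid: the point farthest from the data mean (ties -> lowest index).
    first = max(range(n), key=lambda i: (dist(data[i], mean), -i))
    chosen = [first]
    nearest = [dist(p, data[first]) for p in data]  # distance to closest chosen centroid
    gaps: List[float] = []
    while len(chosen) < min(k_max, n):
        # Most "diverse" point: the one farthest from all centroids chosen so far.
        nxt = max(range(n), key=lambda i: (nearest[i], -i))
        gaps.append(nearest[nxt])
        chosen.append(nxt)
        nearest = [min(nearest[i], dist(data[i], data[nxt])) for i in range(n)]
    return chosen, gaps


def estimate_k(gaps: List[float]) -> int:
    """Hypothetical rule: K is the pick count just before the sharpest drop in gaps."""
    if len(gaps) < 2:
        return len(gaps) + 1
    eps = 1e-12  # avoids division by zero for duplicate points
    ratios = [gaps[j] / (gaps[j + 1] + eps) for j in range(len(gaps) - 1)]
    j = max(range(len(ratios)), key=lambda t: (ratios[t], -t))
    return j + 2  # gaps[j] was recorded when picking centroid number j + 2


def kmeans(data: List[Point], centroids: List[Point], iters: int = 20):
    """Standard Lloyd iterations from the given (deterministic) seeds."""
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in data]
        new = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # Keep an empty cluster's centroid where it was.
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
            (20, 0), (20, 1), (21, 0)]
    idx, gaps = maximin_init(demo, k_max=5)
    k = estimate_k(gaps)
    print(idx[:k], k)  # -> [8, 1, 4] 3
    _, labels = kmeans(demo, [demo[i] for i in idx[:k]])
    print(labels)  # -> [1, 1, 1, 2, 2, 2, 0, 0, 0]
```

Because every choice is an argmax with index tie-breaking, repeated runs return identical seeds, whereas K-means++ samples its seeds at random. Seeding costs O(n · k_max) distance evaluations in one pass, instead of one full K-means run per candidate K. This toy ratio rule can never return K = 1, a case a real K-estimation method has to handle separately.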
Pages: 635-649
Page count: 15