A Clustering Algorithm for Automatically Determining the Number of Clusters Based on Coefficient of Variation

Cited by: 4
Authors
Liu, Tengteng [1 ]
Qu, Shouning [2 ]
Zhang, Kun [1 ]
Affiliations
[1] Univ Jinan, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China
[2] Univ Jinan, Sch Informat Sci & Engn, Shandong Prov Key Lab Network Based Intelligent, Jinan 250022, Shandong, Peoples R China
Source
PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018) | 2018
Keywords
Clustering; K-means++; Density index; Coefficient of variation
DOI
10.1145/3291801.3291825
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The k-means algorithm is a typical partition-based clustering algorithm. The k-means++ algorithm is a high-quality variant that addresses the sensitivity of traditional k-means to its initial centers. However, the original k-means++ algorithm is sensitive to outliers and requires the number of clusters to be set manually. We propose an improved k-means++ clustering algorithm, named CV-means++, that automatically determines the number of clusters based on the coefficient of variation. First, we propose a method that selects initial centers using the density index of data points, which avoids selecting abnormal data. Second, we introduce the coefficient of variation and evaluate the relationship between the average intra-cluster coefficient of variation and the smallest inter-cluster coefficient of variation of k+ (k+ >> k) clusters to determine whether the number of clusters is optimal. Experiments performed.
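For orientation, the coefficient of variation of a sample is its standard deviation divided by its mean. The abstract does not give the paper's exact intra-cluster and inter-cluster definitions, so the following is only a minimal sketch under stated assumptions: `coefficient_of_variation` uses the population standard deviation, and `intra_cluster_cv` is a hypothetical helper that applies it to the distances between each point and its cluster centroid. Neither name comes from the paper.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Return population standard deviation divided by the mean."""
    values = list(values)
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def intra_cluster_cv(points):
    """Hypothetical helper (an assumption, not the paper's definition):
    CV of the Euclidean distances from each point to the cluster centroid."""
    points = [tuple(p) for p in points]
    # Centroid is the per-coordinate mean of the points.
    centroid = tuple(statistics.fmean(coord) for coord in zip(*points))
    distances = [math.dist(p, centroid) for p in points]
    return coefficient_of_variation(distances)


if __name__ == "__main__":
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
    print(intra_cluster_cv([(0, 0), (0, 2), (2, 0), (2, 2)]))
```

Under this assumed definition, a cluster whose points all lie at the same distance from the centroid has a coefficient of variation of zero, and larger values indicate a less uniform spread.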
Pages: 100-106
Page count: 7