An improved K-means algorithm for big data

Cited by: 13
Authors
Moodi, Fatemeh [1]
Saadatfar, Hamid [2]
Affiliations
[1] Hormozan Higher Educ Inst, Comp Engn Dept, Birjand, Iran
[2] Univ Birjand, Comp Engn Dept, Univ Blvd, Birjand, Southern Khoras, Iran
Keywords
Iterative methods; K-means clustering
DOI
10.1049/sfw2.12032
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper presents an improved version of the K-means clustering algorithm that can be applied to big data, reducing the processing load while keeping precision at an acceptable level. The method uses the distances from each point to its two nearest centroids, together with the variation of those distances over the last two iterations. Points whose equidistance threshold is greater than the equidistance index are excluded from further distance calculations and stabilised in their cluster. In later iterations, these excluded points are still checked against another index, the cluster radius: a point is brought back into the calculations if its distance from the centroid of its stabilised cluster is longer than the cluster radius. This can improve the clustering quality. Experiments on synthetic and real data sets show that the method improves clustering quality by up to 41.85% in the best case. According to these findings, the proposed method is well suited to big data.
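The idea in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation: the abstract does not define the equidistance index or the cluster radius, so the sketch assumes a simple gap rule (second-nearest distance minus nearest distance, compared with a hypothetical `gap_threshold`), assumes the radius is the mean member-to-centroid distance, and ignores the variation over the last two iterations. The function name, parameters, and the `full_rows` counter are all illustrative.

```python
import numpy as np


def kmeans_stabilised(X, init_centroids, gap_threshold, max_iter=100, tol=1e-6):
    """K-means sketch that skips full distance rows for "stabilised" points.

    Assumed, not taken from the paper: the gap rule, the radius definition,
    and every name used here. Requires at least two centroids.
    """
    centroids = np.asarray(init_centroids, dtype=float).copy()
    k, n = len(centroids), X.shape[0]
    labels = np.zeros(n, dtype=int)
    stable = np.zeros(n, dtype=bool)  # True = excluded from full distance rows
    full_rows = 0                     # count of point-to-all-centroids computations

    for _ in range(max_iter):
        # Assignment step, only for points that are not stabilised.
        active = np.flatnonzero(~stable)
        if active.size:
            d = np.linalg.norm(X[active, None, :] - centroids[None, :, :], axis=2)
            full_rows += active.size
            two = np.partition(d, 1, axis=1)[:, :2]  # two smallest distances per row
            labels[active] = np.argmin(d, axis=1)
            # Stabilise points whose nearest centroid clearly beats the runner-up.
            stable[active] = (two[:, 1] - two[:, 0]) > gap_threshold

        # Update step: recompute each centroid from its current members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids

        # Re-inclusion check: one distance per point, to its own centroid only.
        own = np.linalg.norm(X - centroids[labels], axis=1)
        for j in range(k):
            in_j = labels == j
            if in_j.any():
                radius = own[in_j].mean()  # assumed definition of cluster radius
                stable[in_j & (own > radius)] = False

        if shift < tol:
            break

    return labels, centroids, full_rows


# Toy data: two well-separated blobs, one initial centroid taken from each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
labels, centroids, full_rows = kmeans_stabilised(X, X[[0, 50]], gap_threshold=1.0)
```

Here `full_rows` counts how many points needed a distance computation against every centroid; the re-inclusion check costs only one distance per point, which is where the saving would come from when the number of clusters is large.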
Pages: 48-59
Page count: 12