RETRACTED ARTICLE: Innovative study on clustering center and distance measurement of K-means algorithm: mapreduce efficient parallel algorithm based on user data of JD mall

Cited by: 0
Authors
Yang Liu
Xinxin Du
Shuaifeng Ma
Affiliations
[1] Southwestern University of Finance and Economics, School of Statistics
[2] Jingdong Century Trading Co., Ltd, Big Data Operation Center
Source
Electronic Commerce Research | 2023 / Vol. 23
Keywords
K-means; Clustering center; Distance measurement; MapReduce; Parallel computing
DOI
Not available
Abstract
The traditional K-means algorithm is very sensitive to the selection of clustering centers and the calculation of distances, so the algorithm easily converges to a locally optimal solution. In addition, the traditional algorithm has slow convergence speed and low clustering accuracy, as well as memory bottleneck problems when processing massive data. Therefore, an improved K-means algorithm is proposed in this paper. In this algorithm, the selection of the initial points in the traditional clustering algorithm is improved first, and then a new global measure, the effective distance measure, is proposed. Its main idea is to calculate the effective distance between two data samples by sparse reconstruction. Finally, on the basis of the MapReduce framework, the efficiency of the algorithm is further improved by adjusting the Hadoop cluster. Based on the real customer data from the JD Mall dataset, this paper introduces the DBI, Rand and other indicators to evaluate the clustering effects of various algorithms. The results show that the proposed algorithm not only has good convergence and accuracy but also achieves better performances than those of other compared algorithms.
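The abstract describes running K-means on the MapReduce framework but does not give the procedure itself. The following is a minimal single-process sketch of how one K-means iteration is commonly split into a map step (assign each point to its nearest center) and a reduce step (average the points per center). It is not the authors' method: it uses plain Euclidean distance rather than the paper's effective distance measure, takes the initial centers as given rather than applying the improved initialization, and does not involve Hadoop. The function names `map_assign`, `reduce_update`, and `kmeans`, and the toy data, are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def reduce_update(pairs, centers):
    """Reduce step: average the points emitted for each center index."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx not in sums:
            sums[idx] = list(point)
        else:
            sums[idx] = [s + x for s, x in zip(sums[idx], point)]
        counts[idx] += n
    # A center that received no points keeps its previous position.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centers[i])
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no center moves by more than tol."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        pairs = [map_assign(p, centers) for p in points]
        new_centers = reduce_update(pairs, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


# Toy data (assumed for illustration): two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 0.0)])
print(final_centers)
```

In a real MapReduce job the map calls would run in parallel over data splits and the reduce step would run once per center key; the list comprehension here only stands in for that distribution.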
Pages: 43-73
Page count: 30