Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm

Authors
Mao Y.-M. [1]
Gu S.-Q. [1]
Affiliation
[1] College of Information Engineering, Jiangxi University of Science and Technology, Ganzhou
Keywords
density clustering; density-based spatial clustering of applications with noise; MapReduce; optimized cuckoo algorithm; noise resistance
DOI
10.13229/j.cnki.jdxbgxb.20220601
Abstract
In parallel density clustering, the boundary points between clusters of different densities are assigned ambiguously and the data contain noise; both degrade clustering performance and leave the results prone to local optima. To address this, a parallel density clustering algorithm, MCS-KDBSCAN (MapReduce-based parallel maximization cuckoo search K-means DBSCAN), based on MapReduce and an optimized cuckoo algorithm is proposed. First, the algorithm adopts the KDBSCAN (K-means DBSCAN) strategy, which builds on the nearest-neighbor and reverse-nearest-neighbor ideas in k-means: by computing the influence space of each data point, it redefines the cluster expansion condition of the DBSCAN algorithm, which avoids the ambiguous assignment of boundary points between clusters of different densities. Next, drawing on the nearest-neighbor idea in KDBSCAN density clustering, an iterative noise-point processing strategy is proposed to reduce the impact of noise points in the data on clustering performance. Then, MCS (maximization cuckoo search), an optimization of the traditional cuckoo algorithm, is proposed: the weight of the nest-discovery probability is attenuated as the number of search iterations increases, which improves the convergence speed and addresses the influence of local optima on the clustering results. Finally, combined with MapReduce, the parallel density clustering strategy MCS-KDBSCAN is proposed: parallelizing the density clustering operations reduces the communication burden of transmitting local optimal solutions in parallel clustering and improves the algorithm's performance. Experiments show that the proposed MCS-KDBSCAN parallel density clustering algorithm is superior in both clustering accuracy and running time. © 2023 Editorial Board of Jilin University. All rights reserved.
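The influence-space step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so this assumes one common formulation: the influence space of a point is the intersection of its k nearest neighbors and its reverse k nearest neighbors (the points that count it among their own k nearest). The function names `k_nearest` and `influence_spaces`, and the use of Euclidean distance, are illustrative assumptions, not the authors' implementation.

```python
import math


def k_nearest(points, i, k):
    """Return the indices of the k points closest to points[i], excluding i.

    Assumption: Euclidean distance; ties are broken by index for determinism.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return set(others[:k])


def influence_spaces(points, k):
    """Return, for each point, kNN(p) intersected with reverse-kNN(p).

    This is an assumed definition of "influence space"; the source abstract
    does not spell out the formula.
    """
    knn = [k_nearest(points, i, k) for i in range(len(points))]

    # j is a reverse neighbor of i when i appears among j's k nearest neighbors.
    rknn = [set() for _ in points]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rknn[i].add(j)

    return [knn[i] & rknn[i] for i in range(len(points))]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10)]
    for idx, space in enumerate(influence_spaces(data, k=2)):
        print(idx, sorted(space))
```

Under this assumption, an isolated point ends up with an empty influence space because no other point lists it among its nearest neighbors. That is one plausible way a cluster expansion rule could separate boundary points from noise.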
Pages: 2909-2916
Page count: 7