Parallel Implementation of Density Peaks Clustering Algorithm Based on Spark

Cited by: 8
Authors
Liu, Rui [1]
Li, Xiaoge [1]
Du, Liping [1]
Zhi, Shuting [1]
Wei, Mian [2]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Comp, Xian 710121, Shaanxi, Peoples R China
[2] Tulane Univ, New Orleans, LA 70118 USA
Source
ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY | 2017 / Vol. 107
Keywords
density peaks; clustering; Spark; GraphX; big data;
DOI
10.1016/j.procs.2017.03.138
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Clustering algorithms are widely used in data mining. They attempt to group elements into clusters such that elements in the same cluster are similar to each other, while elements in different clusters are dissimilar. The recently published density peaks clustering algorithm overcomes a disadvantage of distance-based algorithms, which can only find clusters of nearly circular shape: it can discover clusters of arbitrary shape and is insensitive to noisy data. However, it needs to calculate the distances between all pairs of data points and therefore does not scale to big data. To reduce the computational cost of the algorithm, we propose an efficient distributed density peaks clustering algorithm based on Spark's GraphX. We demonstrate the effectiveness of the method on two different data sets. The experimental results show that our system can improve performance significantly (up to 10x) compared to a MapReduce implementation. We also evaluate the expansibility and scalability of our system.
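The all-pairs distance step mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not the paper's Spark/GraphX implementation, which the abstract does not detail. It assumes the commonly described density peaks formulation: a local density counted within a cutoff distance, and for each point the distance to its nearest neighbor of higher density. The names `density_peaks`, `d_c`, `rho`, and `delta` are chosen here for illustration.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) per point; an illustrative sketch, not the paper's code."""
    n = len(points)
    # All-pairs Euclidean distances: the O(n^2) step that limits scalability.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points strictly closer than the cutoff d_c (assumed parameter).
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # delta[i]: distance to the nearest point with strictly higher density.
    # Points with no denser neighbor get their maximum distance to any point.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Two small, well-separated groups (made-up data for illustration).
points = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
rho, delta = density_peaks(points, d_c=1.0)

# Candidate cluster centers combine high density with high delta.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2])
print(centers)
```

Both nested loops touch every pair of points, which is why a distributed formulation is attractive for large data sets.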
Pages: 442 - 447
Number of pages: 6
Related papers
50 in total
  • [1] Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit
    Ke-shi Ge
    Hua-you Su
    Dong-sheng Li
    Xi-cheng Lu
    Frontiers of Information Technology & Electronic Engineering, 2017, 18 : 915 - 927
  • [2] Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit
    Ge, Ke-shi
    Su, Hua-you
    Li, Dong-sheng
    Lu, Xi-cheng
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2017, 18 (07) : 915 - 927
  • [3] A Tabu search based clustering algorithm and its parallel implementation on Spark
    Lu, Yinhao
    Cao, Buyang
    Rego, Cesar
    Glover, Fred
    APPLIED SOFT COMPUTING, 2018, 63 : 97 - 109
  • [4] An Improved Density Peaks Clustering Algorithm Based On Density Ratio
    Zou, Yujuan
    Wang, Zhijian
    Xu, Pengfei
    Lv, Taizhi
    COMPUTER JOURNAL, 2024, 67 (07) : 2515 - 2528
  • [5] A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm
    Xin Song
    Shuhua Li
    Ziqiang Qi
    Jianlin Zhu
    Applied Intelligence, 2023, 53 : 10520 - 10534
  • [6] A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm
    Song, Xin
    Li, Shuhua
    Qi, Ziqiang
    Zhu, Jianlin
    APPLIED INTELLIGENCE, 2023, 53 (09) : 10520 - 10534
  • [7] Coflow scheduling algorithm based density peaks clustering
    Li, Chenghao
    Zhang, Huyin
    Zhou, Tianying
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 97 : 805 - 813
  • [8] Cosine kernel based density peaks clustering algorithm
    Wang, Jiayuan
    Lv, Li
    Wu, Runxiu
    Fan, Tanghuai
    Lee, Ivan
    INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2020, 12 (01) : 1 - 20
  • [9] A text clustering algorithm based on find of density peaks
    Liu, Peiyu
    Liu, Yingying
    Hou, Xiuyan
    Li, Qingqing
    Zhu, Zhenfang
    2015 7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY IN MEDICINE AND EDUCATION (ITME), 2015, : 348 - 352
  • [10] Improved density peaks clustering based on firefly algorithm
    Zhao J.
    Tang J.
    Shi A.
    Fan T.
    Xu L.
    Xu, Lizhong (lxu0530@126.com), 1600, Inderscience Enterprises Ltd. (15): : 24 - 42