Distributed Multi-Exemplar Affinity Propagation Based on MapReduce

Times cited: 1
Authors
Yang, Yu-Bo
Wang, Chang-Dong [1]
Lai, Jian-Huang
Affiliations
[1] Sun Yat-sen University, School of Data and Computer Science, Guangzhou, Guangdong, People's Republic of China
Source
2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017) | 2017
Keywords
Clustering; Multi-exemplar; Affinity propagation; Parallel system; MapReduce; Parallel algorithms
DOI
10.1109/BigDataService.2017.33
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering is one of the fundamental techniques in data mining and plays a crucial role in applications such as pattern recognition, document retrieval, and computer vision. Many effective clustering algorithms have been proposed so far. Affinity Propagation is an algorithm that requires no parameter specifying the number of clusters, which is its most distinguishing advantage over the k-means clustering algorithm. Multi-Exemplar Affinity Propagation (MEAP) extends the single-exemplar model to a multi-exemplar model, which can describe datasets with more complex structure. As the amount of data increases rapidly, the growing size of datasets makes the clustering problem more and more challenging. Parallel computing frameworks such as MapReduce are widely used to address this problem. However, implementing the MEAP message updates in MapReduce is not straightforward and, without proper design, would be time-consuming. In this paper, we propose to utilize the stability of the data distribution to apply the MEAP algorithm on the MapReduce platform, and we develop an efficient Distributed Multi-Exemplar Affinity Propagation (DisMEAP) clustering algorithm using three MapReduce stages. The experimental results demonstrate that our algorithm can perform well on large-scale datasets and can achieve the same accuracy as the original MEAP algorithm.
Pages: 191-197
Number of pages: 7