Distributed Multi-Exemplar Affinity Propagation Based on MapReduce

Cited by: 1

Authors:

Yang, Yu-Bo

Wang, Chang-Dong [1]

Lai, Jian-Huang

Affiliation:

[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China

Source:

2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017) | 2017

Keywords:

Clustering; Multi-exemplar; Affinity propagation; Parallel system; MapReduce; PARALLEL ALGORITHMS;

DOI:

10.1109/BigDataService.2017.33

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Clustering is one of the fundamental techniques in data mining and plays a crucial role in various applications, such as pattern recognition, document retrieval, and computer vision. Many effective algorithms have been proposed so far. Affinity Propagation requires no parameter specifying the number of clusters, which is its most distinguishing advantage over the k-means clustering algorithm. Multi-Exemplar Affinity Propagation (MEAP) extends the single-exemplar model to a multi-exemplar model, which can describe datasets with more complex structure. As the amount of data increases rapidly, the growing size of datasets makes the clustering problem more and more challenging. Parallel computing frameworks such as MapReduce are widely used to address this problem. However, implementing the MEAP message updates in MapReduce is not straightforward and, without proper design, would be time-consuming. In this paper, we propose to utilize the stability of data distribution to apply the MEAP algorithm on the MapReduce platform and develop an efficient Distributed Multi-Exemplar Affinity Propagation (DisMEAP) clustering algorithm using three MapReduce stages. The experimental results demonstrate that our algorithm performs well in processing large-scale datasets and can achieve the same accuracy as the original MEAP algorithm.
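
For orientation, the message-passing idea the abstract builds on can be sketched in a few lines. The sketch below is plain single-exemplar Affinity Propagation, not MEAP and not the DisMEAP algorithm; the abstract does not describe the three MapReduce stages, so none of them are reproduced here. The function names, the toy one-dimensional data, the negative squared distance as similarity, and the median preference are all assumptions made for illustration. The sketch shows that responsibilities can be updated one row at a time and availabilities one column at a time, which hints at why a keyed, map/reduce-style decomposition is plausible.

```python
import statistics


def build_similarity(points):
    """Negative squared distance; the diagonal holds the 'preference' (assumed: median)."""
    n = len(points)
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    off_diagonal = [S[i][k] for i in range(n) for k in range(n) if i != k]
    preference = statistics.median(off_diagonal)
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it selects (needs n >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility update: each row i depends only on row i of A and S.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availability update: each column k depends only on column k of R.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate k that maximizes a(i, k) + r(i, k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data: two well-separated groups; no cluster count is passed anywhere.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = affinity_propagation(build_similarity(points))
print(labels)
```

Note that the number of clusters is never supplied; it emerges from the preference value on the diagonal of the similarity matrix, which is the property the abstract contrasts with k-means.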

Pages: 191-197

Page count: 7
