A MapReduce-based artificial bee colony for large-scale data clustering

Cited by: 39
Author
Banharnsakun, Anan [1]
Affiliation
[1] Kasetsart Univ, Computat Intelligence Res Lab, Dept Comp Engn, Fac Engn Sriracha, Sriracha Campus, Chon Buri 20230, Thailand
Keywords
Artificial Bee Colony (ABC); MapReduce; Data mining; Clustering; Distributed computing; Hadoop;
DOI
10.1016/j.patrec.2016.07.027
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Technological progress has been a significant factor in the growth of digital data, which makes good data analysis a necessity for making better decisions. Clustering is one of the most important elements in the field of data analysis. However, clustering very large datasets remains a primary concern: computational models must be improved so that huge volumes of data can be clustered within a reasonable amount of time. MapReduce is a powerful programming model, with an associated implementation, for processing large datasets with a parallel, distributed algorithm on a computing cluster. In this paper, a MapReduce-based artificial bee colony called MR-ABC is proposed for data clustering. The ABC is implemented on the MapReduce model in the Hadoop framework and is used to optimize the assignment of large numbers of data instances to clusters, with the objective of minimizing the sum of the squared Euclidean distances between each data instance and the centroid of the cluster to which it belongs. The experimental results demonstrate that the proposed algorithm is well suited to massive amounts of data while maintaining the quality of the clustering results. (C) 2016 Elsevier B.V. All rights reserved.
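The objective named in the abstract, the sum of squared Euclidean distances between each data instance and its cluster centroid, can be sketched in a map/reduce style. This is only an illustrative sketch in plain Python, not the paper's Hadoop implementation. The function names (`map_partition`, `reduce_by_cluster`, `sse`) and the encoding of a candidate solution as a list of centroids are assumptions made for the example.

```python
from collections import defaultdict


def map_partition(points, centroids):
    """Map step (hypothetical): for one data split, emit
    (nearest_cluster_index, squared_distance) for every point."""
    for p in points:
        # Squared Euclidean distance from p to every candidate centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        k = min(range(len(centroids)), key=dists.__getitem__)
        yield k, dists[k]


def reduce_by_cluster(pairs):
    """Reduce step (hypothetical): sum the squared distances per cluster."""
    totals = defaultdict(float)
    for k, d in pairs:
        totals[k] += d
    return dict(totals)


def sse(splits, centroids):
    """Objective value of one candidate centroid set over all data splits."""
    pairs = (pair for split in splits
             for pair in map_partition(split, centroids))
    return sum(reduce_by_cluster(pairs).values())


# Two splits stand in for data blocks that would be processed in parallel.
splits = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 0.0), (11.0, 0.0)],
]
centroids = [(0.5, 0.0), (10.5, 0.0)]
total = sse(splits, centroids)
```

Under these assumptions, an optimizer such as ABC would call `sse` once per candidate centroid set and prefer candidates with a lower value. In a real MapReduce job the map calls would run on separate nodes, one per data split.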
Pages: 78-84
Number of pages: 7
Related papers
20 in total
[1]  
Aljarah I, 2012, WOR CONG NAT BIOL, P104, DOI 10.1109/NaBIC.2012.6402247
[2]   The k-Nearest Neighbor Algorithm Using MapReduce Paradigm [J].
Anchalia, Prajesh P. ;
Roy, Kaushik .
PROCEEDINGS FIFTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, MODELLING AND SIMULATION, 2014, :513-518
[3]  
[Anonymous], 2003, Introduction to Parallel Computing
[4]   The best-so-far ABC with multiple patrilines for clustering problems [J].
Banharnsakun, Anan ;
Sirinaovakul, Booncharoen ;
Achalakul, Tiranee .
NEUROCOMPUTING, 2013, 116 :355-366
[5]   A particle swarm optimization approach to clustering [J].
Cura, Tunchan .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (01) :1582-1588
[6]   Automatic clustering using an improved differential evolution algorithm [J].
Das, Swagatam ;
Abraham, Ajith ;
Konar, Amit .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2008, 38 (01) :218-237
[7]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137
[8]   Agreement, the F-measure, and reliability in information retrieval [J].
Hripcsak, G ;
Rothschild, AS .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2005, 12 (03) :296-298
[9]   Data clustering: A review [J].
Jain, AK ;
Murty, MN ;
Flynn, PJ .
ACM COMPUTING SURVEYS, 1999, 31 (03) :264-323
[10]   A comprehensive survey: artificial bee colony (ABC) algorithm and applications [J].
Karaboga, Dervis ;
Gorkemli, Beyza ;
Ozturk, Celal ;
Karaboga, Nurhan .
ARTIFICIAL INTELLIGENCE REVIEW, 2014, 42 (01) :21-57