A Spark-based Artificial Bee Colony Algorithm for Large-scale Data Clustering

Cited by: 4
Authors
Wang, Yanjie [1]
Qian, Quan [1, 2, 3]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
Source
IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018
Funding
Natural Science Foundation of Shanghai;
Keywords
Clustering; Artificial bee colony; Spark; Optimization;
DOI
10.1109/HPCC/SmartCity/DSS.2018.00204
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is one of the most common data analysis methods. It aims to partition data into a given number of clusters so that data within the same cluster are similar to each other and dissimilar from data in other clusters. Our research goal is to find more efficient clustering algorithms for large-scale data. Spark is a popular distributed computing platform that provides a series of high-level APIs for building high-performance parallel applications. The Spark-based artificial bee colony algorithm proposed in this paper combines the robust artificial bee colony algorithm with the Spark framework, which makes it well suited to clustering large-scale data. To verify the effectiveness of this method, we use the KDD CUP 99 data, an open competition dataset, as the experimental data. The experimental results show that our algorithm achieves good clustering quality and almost ideal speedup compared with the serial algorithms.
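The abstract does not state the fitness function or how the work is split across Spark workers, so the following is only a generic sketch of artificial bee colony clustering, not the authors' implementation. It assumes the cost of a candidate solution is the sum of Euclidean distances from each point to its nearest centroid, and it uses plain Python lists of partitions as a stand-in for distributed data partitions. All function and parameter names (`partition_cost`, `total_cost`, `neighbour`, `abc_cluster`, `n_sources`, `limit`) are hypothetical.

```python
import math
import random


def partition_cost(partition, centroids):
    """Cost of one data partition: distance of each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in partition)


def total_cost(partitions, centroids):
    """Sum of per-partition costs; in a distributed setting this is a map + reduce."""
    return sum(partition_cost(part, centroids) for part in partitions)


def neighbour(solution, other, rng):
    """Standard ABC move: v_kj = x_kj + phi * (x_kj - y_kj) on one random coordinate."""
    k = rng.randrange(len(solution))
    j = rng.randrange(len(solution[k]))
    phi = rng.uniform(-1.0, 1.0)
    candidate = [list(c) for c in solution]
    candidate[k][j] += phi * (solution[k][j] - other[k][j])
    return candidate


def abc_cluster(partitions, n_clusters, n_sources=10, limit=20, iterations=100, seed=0):
    """Assumed minimal ABC loop; each food source is a list of n_clusters centroids."""
    rng = random.Random(seed)
    points = [p for part in partitions for p in part]

    def random_source():
        return [list(rng.choice(points)) for _ in range(n_clusters)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [total_cost(partitions, s) for s in sources]
    trials = [0] * n_sources

    def try_improve(i):
        k = rng.choice([idx for idx in range(n_sources) if idx != i])
        candidate = neighbour(sources[i], sources[k], rng)
        cost = total_cost(partitions, candidate)
        if cost < costs[i]:  # greedy selection keeps the better solution
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    best_cost = min(costs)
    best = [list(c) for c in sources[costs.index(best_cost)]]
    for _ in range(iterations):
        for i in range(n_sources):  # employed bee phase
            try_improve(i)
        weights = [1.0 / (1.0 + c) for c in costs]  # lower cost, higher chance
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_improve(i)  # onlooker bee phase
        i = costs.index(min(costs))
        if costs[i] < best_cost:  # remember the best solution before scouting
            best_cost, best = costs[i], [list(c) for c in sources[i]]
        for i in range(n_sources):  # scout bee phase: abandon stagnant sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = total_cost(partitions, sources[i])
                trials[i] = 0
    return best, best_cost
```

In this sketch the only step that touches the full dataset is `total_cost`, which is a sum over independent partitions. That is the part a framework such as Spark could evaluate in parallel, while the bee phases operate on the small set of candidate centroids.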
Pages: 1213-1218
Page count: 6
Related papers
27 in total
  • [1] Aljarah I, 2012, WOR CONG NAT BIOL, P104, DOI 10.1109/NaBIC.2012.6402247
  • [2] [Anonymous], COMP INT SSCI 2017 I
  • [3] [Anonymous], 2000, P DARPA INFORM SURVI, DOI 10.1109/DISCEX.2000.821515
  • [4] [Anonymous], 2016, Adv Neural Inf Process Syst
  • [5] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
  • [6] A MapReduce-based artificial bee colony for large-scale data clustering
    Banharnsakun, Anan
    [J]. PATTERN RECOGNITION LETTERS, 2017, 93 : 78 - 84
  • [7] A comparative study of efficient initialization methods for the k-means clustering algorithm
    Celebi, M. Emre
    Kingravi, Hassan A.
    Vela, Patricio A.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (01) : 200 - 210
  • [8] Cup K., 1999, DATASET, V72
  • [9] A particle swarm optimization approach to clustering
    Cura, Tunchan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (01) : 1582 - 1588
  • [10] Ant system: Optimization by a colony of cooperating agents
    Dorigo, M
    Maniezzo, V
    Colorni, A
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 1996, 26 (01) : 29 - 41