A Spark-based Artificial Bee Colony Algorithm for Large-scale Data Clustering

Citations: 4
Authors
Wang, Yanjie [1]
Qian, Quan [1, 2, 3]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
Source
IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018
Funding
Natural Science Foundation of Shanghai;
Keywords
Clustering; Artificial bee colony; Spark; Optimization;
DOI
10.1109/HPCC/SmartCity/DSS.2018.00204
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Clustering is one of the most common data analysis methods. It aims to partition data into a given number of clusters so that data within the same cluster are similar to one another and dissimilar from data in other clusters. Our research goal is to find more efficient clustering algorithms for large-scale data. Spark is a widely used distributed computing platform that provides a set of high-level APIs for building high-performance parallel applications. The Spark-based artificial bee colony algorithm proposed in this paper combines the robust artificial bee colony algorithm with the Spark framework, making it well suited to clustering large-scale data. To verify the effectiveness of this method, we use the KDD CUP 99 data, an open competition dataset, as the experimental data. The experimental results show that our algorithm achieves good clustering quality and almost ideal speedup compared with the serial algorithms.
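This record does not include the paper's method, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each food source is a list of candidate centroids, that clustering quality is measured by the sum of squared distances to the nearest centroid, and that the data are split into partitions whose partial sums a distributed framework such as Spark could compute on separate workers. The neighbour rule is the standard artificial bee colony update; all function names (`sse`, `partitioned_sse`, `employed_bee_step`) are hypothetical and are not taken from the paper.

```python
import random


def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(point, centroid)) for centroid in centroids)
        for point in points
    )


def partitioned_sse(partitions, centroids):
    """Total cost as a sum of per-partition partial costs.

    Each partial sum is independent, so in a distributed setting every
    partition could be evaluated by a different worker (assumption).
    """
    return sum(sse(part, centroids) for part in partitions)


def employed_bee_step(food_sources, partitions, rng):
    """One employed-bee pass of the standard ABC scheme (needs >= 2 food sources).

    For each source x_i, perturb one coordinate using a random other source x_k:
        v = x_i + phi * (x_i - x_k),  phi drawn uniformly from [-1, 1]
    and keep the candidate only if it lowers the cost (greedy selection).
    """
    updated = []
    for i, source in enumerate(food_sources):
        k = rng.choice([j for j in range(len(food_sources)) if j != i])
        c = rng.randrange(len(source))       # index of the centroid to perturb
        d = rng.randrange(len(source[c]))    # index of the coordinate to perturb
        phi = rng.uniform(-1.0, 1.0)
        candidate = [list(centroid) for centroid in source]
        candidate[c][d] += phi * (source[c][d] - food_sources[k][c][d])
        if partitioned_sse(partitions, candidate) < partitioned_sse(partitions, source):
            updated.append(candidate)
        else:
            updated.append(source)
    return updated
```

A call such as `employed_bee_step(sources, partitions, random.Random(0))` returns a new list of food sources in which no source has a higher cost than before, because of the greedy selection step.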
Pages: 1213-1218
Page count: 6