Parallel Spectral Clustering Based on MapReduce

Cited by: 4
Authors
Qiwei Zhong [1]
Yunlong Lin [1]
Junyang Zou [1]
Kuangyan Zhu [1]
Qiao Wang [1]
Lei Hu [2]
Affiliations
[1] School of Information Science and Engineering, Southeast University
[2] ZTE Corporation
Keywords
spectral clustering; parallel implementation; massive dataset; Hadoop MapReduce; data mining
DOI
Not available
Chinese Library Classification number
TP311.13
Subject classification code
1201
Abstract
Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering, a popular modern clustering algorithm, has been shown to be more effective at detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social science and biology. As databases grow, the computation time and memory use of clustering algorithms grow with them. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up clustering without sacrificing quality, and processes massive datasets efficiently on commodity machine clusters.
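This record does not include the paper's implementation, so the following is only a minimal sketch of how one building block of spectral clustering could be phrased as map and reduce steps. It assumes the graph is stored as a list of weighted undirected edges and computes a product with the normalized adjacency matrix D^{-1/2} A D^{-1/2}, the kind of operation an iterative eigensolver repeats. The helper names (`map_reduce`, `degrees`, `normalized_matvec`) are hypothetical, and the in-memory `map_reduce` function merely stands in for a real Hadoop job.

```python
from collections import defaultdict
from math import sqrt


def map_reduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce round: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def degrees(edges):
    """Weighted degree of every vertex, from (u, v, weight) edge records."""
    def mapper(edge):
        u, v, w = edge
        # An undirected edge contributes its weight to both endpoints.
        yield u, w
        yield v, w

    return map_reduce(edges, mapper, lambda _key, weights: sum(weights))


def normalized_matvec(edges, deg, x):
    """One round computing y = D^{-1/2} A D^{-1/2} x from the edge list."""
    def mapper(edge):
        u, v, w = edge
        scaled = w / sqrt(deg[u] * deg[v])
        # Each edge emits one partial product per endpoint.
        yield u, scaled * x[v]
        yield v, scaled * x[u]

    # The reducer sums the partial products that belong to one vertex.
    return map_reduce(edges, mapper, lambda _key, parts: sum(parts))
```

Because every edge is processed independently in the map step and every vertex independently in the reduce step, neither the edge list nor the vectors need to fit on a single machine in a real distributed setting. A quick sanity check on this sketch is that the vector whose entries are the square roots of the degrees is left unchanged by `normalized_matvec`, since it is an eigenvector of the normalized adjacency matrix with eigenvalue 1.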
Pages: 45-50
Page count: 6