Parallel Spectral Clustering in Distributed Systems

被引:385
|
作者
Chen, Wen-Yen [1 ]
Song, Yangqiu [2 ]
Bai, Hongjie [3 ]
Lin, Chih-Jen [4 ]
Chang, Edward Y. [5 ]
机构
[1] Yahoo Inc, Sunnyvale, CA 94089 USA
[2] Microsoft Res Asia, Beijing 100193, Peoples R China
[3] Google Informat Technol China Co Ltd, Beijing 100084, Peoples R China
[4] Natl Taiwan Univ, Dept Comp Sci, Taipei 106, Taiwan
[5] Google Res, Palo Alto, CA 94306 USA
基金
美国国家科学基金会;
关键词
Parallel spectral clustering; distributed computing; normalized cuts; nearest neighbors; Nystrom approximation; CUTS;
D O I
10.1109/TPAMI.2010.88
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nystrom method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
引用
收藏
页码:568 / 586
页数:19
相关论文
共 50 条
  • [21] Parallel Sparse Spectral Clustering for SAR Image Segmentation
    Gou, Shuiping
    Zhuang, Xiong
    Zhu, Huming
    Yu, Tiantian
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2013, 6 (04) : 1949 - 1963
  • [22] Parallel Spectral Clustering for the Segmentation of cDNA Microarray Images
    Mouysset, Sandrine
    Guivarch, Ronan
    Noailles, Joseph
    Ruiz, Daniel
    6TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, 2012, 154 : 1 - +
  • [23] Digraph Spectral Clustering with Applications in Distributed Sensor Validation
    Du, Yue-Jin
    Lu, Hui
    Zhai, Li-Dong
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2014,
  • [24] Scalability in distributed systems, parallel systems and supercomputers
    Kremien, O
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1995, 919 : 532 - 541
  • [26] Simple Parallel and Distributed Algorithms for Spectral Graph Sparsification
    Koutis, Ioannis
    PROCEEDINGS OF THE 26TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES (SPAA'14), 2014, : 61 - 66
  • [27] Parallel bindings in distributed multimedia systems
    Repplinger, M
    Winter, F
    Lohse, M
    Slusallek, P
    25TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS, PROCEEDINGS, 2005, : 714 - 720
  • [28] Parallel Cost Analysis of Distributed Systems
    Albert, Elvira
    Correas, Jesus
    Johnsen, Einar Broch
    Roman-Diez, Guillermo
    STATIC ANALYSIS (SAS 2015), 2015, 9291 : 275 - 292
  • [29] PARALLEL AND DISTRIBUTED COMPUTING FOR INTELLIGENT SYSTEMS
    RAO, NSV
    GULATI, S
    IYENGAR, SS
    MADAN, RN
    COMPUTERS & ELECTRICAL ENGINEERING, 1993, 19 (06) : R5 - R8
  • [30] Software engineering for parallel & distributed systems
    Burkhart, H
    Decker, KM
    Fekete, A
    Potter, JM
    Gomaa, H
    Kramer, J
    Schmidt, DC
    Stankovic, JA
    IEEE CONCURRENCY, 1997, 5 (03): : 16 - 27