Parallel Spectral Clustering in Distributed Systems

被引:385
|
作者
Chen, Wen-Yen [1 ]
Song, Yangqiu [2 ]
Bai, Hongjie [3 ]
Lin, Chih-Jen [4 ]
Chang, Edward Y. [5 ]
机构
[1] Yahoo Inc, Sunnyvale, CA 94089 USA
[2] Microsoft Res Asia, Beijing 100193, Peoples R China
[3] Google Informat Technol China Co Ltd, Beijing 100084, Peoples R China
[4] Natl Taiwan Univ, Dept Comp Sci, Taipei 106, Taiwan
[5] Google Res, Palo Alto, CA 94306 USA
基金
美国国家科学基金会;
关键词
Parallel spectral clustering; distributed computing; normalized cuts; nearest neighbors; Nystrom approximation; CUTS;
D O I
10.1109/TPAMI.2010.88
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nystrom method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
引用
收藏
页码:568 / 586
页数:19
相关论文
共 50 条
  • [41] Bias in parallel and distributed simulation systems
    Kiesling, T
    Lüthi, J
    Khayari, RE
    PROCEEDINGS OF THE 2005 WINTER SIMULATION CONFERENCE, VOLS 1-4, 2005, : 384 - 393
  • [42] Parallel program model for distributed systems
    Tran, VD
    Hluchy, L
    Nguyen, GT
    RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, PROCEEDINGS, 2000, 1908 : 250 - 257
  • [43] Spectral factorization for distributed parameter systems
    Callier, FM
    Winkin, J
    PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 1997, : 4406 - 4408
  • [44] Parallel multi-view concept clustering in distributed computing
    Wang, Hao
    Yang, Yan
    Zhang, Xiaobo
    Peng, Bo
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (10): : 5621 - 5631
  • [45] Parallel and distributed clustering framework for big spatial data mining
    Bendechache, Malika
    Tari, A-Kamel
    Kechadi, M-Tahar
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 671 - 689
  • [46] Parallel multi-view concept clustering in distributed computing
    Hao Wang
    Yan Yang
    Xiaobo Zhang
    Bo Peng
    Neural Computing and Applications, 2020, 32 : 5621 - 5631
  • [47] Spectral Efficiency of Distributed MIMO Systems
    Wang, Dongming
    Wang, Jiangzhou
    You, Xiaohu
    Wang, Yan
    Chen, Ming
    Hou, Xiaoyun
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2013, 31 (10) : 2112 - 2127
  • [48] Parallel Markov Chain Monte Carlo via Spectral Clustering
    Basse, Guillaume
    Pillai, Natesh
    Smith, Aaron
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 1318 - 1327
  • [49] Segmentation of cDNA Microarray Images using Parallel Spectral Clustering
    Mouysset, Sandrine
    Guivarch, Ronan
    Noailles, Joseph
    Ruiz, Daniel
    ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL, 2013, 2 (01): : 1 - 8
  • [50] Location Based Distributed Spectral Clustering for Wireless Sensor Networks
    Muniraju, Gowtham
    Zhang, Sai
    Tepedelenlioglu, Cihan
    Banavar, Mahesh K.
    Spanias, Andreas
    Vargas-Rosales, Cesar
    Villalpando-Hernandez, Rafaela
    2017 SENSOR SIGNAL PROCESSING FOR DEFENCE CONFERENCE (SSPD), 2017, : 40 - 44