Parallel Spectral Clustering in Distributed Systems

被引:385
|
作者
Chen, Wen-Yen [1 ]
Song, Yangqiu [2 ]
Bai, Hongjie [3 ]
Lin, Chih-Jen [4 ]
Chang, Edward Y. [5 ]
机构
[1] Yahoo Inc, Sunnyvale, CA 94089 USA
[2] Microsoft Res Asia, Beijing 100193, Peoples R China
[3] Google Informat Technol China Co Ltd, Beijing 100084, Peoples R China
[4] Natl Taiwan Univ, Dept Comp Sci, Taipei 106, Taiwan
[5] Google Res, Palo Alto, CA 94306 USA
基金
美国国家科学基金会;
关键词
Parallel spectral clustering; distributed computing; normalized cuts; nearest neighbors; Nystrom approximation; CUTS;
D O I
10.1109/TPAMI.2010.88
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nystrom method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
引用
收藏
页码:568 / 586
页数:19
相关论文
共 50 条
  • [31] Parallel Initialization on Distributed Simulation Systems
    Menezes, Nilo Ney Coutinho
    2014 IEEE/ACM 18TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT 2014), 2014, : 18 - 24
  • [32] Distributed and parallel systems: Environments and tools
    Haring, G
    Kacsuk, P
    Kotsis, G
    PARALLEL COMPUTING, 1997, 22 (13) : 1699 - 1701
  • [33] Distributed and parallel systems engineering in MANIFOLD
    Univ of Cyprus, Nicosia, Cyprus
    Parallel Comput, 7 (1137-1160):
  • [34] Parallel Control of Distributed Parameter Systems
    Song, Yuhua
    He, Xiuyu
    Liu, Zhijie
    He, Wei
    Sun, Changyin
    Wang, Fei-Yue
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (12) : 3291 - 3301
  • [35] Distributed and parallel systems engineering in MANIFOLD
    Papadopoulos, GA
    PARALLEL COMPUTING, 1998, 24 (07) : 1137 - 1160
  • [36] Parallel Lisp compilation for distributed systems
    Feng, MD
    Yuen, CK
    AUSTRALIAN COMPUTER JOURNAL, 1995, 27 (03): : 77 - 91
  • [37] SYNCHRONIZATION OF PARALLEL PROCESSES IN DISTRIBUTED SYSTEMS
    MAKHANIOK, M
    MANNER, R
    LECTURE NOTES IN COMPUTER SCIENCE, 1992, 634 : 157 - 162
  • [38] Scheduling problems for parallel and distributed systems
    Rusanova, O
    Korochkin, A
    ACM SIGADA ANNUAL INTERNATIONAL CONFERENCE (SIGADA'99) - PROCEEDINGS, 1999, 19 (03): : 195 - 201
  • [39] A scheduling tool for parallel and distributed systems
    Shirazi, BA
    Marquis, J
    1996 IEEE AEROSPACE APPLICATIONS CONFERENCE, PROCEEDINGS, VOL 3, 1996, : 247 - 259
  • [40] Programming models for parallel and distributed systems
    Scott, ML
    ACM SIGPLAN NOTICES, 2002, 37 (10) : 235 - 235