Scaling Graph Clustering with Distributed Sketches

被引:1
|
作者
Priest, Benjamin W. [1 ]
Dunton, Alec [2 ]
Sanders, Geoffrey [1 ]
机构
[1] Lawrence Livermore Natl Lab, Ctr Appl Sci Comp, Livermore, CA 94550 USA
[2] Univ Colorado, Dept Appl Math, Boulder, CO 80309 USA
关键词
streaming algorithms; random matrices; graph clustering;
D O I
10.1109/hpec43674.2020.9286252
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
The unsupervised learning of community structure, in particular the partitioning vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses the introduction of immense scale presents challenges to traditional methods. Spectral clustering in distributed memory, for example, requires hundreds of expensive bulk-synchronous communication rounds to compute an embedding of vertices to a few eigenvectors of a graph associated matrix. Furthermore, the whole computation might need to be repeated if the underlying graph changes some low percentage of edge updates. We present a method inspired by spectral clustering where we instead use matrix sketches derived from random dimension-reducing projections. We show that our method produces embeddings that yield performant clustering results given a fully-dynamic stochastic block model stream using both the fast Johnson-Lindenstrauss and CountSketch transforms. We also discuss the effects of stochastic block model parameters upon the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
引用
收藏
页数:7
相关论文
共 50 条
  • [41] Labeled graph sketches: Keeping up with real-time graph streams
    Song, Chunyao
    Ge, Tingjian
    Ge, Yao
    Zhang, Haowen
    Yuan, Xiaojie
    INFORMATION SCIENCES, 2019, 503 : 469 - 492
  • [42] GLASS: A Graph Laplacian Autoencoder with Subspace Clustering Regularization for Graph Clustering
    Sun, Dengdi
    Liu, Liang
    Luo, Bin
    Ding, Zhuanlian
    COGNITIVE COMPUTATION, 2023, 15 (03) : 803 - 821
  • [43] Efficient distributed computation of distance sketches in networks
    Atish Das Sarma
    Michael Dinitz
    Gopal Pandurangan
    Distributed Computing, 2015, 28 : 309 - 320
  • [44] Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation
    Cohen-Addad, Vincent
    Saulpic, David
    Schwiegelshohn, Chris
    2023 IEEE 64TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, FOCS, 2023, : 1105 - 1130
  • [45] Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation
    Scheinert, Dominik
    Zhu, Houkun
    Thamsen, Lauritz
    Geldenhuys, Morgan K.
    Will, Jonathan
    Acker, Alexander
    Kao, Odej
    2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021,
  • [46] Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation
    Cohen-Addad, Vincent
    Saulpic, David
    Schwiegelshohn, Chris
    Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2023, : 1105 - 1130
  • [47] Fuzzy graph clustering
    Peng, Yong
    Zhu, Xin
    Nie, Feiping
    Kong, Wanzeng
    Ge, Yuan
    INFORMATION SCIENCES, 2021, 571 : 38 - 49
  • [48] Improved Graph Clustering
    Chen, Yudong
    Sanghavi, Sujay
    Xu, Huan
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2014, 60 (10) : 6440 - 6455
  • [49] Clustering by Creating a Graph
    Wang, Yiwen
    Zhang, Haolan
    Huang, Ke
    Yu, Changbin
    Cui, Honglei
    PROCEEDINGS OF 2016 12TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2016, : 499 - 502
  • [50] Replicator Graph Clustering
    Donoser, Michael
    PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2013, 2013,