Cluster-preserving sampling algorithm for large-scale graphs

被引:4
作者
Zhang, Jianpeng [1 ]
Chen, Hongchang [1 ]
Yu, Dingjiu [1 ,2 ]
Pei, Yulong [3 ]
Deng, Yingjun [4 ]
机构
[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China
[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China
[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands
[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
基金
中国博士后科学基金;
关键词
graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;
D O I
10.1007/s11432-021-3370-4
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graph sampling is a very effective method to deal with scalability issues when analyzing large-scale graphs. Lots of sampling algorithms have been proposed, and sampling qualities have been quantified using explicit properties (e.g., degree distribution) of the sample. However, the existing sampling techniques are inadequate for the current sampling task: sampling the clustering structure, which is a crucial property of the current networks. In this paper, using different expansion strategies, two novel top-leader sampling methods (i.e., TLS-e and TLS-i) are proposed to obtain representative samples, and they are capable of effectively preserving the clustering structure. The rationale behind them is to select top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes into the sample using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. Our empirical results show that the proposed sampling algorithms can preserve the population's clustering structure well and provide feasible solutions to sample the clustering structure from large-scale graphs.
引用
收藏
页数:17
相关论文
共 26 条
  • [1] Network Sampling: From Static to Streaming Graphs
    Ahmed, Nesreen K.
    Neville, Jennifer
    Kompella, Ramana
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2014, 8 (02)
  • [2] Emergence of scaling in random networks
    Barabási, AL
    Albert, R
    [J]. SCIENCE, 1999, 286 (5439) : 509 - 512
  • [3] Fast unfolding of communities in large networks
    Blondel, Vincent D.
    Guillaume, Jean-Loup
    Lambiotte, Renaud
    Lefebvre, Etienne
    [J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
  • [4] Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
    Emmons, Scott
    Kobourov, Stephen
    Gallant, Mike
    Borner, Katy
    [J]. PLOS ONE, 2016, 11 (07):
  • [5] Community detection in networks: A user guide
    Fortunato, Santo
    Hric, Darko
    [J]. PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS, 2016, 659 : 1 - 44
  • [6] Community detection in networks: Structural communities versus ground truth
    Hric, Darko
    Darst, Richard K.
    Fortunato, Santo
    [J]. PHYSICAL REVIEW E, 2014, 90 (06)
  • [7] GraphSDH: A General Graph Sampling Framework with Distribution and Hierarchy
    Hu, Jingbo
    Dai, Guohao
    Wang, Yu
    Yang, Huazhong
    [J]. 2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
  • [8] Metropolis Algorithms for Representative Subgraph Sampling
    Huebler, Christian
    Kriegel, Hans-Peter
    Borgwardt, Karsten
    Ghahramani, Zoubin
    [J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 283 - +
  • [9] Graph sampling for Internet topologies using normalized Laplacian spectral features
    Jiao, Bo
    Shi, Jianmai
    Zhang, Wensheng
    Xing, Lining
    [J]. INFORMATION SCIENCES, 2019, 481 : 574 - 603
  • [10] On clusterings: Good, bad and spectral
    Kannan, R
    Vempala, S
    Vetta, A
    [J]. JOURNAL OF THE ACM, 2004, 51 (03) : 497 - 515