Cluster-preserving sampling algorithm for large-scale graphs

被引：4

作者：

Zhang, Jianpeng ^{[1
]}

Chen, Hongchang ^{[1
]}

Yu, Dingjiu ^{[1
,2
]}

Pei, Yulong ^{[3
]}

Deng, Yingjun ^{[4
]}

机构：

[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China

[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China

[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands

[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China

来源：

SCIENCE CHINA-INFORMATION SCIENCES | 2023年 / 66卷 / 01期

基金：

中国博士后科学基金;

关键词：

graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;

D O I：

10.1007/s11432-021-3370-4

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Graph sampling is a very effective method to deal with scalability issues when analyzing large-scale graphs. Lots of sampling algorithms have been proposed, and sampling qualities have been quantified using explicit properties (e.g., degree distribution) of the sample. However, the existing sampling techniques are inadequate for the current sampling task: sampling the clustering structure, which is a crucial property of the current networks. In this paper, using different expansion strategies, two novel top-leader sampling methods (i.e., TLS-e and TLS-i) are proposed to obtain representative samples, and they are capable of effectively preserving the clustering structure. The rationale behind them is to select top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes into the sample using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. Our empirical results show that the proposed sampling algorithms can preserve the population's clustering structure well and provide feasible solutions to sample the clustering structure from large-scale graphs.

引用

页数：17

共 26 条

[1] Network Sampling: From Static to Streaming Graphs
Ahmed, Nesreen K.
Neville, Jennifer
Kompella, Ramana
[J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2014, 8 (02)
[2] Emergence of scaling in random networks
Barabási, AL
Albert, R
[J]. SCIENCE, 1999, 286 (5439) : 509 - 512
[3] Fast unfolding of communities in large networks
Blondel, Vincent D.
Guillaume, Jean-Loup
Lambiotte, Renaud
Lefebvre, Etienne
[J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
[4] Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Emmons, Scott
Kobourov, Stephen
Gallant, Mike
Borner, Katy
[J]. PLOS ONE, 2016, 11 (07):
[5] Community detection in networks: A user guide
Fortunato, Santo
Hric, Darko
[J]. PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS, 2016, 659 : 1 - 44
[6] Community detection in networks: Structural communities versus ground truth
Hric, Darko
Darst, Richard K.
Fortunato, Santo
[J]. PHYSICAL REVIEW E, 2014, 90 (06)
[7] GraphSDH: A General Graph Sampling Framework with Distribution and Hierarchy
Hu, Jingbo
Dai, Guohao
Wang, Yu
Yang, Huazhong
[J]. 2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
[8] Metropolis Algorithms for Representative Subgraph Sampling
Huebler, Christian
Kriegel, Hans-Peter
Borgwardt, Karsten
Ghahramani, Zoubin
[J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 283 - +
[9] Graph sampling for Internet topologies using normalized Laplacian spectral features
Jiao, Bo
Shi, Jianmai
Zhang, Wensheng
Xing, Lining
[J]. INFORMATION SCIENCES, 2019, 481 : 574 - 603
[10] On clusterings: Good, bad and spectral
Kannan, R
Vempala, S
Vetta, A
[J]. JOURNAL OF THE ACM, 2004, 51 (03) : 497 - 515

← 1 2 3 →