How to Count Triangles, without Seeing the Whole Graph

被引:19
作者
Bera, Suman K. [1 ]
Seshadhri, C. [1 ]
机构
[1] UC Santa Cruz, Santa Cruz, CA 95064 USA
来源
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020年
关键词
Triangle counting; Graph sampling; Random walks; STREAMING ALGORITHMS;
D O I
10.1145/3394486.3403073
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Triangle counting is a fundamental problem in the analysis of large graphs. There is a rich body of work on this problem, in varying streaming and distributed models, yet all these algorithms require reading the whole input graph. In many scenarios, we do not have access to the whole graph, and can only sample a small portion of the graph (typically through crawling). In such a setting, how can we accurately estimate the triangle count of the graph? We formally study triangle counting in the random walk access model introduced by Dasgupta et al (WWW '14) and Chierichetti et al (WWW '16). We have access to an arbitrary seed vertex of the graph, and can only perform random walks. This model is restrictive in access and captures the challenges of collecting realworld graphs. Even sampling a uniform random vertex is a hard task in this model. Despite these challenges, we design a provable and practical algorithm, TETRIS, for triangle counting in this model. TETRIS is the first provably sublinear algorithm (for most natural parameter settings) that approximates the triangle count in the random walk model, for graphs with low mixing time. Our result builds on recent advances in the theory of sublinear algorithms. The final sample built by TETRIS is a careful mix of random walks and degree-biased sampling of neighborhoods. Empirically, TETRIS accurately counts triangles on a variety of large graphs, getting estimates within 5% relative error by looking at 3% of the number of edges.
引用
收藏
页码:306 / 316
页数:11
相关论文
共 56 条
  • [31] Katzir Liran., 2011, Proceedings of the 20th International Conference Companion on World Wide Web, P597, DOI DOI 10.1145/1963405.1963489
  • [32] Khan A., 2011, SIGMOD 11, P901
  • [33] DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine
    Kim, Hyeonji
    Lee, Juneyoung
    Bhowmick, Sourav S.
    Han, Wook-Shin
    Lee, JeongHoon
    Ko, Seongyun
    Jarrah, Moath H. A.
    [J]. SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 1231 - 1245
  • [34] COUNTING TRIANGLES IN MASSIVE GRAPHS WITH MAPREDUCE
    Kolda, Tamara G.
    Pinar, Ali
    Plantenga, Todd
    Seshadhri, C.
    Task, Christine
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (05) : S48 - S77
  • [35] Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning
    Kolountzakis, Mihail N.
    Miller, Gary L.
    Peng, Richard
    Tsourakakis, Charalampos E.
    [J]. INTERNET MATHEMATICS, 2012, 8 (1-2) : 161 - 185
  • [36] Leskovec J., 2006, KDD
  • [37] Maiya A.S., 2011, P 17 ACM SIGKDD INT, P105
  • [38] Better Algorithms for Counting Triangles in Data Streams
    McGregor, Andrew
    Vorotnikova, Sofya
    Vu, Hoa T.
    [J]. PODS'16: PROCEEDINGS OF THE 35TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2016, : 401 - 411
  • [39] Network motifs: Simple building blocks of complex networks
    Milo, R
    Shen-Orr, S
    Itzkovitz, S
    Kashtan, N
    Chklovskii, D
    Alon, U
    [J]. SCIENCE, 2002, 298 (5594) : 824 - 827
  • [40] The structure and function of complex networks
    Newman, MEJ
    [J]. SIAM REVIEW, 2003, 45 (02) : 167 - 256