Better Algorithms for Counting Triangles in Data Streams

被引:36
作者
McGregor, Andrew [1 ]
Vorotnikova, Sofya [1 ]
Vu, Hoa T. [1 ]
机构
[1] Univ Massachusetts, Amherst, MA 01003 USA
来源
PODS'16: PROCEEDINGS OF THE 35TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS | 2016年
关键词
data streams; triangles; clustering coefficients; GRAPH; SUBGRAPH;
D O I
10.1145/2902251.2902283
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We present space-efficient data stream algorithms for approximating the number of triangles in a graph up to a factor 1 + epsilon. While it can be shown that determining whether a graph is triangle-free is not possible in sub-linear space, a large body of work has focused on minimizing the space required in terms of the number of triangles T (or a lower bound on this quantity) and other parameters including the number of nodes n and the number of edges m. Two models are important in the literature: the arbitrary order model in which the stream consists of the edges of the graph in arbitrary order and the adjacency list order model in which all edges incident to the same node appear consecutively. We improve over the state of the art results in both models. For the adjacency list order model, we show that (O) over tilde (c(-2)mR/root T) space is sufficient in one pass and (O) over tilde(epsilon(-2)m(3/2)/T) space is sufficient in two passes where the (O) over tilde(.) notation suppresses log factors. For the arbitrary order model, we show that (O) over tilde (epsilon(-2)m/root T) space suffices given two passes and that (O) over tilde(epsilon(-2)m(3/2)/T) space suffices given three passes and oracle access to the degrees. Finally, we show how to efficiently implement the "wedge sampling" approach to triangle estimation in the arbitrary order model. To do this, we develop the first algorithm for fp sampling such that multiple independent samples can be generated with O (polylog n) update time; this primitive is widely applicable and this result may be of independent interest.
引用
收藏
页码:401 / 411
页数:11
相关论文
共 44 条
  • [1] Finding and counting given length cycles
    Alon, N
    Yuster, R
    Zwick, U
    [J]. ALGORITHMICA, 1997, 17 (03) : 209 - 223
  • [2] Streaming Algorithms via Precision Sampling
    Andoni, Alexandr
    Krauthgamer, Robert
    Onak, Krzysztof
    [J]. 2011 IEEE 52ND ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2011), 2011, : 363 - 372
  • [3] [Anonymous], 2011, INT WORLD WIDE WEB C, DOI DOI 10.1145/1963405.1963491
  • [4] [Anonymous], 2005, Journal of Graph Algorithms and Applications, DOI DOI 10.7155/JGAA.00108
  • [5] [Anonymous], 2006, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • [6] [Anonymous], 2008, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM
  • [7] [Anonymous], 2012, P S PRINC DAT SYST P
  • [8] [Anonymous], 2011, P 30 ACM SIGMOD SIGA
  • [9] Arge L., 2010, 24 INT PARALLEL DIST, P1
  • [10] PATRIC: A Parallel Algorithm for Counting Triangles in Massive Networks
    Arifuzzaman, Shaikh
    Khan, Maleq
    Marathe, Madhav
    [J]. PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 529 - 538