Counting Graphlets: Space vs Time

被引:48
作者
Bressan, Marco [1 ]
Chierichetti, Flavio [1 ,3 ]
Kumar, Ravi [2 ]
Leucci, Stefano [1 ]
Panconesi, Alessandro [1 ]
机构
[1] Sapienza Univ Rome, Dipartimento Informat, Rome, Italy
[2] Google Res, Mountain View, CA USA
[3] Google, Mountain View, CA USA
来源
WSDM'17: PROCEEDINGS OF THE TENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2017年
关键词
D O I
10.1145/3018661.3018732
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural approaches based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that this approach is outperformed by a carefully engineered version of color coding (CC) [1], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC. Furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC's memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that a careful implementation of CC can push the limits of the state of the art, both in terms of the size of the input graph and of that of the graphlets.
引用
收藏
页码:557 / 566
页数:10
相关论文
共 26 条
  • [1] COLOR-CODING
    ALON, N
    YUSTER, R
    ZWICK, U
    [J]. JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY, 1995, 42 (04): : 844 - 856
  • [2] [Anonymous], 2011, INT WORLD WIDE WEB C, DOI DOI 10.1145/1963405.1963491
  • [3] [Anonymous], ARXIV E PRINTS
  • [4] [Anonymous], 2009, American Mathematical Soc.
  • [5] [Anonymous], 2009, KDD09 15 ACM SIGKDD
  • [6] [Anonymous], 2016, CORR
  • [7] GUISE: Uniform Sampling of Graphlets for Large Graph Analysis
    Bhuiyan, Mansurul A.
    Rahman, Mahmudur
    Rahman, Mahmuda
    Al Hasan, Mohammad
    [J]. 12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 91 - 100
  • [8] Boldi P., 2004, P 13 INT C WORLD WID, P595, DOI DOI 10.1145/988672.988752
  • [9] Boldi P., 2011, P 20 INT C WORLD WID, P587, DOI [DOI 10.1145/1963405.1963488, 10.1145/1963405.1963488.]
  • [10] Subgraph Counting: Color Coding Beyond Trees
    Chakaravarthy, Venkatesan T.
    Kapralov, Michael
    Murali, Prakash
    Petrini, Fabrizio
    Que, Xinyu
    Sabharwal, Yogish
    Schieber, Baruch
    [J]. 2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 2 - 11