GUISE: Uniform Sampling of Graphlets for Large Graph Analysis

被引:65
作者
Bhuiyan, Mansurul A. [1 ]
Rahman, Mahmudur [1 ]
Rahman, Mahmuda [2 ]
Al Hasan, Mohammad [1 ]
机构
[1] Indiana Univ Purdue Univ, Dept Comp Sci, Indianapolis, IN 46202 USA
[2] Syracuse Univ, Dept Comp Sci, New York, NY USA
来源
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012) | 2012年
关键词
D O I
10.1109/ICDM.2012.87
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, the computation of GFD for a network requires the exact count of embedded graphlets in that network, which is a computationally expensive task. As a result, it is practically infeasible to compute the GFD for even a moderately large network. In this paper, we propose GUISE, which uses a Markov Chain Monte Carlo (MCMC) sampling method for constructing the approximate GFD of a large network. Our experiments on networks with millions of nodes show that GUISE obtains the GFD within few minutes, whereas the exhaustive counting based approach takes several days.
引用
收藏
页码:91 / 100
页数:10
相关论文
共 21 条
  • [1] [Anonymous], P 2 NSF NIJ S INT SE
  • [2] [Anonymous], SCIENCE
  • [3] [Anonymous], 1995, RANDOMIZE ALGORITHMS
  • [4] [Anonymous], 2011, SOCIAL NETWORK DATA
  • [5] Network Analysis in the Social Sciences
    Borgatti, Stephen P.
    Mehra, Ajay
    Brass, Daniel J.
    Labianca, Giuseppe
    [J]. SCIENCE, 2009, 323 (5916) : 892 - 895
  • [6] Chung RK, 1997, SPECTRAL GRAPH THEOR
  • [7] Eberle W, 2009, P 5 ANN WORKSH CYB S
  • [8] Faloutsos M, 1999, COMP COMM R, V29, P251, DOI 10.1145/316194.316229
  • [9] KASHTAN N, 2004, BIOINFORMATICS
  • [10] GraphCrunch 2: Software tool for network modeling, alignment and clustering
    Kuchaiev, Oleksii
    Stevanovic, Aleksandar
    Hayes, Wayne
    Przulj, Natasa
    [J]. BMC BIOINFORMATICS, 2011, 12