Walking in Facebook: A Case Study of Unbiased Sampling of OSNs

被引:344
作者
Gjoka, Minas [1 ]
Kurant, Maciej [2 ]
Butts, Carter T. [3 ]
Markopoulou, Athina [4 ]
机构
[1] Univ Calif Irvine, Irvine, CA 92717 USA
[2] Ecole Polytech Fed Lausanne, Sch Comp & Commun Sci, Lausanne, Switzerland
[3] Univ Calif Irvine, Sociol Dept, Irvine, CA USA
[4] Univ Calif Irvine, EECS Dept, Irvine, CA USA
来源
2010 PROCEEDINGS IEEE INFOCOM | 2010年
基金
美国国家科学基金会;
关键词
Measurements; online social networks; Facebook; graph sampling; crawling; bias;
D O I
10.1109/infcom.2010.5462078
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
With more than 250 million active users [1], Facebook (FB) is currently one of the most important online social networks. Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph. In this quest, we consider and implement several candidate techniques. Two approaches that are found to perform well are the Metropolis-Hasting random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the "ground-truth" (UNI - obtained through true uniform sampling of FB userIDs). In contrast, the traditional Breadth-First-Search (BFS) and Random Walk (RW) perform quite poorly, producing substantially biased results. In addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these can be used to effectively determine when a random walk sample is of adequate size and quality for subsequent use (i.e., when it is safe to cease sampling). Using these methods, we collect the first, to the best of our knowledge, unbiased sample of Facebook. Finally, we use one of our representative datasets, collected through MHRW, to characterize several key properties of Facebook.
引用
收藏
页数:9
相关论文
共 33 条
  • [1] [Anonymous], ANN MATH STAT
  • [2] [Anonymous], 2008, PROBABILITY STAT RAN
  • [3] [Anonymous], 2006, P ACM SIGKDD
  • [4] [Anonymous], 2009, Uniform sampling of facebook users
  • [5] [Anonymous], ARXIV08090960
  • [6] [Anonymous], 1992, BAYESIAN STAT
  • [7] [Anonymous], Inside facebook
  • [8] [Anonymous], 2007, P 16 INT C WWW BANFF
  • [9] [Anonymous], 1995, Markov Chain Monte Carlo in Practice
  • [10] Becchetti Luca, 2006, LINKKDD