On Sampling the Wisdom of Crowds: Random vs. Expert Sampling of the Twitter Stream

Cited by: 28
Authors
Ghosh, Saptarshi [1, 2]
Zafar, Muhammad Bilal [2]
Bhattacharya, Parantapa [1, 2]
Sharma, Naveen [3]
Ganguly, Niloy [1]
Gummadi, Krishna P. [2]
Affiliations
[1] IIT Kharagpur, Kharagpur, W Bengal, India
[2] MPI SWS, Saarbrucken, Germany
[3] Univ Washington, Seattle, WA 98195 USA
Source
PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13) | 2013
Keywords
Sampling content streams; Twitter; random sampling; sampling from experts; Twitter Lists;
DOI
10.1145/2505515.2505615
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Several applications today rely upon content streams crowd-sourced from online social networks. Since real-time processing of the large amounts of data generated on these sites is difficult, analytics companies and researchers are increasingly resorting to sampling. In this paper, we investigate the crucial question of how to sample the data generated by users in social networks. The traditional method is to randomly sample all the data. We analyze a different sampling methodology, where content is gathered only from a relatively small subset (< 1%) of the user population, namely the expert users. Over the duration of a month, we gathered tweets from over 500,000 Twitter users who are identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% randomly sampled tweets provided publicly by Twitter. We compared the sampled datasets along several dimensions, including the diversity, timeliness, and trustworthiness of the information contained within them, and found important differences between the datasets. Our observations have major implications for applications such as topical search, trustworthy content recommendations, and breaking news detection.
Pages: 1739-1744
Page count: 6