Early detection of promoted campaigns on social media

Cited by: 0
Authors
Onur Varol
Emilio Ferrara
Filippo Menczer
Alessandro Flammini
Affiliations
[1] Indiana University, School of Informatics and Computing
[2] University of Southern California, Information Sciences Institute
[3] Indiana University Network Science Institute
Source
EPJ Data Science, Volume 6
Keywords
social media; information campaigns; advertising; early detection
DOI
Not available
Abstract
Social media expose millions of users every day to information campaigns, some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes, including terrorist propaganda, political astroturf, and financial market manipulation. It is therefore important to be able to detect whether a meme is being artificially promoted at the very moment it becomes wildly popular. This problem has important social implications and poses numerous technical challenges. As a first step, here we focus on discriminating between trending memes that are either organic or promoted by means of advertisement. The classification is not trivial: ads cause bursts of attention that can be easily mistaken for those of organic trends. We designed a machine learning framework to classify memes that have been labeled as trending on Twitter. After trending, we can rely on a large volume of activity data. Early detection, occurring immediately at trending time, is a more challenging problem because only a minimal volume of activity data is available prior to trending. Our supervised learning framework exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user metadata. We explore different methods for encoding feature time series. Using millions of tweets containing trending hashtags, we achieve an AUC score of 75% for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithms by introducing random temporal shifts on the trend time series. Feature selection analysis reveals that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.
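The abstract mentions encoding feature time series and reporting AUC scores. The following is a minimal sketch of what those two steps can look like, not the authors' implementation: the function names `encode_series` and `auc`, the choice of summary statistics, and the toy data are all assumptions made for illustration. It summarizes a time series with a few simple statistics and computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
from statistics import mean, pstdev


def encode_series(series):
    """Summarize one feature time series with a few simple statistics.

    Hypothetical encoding: mean, population standard deviation,
    minimum, maximum, and net change from first to last value.
    """
    return [
        mean(series),
        pstdev(series),
        min(series),
        max(series),
        series[-1] - series[0],
    ]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up tweet-volume series for four memes before trending.
# Label 1 = promoted, 0 = organic (illustrative values only).
toy_series = [
    [1, 1, 2, 40],   # sudden burst
    [2, 3, 2, 35],   # sudden burst
    [5, 8, 12, 15],  # gradual growth
    [4, 6, 9, 11],   # gradual growth
]
toy_labels = [1, 1, 0, 0]

# Use a single encoded statistic (the standard deviation) as a naive score.
toy_scores = [encode_series(s)[1] for s in toy_series]
toy_auc = auc(toy_scores, toy_labels)
```

A real pipeline would feed many encoded features into a trained classifier and evaluate on held-out data; the single-statistic score here only keeps the sketch short.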