Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts

Cited by: 375
Authors
Zhao, Zhe [1]
Resnick, Paul [2]
Mei, Qiaozhu [2]
Affiliations
[1] Univ Michigan, Dept EECS, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Sch Informat, Ann Arbor, MI 48109 USA
Source
PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW 2015) | 2015
Funding
US National Science Foundation
Keywords
Rumor Detection; Enquiry Tweets; Social Media
DOI
10.1145/2736277.2741637
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Many previous techniques identify trending topics in social media, even topics that are not pre-defined. We present a technique to identify trending rumors, which we define as topics that include disputed factual claims. Putting aside any attempt to assess whether the rumors are true or false, it is valuable to identify trending rumors as early as possible. It is extremely difficult to accurately classify whether every individual post is or is not making a disputed factual claim. We are able to identify trending rumors by recasting the problem as finding entire clusters of posts whose topic is a disputed factual claim. The key insight is that when there is a rumor, even though most posts do not raise questions about it, there may be a few that do. If we can find signature text phrases that are used by a few people to express skepticism about factual claims and are rarely used to express anything else, we can use those as detectors for rumor clusters. Indeed, we have found a few phrases that seem to be used exactly that way, including: "Is this true?", "Really?", and "What?". Relatively few posts related to any particular rumor use any of these enquiry phrases, but lots of rumor diffusion processes have some posts that do and have them quite early in the diffusion. We have developed a technique based on searching for the enquiry phrases, clustering similar posts together, and then collecting related posts that do not contain these simple phrases. We then rank the clusters by their likelihood of really containing a disputed factual claim. The detector, which searches for the very rare but very informative phrases, combined with clustering and a classifier on the clusters, yields surprisingly good performance. On a typical day of Twitter, about a third of the top 50 clusters were judged to be rumors, a high enough precision that human analysts might be willing to sift through them.
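
The pipeline described in the abstract (search for enquiry phrases, cluster the matching posts, collect related posts that lack the phrases, then rank the clusters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the regular expression covers only the three phrases quoted in the abstract, similarity is plain Jaccard overlap on word sets, clustering is a greedy single pass against each cluster's first post, and the final ranking by cluster counts is a stand-in for the trained classifier the abstract mentions. The function names and the `threshold` parameter are hypothetical.

```python
import re

# Assumption: only the three enquiry phrases quoted in the abstract.
ENQUIRY_RE = re.compile(r"is this true\?|really\?|what\?", re.IGNORECASE)


def is_enquiry(post: str) -> bool:
    """Return True if the post contains one of the enquiry phrases."""
    return ENQUIRY_RE.search(post) is not None


def tokens(post: str) -> frozenset:
    """Lowercased word set used for the similarity comparison."""
    return frozenset(re.findall(r"[a-z0-9']+", post.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two word sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_candidate_rumors(posts, threshold=0.3):
    """Hypothetical helper: cluster enquiry posts, attach related posts, rank."""
    signal = [p for p in posts if is_enquiry(p)]
    others = [p for p in posts if not is_enquiry(p)]

    # Step 1: greedy clustering of enquiry posts against each cluster's seed.
    clusters = []
    for post in signal:
        t = tokens(post)
        for cluster in clusters:
            if jaccard(t, cluster["seed"]) >= threshold:
                cluster["enquiry"].append(post)
                break
        else:
            clusters.append({"seed": t, "enquiry": [post], "related": []})

    # Step 2: attach posts without enquiry phrases to the most similar cluster.
    for post in others:
        t = tokens(post)
        best = max(clusters, key=lambda c: jaccard(t, c["seed"]), default=None)
        if best is not None and jaccard(t, best["seed"]) >= threshold:
            best["related"].append(post)

    # Step 3: placeholder ranking (a stand-in for a learned classifier).
    clusters.sort(
        key=lambda c: (len(c["enquiry"]), len(c["related"])), reverse=True
    )
    return clusters
```

Calling `detect_candidate_rumors` on a list of post strings returns the clusters in ranked order; each cluster keeps its enquiry posts and its related posts separately, so a reviewer can see which posts triggered detection and which were pulled in afterwards.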
Pages: 1395-1405
Number of pages: 11