Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes

Cited by: 165
Authors
Kumar, Srijan [1, 2]
West, Robert [2]
Leskovec, Jure [2]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16) | 2016
Funding
National Science Foundation (USA)
Keywords
MISINFORMATION
DOI
10.1145/2872427.2883085
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Wikipedia is a major source of information for many people. However, false information on Wikipedia raises concerns about its credibility. One way in which false information may be presented on Wikipedia is in the form of hoax articles, i.e., articles containing fabricated facts about nonexistent entities or events. In this paper we study false information on Wikipedia by focusing on the hoax articles that have been created throughout its history. We make several contributions. First, we assess the real-world impact of hoax articles by measuring how long they survive before being debunked, how many pageviews they receive, and how heavily they are referred to by documents on the Web. We find that, while most hoaxes are detected quickly and have little impact on Wikipedia, a small number of hoaxes survive long and are well cited across the Web. Second, we characterize the nature of successful hoaxes by comparing them to legitimate articles and to failed hoaxes that were discovered shortly after being created. We find characteristic differences in terms of article structure and content, embeddedness into the rest of Wikipedia, and features of the editor who created the hoax. Third, we successfully apply our findings to address a series of classification tasks, most notably to determine whether a given article is a hoax. And finally, we describe and evaluate a task involving humans distinguishing hoaxes from non-hoaxes. We find that humans are not good at solving this task and that our automated classifier outperforms them by a big margin.
Pages: 591-602
Page count: 12