Cross-Domain Failures of Fake News Detection

Times cited: 3
Authors
Janicka, Maria [1]
Pszona, Maria [1]
Wawer, Aleksander [1]
Affiliation
[1] Samsung R&D Inst Poland, Warsaw, Poland
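Source
COMPUTACION Y SISTEMAS | 2019 / Vol. 23 / No. 3
Keywords
Fake news detection; cross-domain; cross-domain failures
DOI
10.13053/CyS-23-3-3281
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract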
Fake news recognition has become a prominent research topic in natural language processing. Researchers have reported significant successes when applying machine learning methods based on various stylometric and lexical features, with accuracy reaching 90%. This article addresses the question: are fake news detection models universally applicable, or are they limited to the domain they were trained on? We used four different, freely available English-language fake news corpora and trained models in both in-domain and cross-domain settings. We also explored and compared the features that are important in each domain. We found that performance in the cross-domain setting degrades by 20% and that the sets of features important for detecting fake texts differ between domains. Our conclusions support the hypothesis that the high accuracy of machine learning models applied to fake news detection may be related to overfitting, and that models need to be trained and evaluated on mixed types of texts.
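The in-domain versus cross-domain comparison described in the abstract can be pictured as a grid of (training corpus, test corpus) accuracies. The sketch below is illustrative only and assumes several things not taken from the paper: a toy bag-of-words multinomial Naive Bayes stands in for the authors' stylometric and lexical feature models, the two tiny corpora `POLITICS` and `CELEBRITY` are made up, and the names `NaiveBayes`, `holdout_split`, and `evaluation_grid` are hypothetical. Diagonal cells use a held-out split of one corpus (in-domain); off-diagonal cells train on one whole corpus and test on another (cross-domain).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data format: (text, label) with label 1 = fake, 0 = real.
Example = Tuple[str, int]

# Tiny made-up corpora for illustration only; not the paper's datasets.
POLITICS: List[Example] = [
    ("shocking secret plot revealed", 1),
    ("senate passes budget bill", 0),
    ("shocking hidden agenda exposed", 1),
    ("senate committee approves bill", 0),
    ("senate debates budget today", 0),
    ("shocking plot exposed today", 1),
    ("senate approves budget", 0),
    ("shocking agenda revealed", 1),
]
CELEBRITY: List[Example] = [
    ("celebrity rumor leaked online", 1),
    ("actor gives interview on film", 0),
    ("rumor spreads about singer", 1),
    ("singer releases new album", 0),
    ("actor interview about new film", 0),
    ("leaked rumor about actor", 1),
    ("film premiere interview held", 0),
    ("celebrity rumor goes viral", 1),
]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, data: List[Example]) -> "NaiveBayes":
        self.doc_counts = Counter()
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in data:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text: str) -> int:
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = 0, -math.inf
        for label in (0, 1):
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:  # ties go to label 0 (real)
                best_label, best_score = label, score
        return best_label


def accuracy(model: NaiveBayes, data: List[Example]) -> float:
    return sum(model.predict(t) == y for t, y in data) / len(data)


def holdout_split(data: List[Example], every: int = 4):
    """Deterministic split: every `every`-th example is held out for testing."""
    train = [ex for i, ex in enumerate(data) if i % every != 0]
    test = [ex for i, ex in enumerate(data) if i % every == 0]
    return train, test


def evaluation_grid(corpora: Dict[str, List[Example]]) -> Dict[Tuple[str, str], float]:
    """Accuracy for every (train corpus, test corpus) pair.

    Diagonal cells are in-domain: train and test on disjoint parts of one
    corpus. Off-diagonal cells are cross-domain: train on the whole source
    corpus, test on the whole target corpus.
    """
    grid: Dict[Tuple[str, str], float] = {}
    for src, src_data in corpora.items():
        train, held_out = holdout_split(src_data)
        in_domain = NaiveBayes().fit(train)
        full = NaiveBayes().fit(src_data)
        for tgt, tgt_data in corpora.items():
            if src == tgt:
                grid[(src, tgt)] = accuracy(in_domain, held_out)
            else:
                grid[(src, tgt)] = accuracy(full, tgt_data)
    return grid


if __name__ == "__main__":
    grid = evaluation_grid({"politics": POLITICS, "celebrity": CELEBRITY})
    for (src, tgt), acc in sorted(grid.items()):
        setting = "in-domain" if src == tgt else "cross-domain"
        print(f"train={src:<9} test={tgt:<9} {setting:<12} accuracy={acc:.2f}")
```

On these toy corpora the in-domain cells reach 1.00 while the cross-domain cells fall to 0.50, because none of the target vocabulary was seen during training. The gap is deliberately exaggerated and does not reproduce the paper's reported figures; it only illustrates why a classifier that relies on domain-specific cues can look accurate in-domain and fail elsewhere.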
Pages: 1089-1097
Number of pages: 9