Proppy: Organizing the news based on their propagandistic content

Cited by: 96
Authors
Barron-Cedeno, Alberto [1, 3]
Jaradat, Israa [2, 3]
Da San Martino, Giovanni [3 ]
Nakov, Preslav [3 ]
Affiliations
[1] Univ Bologna, Forli, Italy
[2] Univ Texas Arlington, Arlington, TX 76019 USA
[3] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Propaganda detection; News bias; Investigative journalism
DOI
10.1016/j.ipm.2019.03.005
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Propaganda is a mechanism to influence public opinion, which is inherently present in extremely biased and fake news. Here, we propose a model to automatically assess the level of propagandistic content in an article based on different representations, from writing style and readability level to the presence of certain keywords. We experiment thoroughly with different variations of such a model on a new publicly available corpus, and we show that character n-grams and other style features outperform existing word n-gram-based alternatives for identifying propaganda. Unlike previous work, we make sure that the test data comes from news sources that were unseen during training, thus penalizing learning algorithms that model the news sources used at training time instead of solving the actual task. We integrate our supervised model into a public website, which organizes recent articles covering the same event on the basis of their propagandistic content. This allows users to quickly explore different perspectives on the same story, and it also enables investigative journalists to dig further into how different media use stories and propaganda to pursue their agenda.
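Two ideas in the abstract can be illustrated with a minimal sketch: character n-gram features, and a train/test split in which no news source appears on both sides. This is not the authors' implementation. The function names, the trigram size, and the toy articles are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text.

    n=3 is an illustrative choice, not a setting taken from the paper.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def split_by_source(articles, test_sources):
    """Split articles so that no news source appears in both sets."""
    train = [a for a in articles if a["source"] not in test_sources]
    test = [a for a in articles if a["source"] in test_sources]
    return train, test


# Toy data (hypothetical): each article records the source it came from.
articles = [
    {"source": "site-a", "text": "Breaking news today"},
    {"source": "site-b", "text": "They are lying to you"},
    {"source": "site-c", "text": "Markets closed higher"},
]

train, test = split_by_source(articles, test_sources={"site-c"})
features = [char_ngrams(a["text"]) for a in train]
```

Holding out whole sources means a classifier cannot score well simply by recognizing the outlets it saw during training, which is the evaluation concern the abstract raises.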
Pages: 1849-1864
Number of pages: 16