FactExtract: Automatic Collection and Aggregation of Articles and Journalistic Factual Claims from Online Newspaper

Cited by: 0
Authors
Sarr, Edouard Ngor [1]
Sall, Ousmane [1]
Diallo, Aminata [2]
Affiliations
[1] Univ Thies, Ecole Doctorale Dev Durable & Soc ED2DS, Thies, Senegal
[2] Univ Thies, UFR Sci & Technol SET, Thies, Senegal
Source
2018 FIFTH INTERNATIONAL CONFERENCE ON SOCIAL NETWORKS ANALYSIS, MANAGEMENT AND SECURITY (SNAMS) | 2018
Keywords
Fact-checking; Data-journalism; JAVA JAR; Web Scraping; Newspaper; TAL; Data Fusion
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Today the web is so rich that it has become the principal corpus for fact-checkers. In their quest for the truth, these journalists manually scan many sources every day in search of relevant information. But in the data-journalism context, where data grow continuously in real time, this manual exploration has become increasingly laborious, to the point of being impracticable. One technique currently used to address this problem is the automatic extraction of information from web pages, better known as "web scraping". The main purpose of web scraping is to extract specific, highly structured data from a web page with reduced human effort. In this paper, we present an automatic extractor of articles and journalistic claims, implemented on 15 Senegalese news websites.
Pages: 336-341
Page count: 6