Automated Fact-Checking of Claims from Wikipedia

Authors
Sathe, Aalok [1]
Ather, Salar [1]
Tuan Manh Le [1]
Perry, Nathan [2]
Park, Joonsuk [1]
Affiliations
[1] Univ Richmond, Dept Math & Comp Sci, Richmond, VA 23173 USA
[2] Williams Coll, Dept Comp Sci, Williamstown, MA 01267 USA
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020
Keywords
fact-checking; fact-verification; natural language inference; textual entailment; corpus
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automated fact checking is becoming increasingly vital as both truthful and fallacious information accumulates online. Research on fact checking has benefited from large-scale datasets such as FEVER and SNLI. However, such datasets suffer from limited applicability because their claims and/or evidence are synthetic: they are written by annotators and differ from real claims and evidence on the internet. To address this, we present WIKIFACTCHECK-ENGLISH, a dataset of 124k+ triples, each consisting of a claim, a context, and an evidence document extracted from English Wikipedia articles and citations, as well as 34k+ manually written claims that are refuted by the evidence documents. This is the largest fact checking dataset consisting of real claims and evidence to date; it will allow the development of fact checking systems that can better process claims and evidence in the real world. We also show that, for the NLI subtask, a logistic regression system trained using existing and novel features achieves a peak accuracy of 68%, providing a competitive baseline for future work. In addition, a decomposable attention model trained on SNLI significantly underperforms the models trained on this dataset, suggesting that models trained on manually generated data may not be sufficiently generalizable or suitable for fact checking real-world claims.
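As a rough illustration of the triple structure described in the abstract, the following is a minimal Python sketch, not the authors' code or data format. The class name `FactCheckExample`, its field names, the label strings, the helper functions, and the sample sentences are all hypothetical. The token-overlap score stands in for the kind of lexical feature a logistic regression NLI baseline might consume; the abstract does not specify which features the paper actually uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FactCheckExample:
    """One (claim, context, evidence) triple. Field names are assumptions."""
    claim: str
    context: str
    evidence: str
    label: str  # hypothetical label set: "supported" or "refuted"


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def claim_evidence_overlap(example):
    """Fraction of claim tokens that also occur in the evidence document.

    This is an illustrative lexical feature only, not a feature taken
    from the paper.
    """
    claim_tokens = tokens(example.claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(example.evidence)) / len(claim_tokens)


# Made-up sample triple, for illustration only.
example = FactCheckExample(
    claim="The bridge opened in 1937.",
    context="History of the bridge.",
    evidence="The bridge opened to traffic in 1937.",
    label="supported",
)
overlap = claim_evidence_overlap(example)
```

A score like this could be one column of a feature vector passed to a classifier. High overlap alone does not distinguish supported claims from refuted ones, which is one reason a real system would combine several features.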
Pages: 6874-6882
Number of pages: 9