Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference

Cited by: 0
Authors
Shah, Arjun [1]
Shah, Hetansh [1]
Bafna, Vedica [1]
Khandor, Charmi [1]
Nair, Sindhu [1]
Affiliations
[1] DJ Sanghvi Coll Engn, Dept Comp Engn, Mumbai, India
Keywords
Fake news detection; Machine Learning; Natural Language Processing; Web-Scraping; Natural Language Inference; Language models
DOI
10.1016/j.engappai.2025.110284
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In an age when information spreads rapidly through online platforms, the rise of fake news poses an alarming threat to the integrity of public discourse, societal trust, and reputable news sources. Classical machine learning and transformer-based models have been studied extensively for claim verification; however, they are hampered by their reliance on static training data and cannot generalize to unseen headlines. To address these challenges, we propose an explainable solution that leverages web information retrieval techniques and Natural Language Inference models to verify the veracity of a news headline. We evaluate our solution on a diverse, self-curated evaluation dataset spanning multiple news channels and domains. Our best-performing pipeline achieves an accuracy of 84.3%, surpassing the best classical Machine Learning model by 33.3% and Bidirectional Encoder Representations from Transformers by 31.0%. Using hardware accelerators, our pipelines achieve end-to-end fact verification inference times ranging from 2.92 to 6.97 seconds. Our approach highlights the efficacy of combining dynamic information retrieval with Natural Language Inference to find support for a claimed headline in the corresponding externally retrieved knowledge. The code, dataset, and results of this study are available in our artifact: https://github.com/Arjun254/VERITAS-NLI.
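The general idea in the abstract, checking a headline against externally retrieved text with a Natural Language Inference scorer, can be illustrated with a minimal sketch. This is not the authors' pipeline (see their repository for that). The names `verify_headline`, `toy_nli`, and the `threshold` parameter are hypothetical, the aggregation rule (take the strongest single passage) is an assumption, and `toy_nli` is a crude word-overlap stand-in used only so the example runs without a trained model or web access.

```python
from typing import Callable, Dict, List

# Hypothetical interface: an NLI scorer maps (premise, hypothesis)
# to probabilities for "entailment", "contradiction", and "neutral".
NliFn = Callable[[str, str], Dict[str, float]]


def verify_headline(headline: str, evidence: List[str], nli_fn: NliFn,
                    threshold: float = 0.5) -> dict:
    """Label a headline using the strongest NLI signal over evidence passages.

    Assumed rule: the passage with the highest entailment (or contradiction)
    score decides the verdict and is returned as the explanation.
    """
    best = {"entailment": (0.0, None), "contradiction": (0.0, None)}
    for passage in evidence:
        scores = nli_fn(passage, headline)  # premise = retrieved text
        for label in best:
            if scores.get(label, 0.0) > best[label][0]:
                best[label] = (scores[label], passage)

    support, support_src = best["entailment"]
    refute, refute_src = best["contradiction"]
    if support >= threshold and support >= refute:
        return {"verdict": "supported", "score": support, "evidence": support_src}
    if refute >= threshold:
        return {"verdict": "refuted", "score": refute, "evidence": refute_src}
    return {"verdict": "not enough evidence", "score": 0.0, "evidence": None}


def toy_nli(premise: str, hypothesis: str) -> Dict[str, float]:
    """Word-overlap stand-in for a real NLI model (illustration only)."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    overlap = len(p & h) / len(h) if h else 0.0
    # Treat a "not" present on only one side as a contradiction cue.
    if ("not" in p) != ("not" in h):
        return {"entailment": 0.0, "contradiction": overlap, "neutral": 1.0 - overlap}
    return {"entailment": overlap, "contradiction": 0.0, "neutral": 1.0 - overlap}


if __name__ == "__main__":
    snippets = ["the bridge is not open", "weather today is sunny"]
    print(verify_headline("the bridge is open", snippets, toy_nli))
```

In practice `nli_fn` would wrap a trained NLI model and `evidence` would come from a retrieval step. Returning the deciding passage alongside the verdict is one simple way to keep the output explainable.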
Pages: 15