Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference

Cited by: 0

Authors:

Shah, Arjun [1]

Shah, Hetansh [1]

Bafna, Vedica [1]

Khandor, Charmi [1]

Nair, Sindhu [1]

Affiliations:

[1] DJ Sanghvi Coll Engn, Dept Comp Engn, Mumbai, India

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2025 / Volume 147

Keywords:

Fake news detection; Machine Learning; Natural Language Processing; Web-Scraping; Natural Language Inference; Language models;

DOI:

10.1016/j.engappai.2025.110284

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In an era where information spreads rapidly through online platforms, the rise of fake news poses an alarming threat to the integrity of public discourse, societal trust, and reputed news sources. Classical machine learning and transformer-based models have been studied extensively for claim verification; however, they are hampered by their reliance on static training data and cannot generalize to unseen headlines. To address these challenges, we propose an explainable solution that leverages web information retrieval techniques and Natural Language Inference models to verify the veracity of a news headline. We evaluate our solution on a diverse, self-curated evaluation dataset spanning multiple news channels and domains. Our best-performing pipeline achieves an accuracy of 84.3%, surpassing the best classical Machine Learning model by 33.3% and Bidirectional Encoder Representations from Transformers by 31.0%. Using hardware accelerators, our pipelines achieve end-to-end fact verification inference times ranging from 2.92 to 6.97 seconds. Our approach highlights the efficacy of combining dynamic information retrieval with Natural Language Inference to find support for a claimed headline in the corresponding externally retrieved knowledge. The respective code, dataset, and results of this study are available in our artifact: https://github.com/Arjun254/VERITAS-NLI.
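The retrieve-then-infer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); it only shows the general shape of checking a headline against externally retrieved text with an NLI-style classifier. The function names `verify_headline`, `toy_retrieve`, and `toy_nli`, the label strings, and the 0.5 threshold are all assumptions made for illustration, and the toy word-overlap scorer is a stand-in for a real NLI model.

```python
from typing import Callable, Iterable, List, Tuple

Evidence = Tuple[str, str, float]  # (passage, label, score)


def verify_headline(
    headline: str,
    retrieve: Callable[[str], Iterable[str]],
    nli: Callable[[str, str], Tuple[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[str, List[Evidence]]:
    """Return a verdict plus the per-passage evidence behind it."""
    evidence: List[Evidence] = []
    for passage in retrieve(headline):
        # Premise = retrieved text, hypothesis = the headline being checked.
        label, score = nli(passage, headline)
        evidence.append((passage, label, score))

    # Strongest supporting and refuting signals across all passages.
    support = max((s for _, l, s in evidence if l == "entailment"), default=0.0)
    refute = max((s for _, l, s in evidence if l == "contradiction"), default=0.0)

    if support >= threshold and support >= refute:
        verdict = "supported"
    elif refute >= threshold:
        verdict = "refuted"
    else:
        verdict = "not enough evidence"
    return verdict, evidence


def toy_retrieve(query: str) -> List[str]:
    """Hypothetical stand-in for a web search / scraping step."""
    return ["The city council approved the new budget on Monday."]


def toy_nli(premise: str, hypothesis: str) -> Tuple[str, float]:
    """Hypothetical stand-in for an NLI model: plain word overlap."""
    p = set(premise.lower().replace(".", "").split())
    h = set(hypothesis.lower().replace(".", "").split())
    overlap = len(p & h) / max(len(h), 1)
    if overlap >= 0.5:
        return "entailment", overlap
    return "neutral", 1.0 - overlap


verdict, evidence = verify_headline(
    "City council approved the new budget", toy_retrieve, toy_nli
)
print(verdict)
for passage, label, score in evidence:
    print(f"{label} ({score:.2f}): {passage}")
```

Returning the evidence list alongside the verdict is one simple way to keep the decision inspectable, which is in the spirit of the explainability the abstract mentions. In practice, `retrieve` and `nli` would be replaced by a real retrieval component and a trained NLI model.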

Pages: 15