Enhancing Misinformation Detection in Spanish Language with Deep Learning: BERT and RoBERTa Transformer Models

Times cited: 0

Authors:

Blanco-Fernandez, Yolanda [1]

Otero-Vizoso, Javier [2]

Gil-Solla, Alberto [1]

Garcia-Duque, Jorge [2]

Affiliations:

[1] Univ Vigo, AtlanTTic Res Ctr Telecommun Technol, Vigo 36310, Spain

[2] Univ Vigo, Escuela Ingn Telecomunicac, Vigo, Spain

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 21

Keywords:

fake news; Spanish; curated synthetic dataset; fine-tuning; Transformer-based models; BERT; RoBERTa;

DOI:

10.3390/app14219729

CLC classification number:

O6 [Chemistry];

Subject classification code:

0703;

Abstract:

This paper presents an approach to identifying political fake news in Spanish using Transformer architectures. Current methodologies often overlook political news because high-quality datasets are scarce, especially in Spanish. To address this gap, we created a synthetic dataset of 57,231 Spanish political news articles, collected through automated web scraping and augmented with generative large language models. This dataset is used to fine-tune and benchmark Transformer models such as BERT and RoBERTa for fake news detection. Our fine-tuned models achieved outstanding performance on this dataset, with accuracy ranging from 97.4% to 98.6%. However, testing on a smaller, independent, hand-curated dataset, which includes statements made by political leaders during Spain's July 2023 electoral debates, revealed a drop in accuracy to 71%. Although this suggests that the model needs further refinement to handle the complexity and variability of real-world political discourse, accuracy above 70% appears to be a promising result in the under-explored domain of Spanish political fake news detection.
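Accuracy, the metric reported in the abstract, is the share of news items whose predicted label (fake or real) matches the gold label. The short Python sketch below is an illustration only: it assumes binary labels encoded as 1 = fake and 0 = real (a hypothetical convention, not necessarily the authors'), and the helper names `accuracy` and `gap_pp` are hypothetical. It computes the metric and the percentage-point gap between the synthetic-dataset results and the 71% obtained on the hand-curated debate set.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def gap_pp(in_domain_pct: float, out_of_domain_pct: float) -> float:
    """Absolute drop in accuracy, in percentage points (rounded to 0.1)."""
    return round(in_domain_pct - out_of_domain_pct, 1)


if __name__ == "__main__":
    # Toy labels for illustration only (1 = fake, 0 = real); not data from the paper.
    gold = [1, 0, 1, 1]
    pred = [1, 0, 0, 1]
    print(accuracy(gold, pred))  # 0.75
    # Accuracies reported in the abstract: 97.4-98.6% (synthetic) vs 71% (debates).
    print(gap_pp(97.4, 71.0), gap_pp(98.6, 71.0))  # 26.4 27.6
```

On these reported figures, the move from the synthetic test data to real debate statements costs roughly 26 to 28 percentage points of accuracy.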

Pages: 27

Number of references: 50