Enhancing Misinformation Detection in Spanish Language with Deep Learning: BERT and RoBERTa Transformer Models

Authors
Blanco-Fernandez, Yolanda [1 ]
Otero-Vizoso, Javier [2 ]
Gil-Solla, Alberto [1 ]
Garcia-Duque, Jorge [2 ]
Affiliations
[1] Univ Vigo, AtlanTTic Res Ctr Telecommun Technol, Vigo 36310, Spain
[2] Univ Vigo, Escuela Ingn Telecomunicac, Vigo, Spain
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 21
Keywords
fake news; Spanish; curated synthetic dataset; fine-tuning; Transformer-based models; BERT; RoBERTa
DOI
10.3390/app14219729
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
This paper presents an approach to identifying political fake news in Spanish using Transformer architectures. Current methodologies often overlook political news because quality datasets are scarce, especially in Spanish. To address this, we created a synthetic dataset of 57,231 Spanish political news articles, gathered via automated web scraping and enhanced with generative large language models. This dataset is used to fine-tune and benchmark Transformer models such as BERT and RoBERTa for fake news detection. Our fine-tuned models performed very well on this dataset, with accuracy ranging from 97.4% to 98.6%. However, on a smaller, independent, hand-curated dataset that includes statements made by political leaders during Spain's July 2023 electoral debates, accuracy dropped to 71%. Although this suggests that the models need further refinement to handle the complexity and variability of real-world political discourse, exceeding 70% accuracy is a promising result in the under-explored domain of Spanish political fake news detection.
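The gap the abstract reports between in-distribution and independent test accuracy can be illustrated with a minimal sketch. The `accuracy` helper and the toy label lists below are hypothetical and are not the paper's code or data; they only show how accuracy on a held-out split and on an external set would be compared.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels (1 = fake, 0 = real); NOT results from the paper.
held_out_true = [1, 0, 1, 1, 0, 0, 1, 0]
held_out_pred = [1, 0, 1, 1, 0, 0, 1, 0]

external_true = [1, 0, 1, 1, 0, 0, 1, 0]
external_pred = [1, 0, 0, 1, 0, 1, 1, 0]

held_out_acc = accuracy(held_out_true, held_out_pred)
external_acc = accuracy(external_true, external_pred)

# A large positive gap suggests the classifier generalizes poorly
# beyond the distribution it was fine-tuned on.
generalization_gap = held_out_acc - external_acc

print(f"held-out accuracy: {held_out_acc:.3f}")
print(f"external accuracy: {external_acc:.3f}")
print(f"gap:               {generalization_gap:.3f}")
```

Reporting both figures side by side, as the abstract does, makes it easier to judge how much of the in-distribution score carries over to real-world statements.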
Pages: 27