Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 3
Authors
Tang, Huidong [1]
Kamei, Sayaka [1]
Morimoto, Yasuhiko [1]
Affiliations
[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan
Keywords
artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement;
DOI
10.3390/a16010059
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models' robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing robustness, our proposed methods can improve the accuracy and F1-score on three of the datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods achieves a performance improvement similar to that of the existing methods, while all three achieve a superior robustness improvement.
Pages: 21
Related papers
50 in total
  • [21] Text Data Augmentation Techniques for Word Embeddings in Fake News Classification
    Kapusta, Jozef
    Drzik, David
    Steflovic, Kirsten
    Nagy, Kitti Szabo
    IEEE ACCESS, 2024, 12 : 31538 - 31550
  • [22] Probabilistic Interpolation with Mixup Data Augmentation for Text Classification
    Xu, Rongkang
    Zhang, Yongcheng
    Ren, Kai
    Huang, Yu
    Wei, Xiaomei
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024, 2024, 14878 : 410 - 421
  • [23] Data Scarcity: Methods to Improve the Quality of Text Classification
    Glaser, Ingo
    Sadegharmaki, Shabnam
    Komboz, Basil
    Matthes, Florian
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 556 - 564
  • [24] Enhancing robustness of AI offensive code generators via data augmentation
    Improta, Cristina
    Liguori, Pietro
    Natella, Roberto
    Cukic, Bojan
    Cotroneo, Domenico
    EMPIRICAL SOFTWARE ENGINEERING, 2025, 30 (01)
  • [25] Data augmentation strategies to improve text classification: a use case in smart cities
    Bencke, Luciana
    Moreira, Viviane Pereira
    LANGUAGE RESOURCES AND EVALUATION, 2024, 58 (02) : 659 - 694
  • [26] Data augmentation strategies to improve text classification: a use case in smart cities
    Bencke, Luciana
    Moreira, Viviane Pereira
    LANGUAGE RESOURCES AND EVALUATION, 2023,
  • [27] GAN Data Augmentation Methods in Rock Classification
    Zhao, Gaochang
    Cai, Zhao
    Wang, Xin
    Dang, Xiaohu
    APPLIED SCIENCES-BASEL, 2023, 13 (09):
  • [28] Data Augmentation and Deep Learning Methods in Sound Classification: A Systematic Review
    Abayomi-Alli, Olusola O.
    Damasevicius, Robertas
    Qazi, Atika
    Adedoyin-Olowe, Mariam
    Misra, Sanjay
    ELECTRONICS, 2022, 11 (22)
  • [29] CONDITIONAL LABEL SMOOTHING FOR LLM-BASED DATA AUGMENTATION IN MEDICAL TEXT CLASSIFICATION
    Becker, Luca
    Pracht, Philip
    Sertdal, Peter
    Uboreck, Jil
    Bendel, Alexander
    Martin, Rainer
    2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2024, : 833 - 840
  • [30] Text Data Augmentation for Deep Learning
    Shorten, Connor
    Khoshgoftaar, Taghi M.
    Furht, Borko
    JOURNAL OF BIG DATA, 2021, 8 (01)