Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 3
Authors
Tang, Huidong [1]
Kamei, Sayaka [1]
Morimoto, Yasuhiko [1]
Affiliations
[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan
Keywords
artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement
DOI
10.3390/a16010059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models such as BERT and DistilBERT, have achieved impressive results in text classification tasks. However, the robustness of these models against adversarial attacks remains a concern. To address it, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks. In addition to enhancing robustness, our proposed methods improve the accuracy and F1-score on three of the datasets. We also conducted comparison experiments with two existing data augmentation methods. One of our proposed methods shows a performance improvement similar to that of the existing methods, while all three show a superior robustness improvement.
Pages: 21