Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Authors
Tang, Huidong [1]
Kamei, Sayaka [1]
Morimoto, Yasuhiko [1]
Affiliations
[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan
Keywords
artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement
DOI
10.3390/a16010059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models such as BERT and DistilBERT, have achieved impressive results in text classification tasks. However, the robustness of these models against adversarial attacks remains a concern. To address this, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to measure their robustness. In addition to enhancing robustness, our proposed methods improve accuracy and F1-score on three of the four datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods achieves a performance improvement similar to that of the existing methods, while all three of our methods achieve a superior robustness improvement.
Pages: 21