Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 3

Authors:

Tang, Huidong ^{[1]}

Kamei, Sayaka ^{[1]}

Morimoto, Yasuhiko ^{[1]}

Affiliations:

[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan

Source:

ALGORITHMS | 2023 / Vol. 16 / Issue 01

Keywords:

artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement;

DOI:

10.3390/a16010059

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models' robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing the robustness, our proposed methods can improve the accuracy and F1-score on three datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods demonstrates a similar improvement in terms of performance, but all demonstrate a superior robustness improvement.

Pages: 21

50 items in total

[31] Text Data Augmentation for Deep Learning
Connor Shorten
Taghi M. Khoshgoftaar
Borko Furht
Journal of Big Data, 8
[32] Need Text Data Augmentation? Just One Insertion Is Enough
Kim, Ho-Seung
Lee, Jee-Hyong
INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2024, 24 (02) : 83 - 92
[33] Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification
Mou, Guanyi
Li, Yichuan
Lee, Kyumin
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 876 - 887
[34] STTA: enhanced text classification via selective test-time augmentation
Xiong H.
Zhang X.
Yang L.
Xiang Y.
Zhang Y.
PeerJ Computer Science, 2023, 9
[35] MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation
Dong, Zeming
Hu, Qiang
Guo, Yuejun
Cordy, Maxime
Papadakis, Mike
Zhang, Zhenya
Le Traon, Yves
Zhao, Jianjun
2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 379 - 390
[36] A Comparison of Classification Methods Applied to Legal Text Data
Araujo, Diogenes Carlos
Lima, Alexandre
Lima, Joao Pedro
Costa, Jose Alfredo
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021), 2021, 12981 : 68 - 80
[37] FocusAugMix: A data augmentation method for enhancing Acute Lymphoblastic Leukemia classification
Mustaqim, Tanzilal
Fatichah, Chastine
Suciati, Nanik
Obi, Takashi
Lee, Joong-Sun
INTELLIGENT SYSTEMS WITH APPLICATIONS, 2025, 26
[38] Effect of Data Augmentation Methods on Face Image Classification Results
Hrga, Ingrid
Ivasic-Kos, Marina
PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 660 - 667
[39] Rethinking data augmentation for adversarial robustness
Eghbal-zadeh, Hamid
Zellinger, Werner
Pintor, Maura
Grosse, Kathrin
Koutini, Khaled
Moser, Bernhard A.
Biggio, Battista
Widmer, Gerhard
INFORMATION SCIENCES, 2024, 654
[40] CHARCNN-SVM FOR CHINESE TEXT DATASETS SENTIMENT CLASSIFICATION WITH DATA AUGMENTATION
Wang, Xingkai
Sheng, Yiqiang
Deng, Haojiang
Zhao, Zhenyu
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2019, 15 (01) : 227 - 246
