Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 3

Authors:

Tang, Huidong [1]

Kamei, Sayaka [1]

Morimoto, Yasuhiko [1]

Affiliations:

[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan

Source:

ALGORITHMS | 2023, Vol. 16, Issue 01

Keywords:

artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement;

DOI:

10.3390/a16010059

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models' robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing the robustness, our proposed methods can improve the accuracy and F1-score on three datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods demonstrates a similar improvement in terms of performance, but all demonstrate a superior robustness improvement.


Pages: 21

