Probabilistic Interpolation with Mixup Data Augmentation for Text Classification

Cited by: 1
Authors
Xu, Rongkang [1]
Zhang, Yongcheng [1]
Ren, Kai [2]
Huang, Yu [1]
Wei, Xiaomei [1]
Affiliations
[1] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
[2] South Cent Minzu Univ, Coll Comp Sci, Wuhan 430074, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024 | 2024 / Vol. 14878
Keywords
Text Interpolation; Data Augmentation; Probabilistic Interpolation;
DOI
10.1007/978-981-97-5672-8_35
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Supervised deep learning models often suffer from insufficient training data. Mixup, a data augmentation technique, addresses this shortage by interpolating existing samples to generate new synthetic samples. However, most current Mixup methods use linear interpolation, which confines synthetic data to the linear range of the sample space and therefore restricts the diversity of the synthetic samples. To overcome this limitation, we introduce PTMix, a non-linear interpolation technique. PTMix interpolates each feature dimension based on random probabilities, which expands the range of the synthetic sample space and increases sample diversity while maintaining high fidelity to the original data. In extensive experiments on five public text classification datasets, PTMix achieves the highest average accuracy to date: 86.64% under full-resource conditions and 63.84% under low-resource conditions.
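The abstract does not give the PTMix formula, so the following is only a minimal sketch of one possible reading of "interpolation based on random probabilities on each dimension of the feature", contrasted with standard linear Mixup. The function names, the Bernoulli mask, and the label-weighting rule are all assumptions made for illustration and should not be taken as the authors' method.

```python
import numpy as np

def linear_mixup(x_a, x_b, lam):
    """Standard Mixup: one coefficient shared by every feature dimension."""
    return lam * x_a + (1.0 - lam) * x_b

def per_dimension_mix(x_a, x_b, y_a, y_b, lam, rng):
    """Hypothetical per-dimension probabilistic mix (assumed, not the paper's formula).

    Each feature dimension is taken from x_a with probability lam,
    otherwise from x_b.
    """
    mask = rng.random(x_a.shape) < lam         # one independent draw per dimension
    x_mix = np.where(mask, x_a, x_b)
    ratio = mask.mean()                        # fraction of dimensions taken from x_a
    y_mix = ratio * y_a + (1.0 - ratio) * y_b  # soft label weighted by that fraction
    return x_mix, y_mix, mask

rng = np.random.default_rng(0)
x_a = rng.normal(size=8)     # stand-in for a sentence embedding
x_b = rng.normal(size=8)
y_a = np.array([1.0, 0.0])   # one-hot labels for a two-class task
y_b = np.array([0.0, 1.0])

x_lin = linear_mixup(x_a, x_b, lam=0.7)
x_mix, y_mix, mask = per_dimension_mix(x_a, x_b, y_a, y_b, lam=0.7, rng=rng)
```

In this sketch the linear variant always lands on the line segment between the two samples, whereas the per-dimension variant can land on any corner of the axis-aligned box they span, which is one way a synthetic sample can leave the linear range.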
Pages: 410-421
Page count: 12