Probabilistic Interpolation with Mixup Data Augmentation for Text Classification

Cited by: 1
Authors
Xu, Rongkang [1]
Zhang, Yongcheng [1]
Ren, Kai [2]
Huang, Yu [1]
Wei, Xiaomei [1]
Affiliations
[1] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
[2] South Cent Minzu Univ, Coll Comp Sci, Wuhan 430074, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024 | 2024, Vol. 14878
Keywords
Text Interpolation; Data Augmentation; Probabilistic Interpolation
DOI
10.1007/978-981-97-5672-8_35
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Supervised deep learning models often suffer from insufficient training data. Mixup, a data augmentation technique, alleviates this shortage by interpolating existing samples to generate new synthetic samples. However, most current Mixup methods use linear interpolation, which confines the synthetic data to the linear span of the sample space and thereby limits the diversity of the synthetic samples. To overcome this limitation, we introduce PTMix, a non-linear interpolation technique that interpolates each feature dimension with an independently drawn random probability, enriching the data augmentation process. This approach expands the synthetic sample space and increases sample diversity while maintaining high fidelity to the original data. In extensive experiments on five public text classification datasets, PTMix achieves the highest average accuracy, 86.64% under full-resource conditions and 63.84% under low-resource conditions.
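A minimal sketch of the per-dimension probabilistic interpolation described above, written in PyTorch. The abstract states only that PTMix interpolates each feature dimension with its own random probability instead of the single scalar used by standard Mixup; the sampling distribution, the layer at which mixing is applied, and the label-mixing rule are assumptions here (Beta-distributed per-dimension coefficients applied to hidden feature vectors, labels mixed with the mean coefficient), and the function name ptmix_interpolate is illustrative rather than taken from the paper.

import torch

def ptmix_interpolate(x, y, alpha=0.5):
    # x: (batch, dim) feature vectors; y: (batch, num_classes) one-hot labels.
    perm = torch.randperm(x.size(0))  # pair each sample with a randomly chosen partner
    # Assumption: one interpolation coefficient per feature dimension, drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample(x.shape).to(x.device)
    x_mix = lam * x + (1.0 - lam) * x[perm]  # element-wise mixing, not restricted to a single line segment
    # Assumption: labels are mixed with the per-sample mean coefficient.
    lam_bar = lam.mean(dim=1, keepdim=True)
    y_mix = lam_bar * y + (1.0 - lam_bar) * y[perm]
    return x_mix, y_mix

For comparison, standard Mixup draws one scalar lambda shared by every dimension of a sample pair, so each synthetic point lies on the straight line between the two originals; the per-dimension coefficients above let synthetic points fall anywhere in the axis-aligned box spanned by the pair, which is the source of the added diversity claimed in the abstract.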
Pages: 410-421
Number of pages: 12