Probabilistic Interpolation with Mixup Data Augmentation for Text Classification

Cited by: 1
Authors
Xu, Rongkang [1]
Zhang, Yongcheng [1]
Ren, Kai [2]
Huang, Yu [1]
Wei, Xiaomei [1]
Affiliations
[1] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
[2] South Cent Minzu Univ, Coll Comp Sci, Wuhan 430074, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024 | 2024 / Vol. 14878
Keywords
Text Interpolation; Data Augmentation; Probabilistic Interpolation
DOI
10.1007/978-981-97-5672-8_35
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Supervised deep learning models often suffer from insufficient training data. Mixup, a distinctive data augmentation technique, addresses this shortage by interpolating existing samples to generate new synthetic samples. However, most current Mixup methods use linear interpolation, which confines synthetic data to the linear range between existing samples and thus restricts the diversity of the synthetic samples. To overcome this limitation, we introduce PTMix, a non-linear interpolation technique that strengthens data augmentation by interpolating each feature dimension according to a random probability. This approach expands the synthetic sample space and increases sample diversity while maintaining high fidelity to the original data. In extensive experiments on five public text classification datasets, PTMix achieves the highest average accuracy reported to date: 86.64% under full-resource conditions and 63.84% under low-resource conditions.
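The abstract gives no formula for PTMix, so the following is only a minimal Python/NumPy sketch of the general contrast it describes, under stated assumptions. Standard Mixup draws a single coefficient λ ~ Beta(α, α) and applies it to every dimension, so the synthetic point lies on the line segment between the two inputs. The hypothetical `per_dimension_mixup` instead draws an independent λ_d for each feature dimension, so the synthetic point can fall anywhere in the axis-aligned box whose opposite corners are the two inputs. The Beta distribution, α = 0.4, and mixing labels with the mean coefficient are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def linear_mixup(x_i, x_j, y_i, y_j, alpha=0.4, rng=None):
    """Standard (linear) Mixup: one coefficient lam ~ Beta(alpha, alpha)
    is shared by every feature dimension and by the label."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y


def per_dimension_mixup(x_i, x_j, y_i, y_j, alpha=0.4, rng=None):
    """Hypothetical per-dimension variant (an assumption, not the paper's
    PTMix): each feature dimension d gets its own lam_d ~ Beta(alpha, alpha).
    The label is mixed with the mean coefficient, also an assumption."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha, size=x_i.shape[-1])  # one draw per dimension
    x = lam * x_i + (1.0 - lam) * x_j                 # element-wise interpolation
    lam_y = lam.mean()
    y = lam_y * y_i + (1.0 - lam_y) * y_j
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_i, x_j = np.zeros(4), np.ones(4)
    y_i, y_j = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    # Linear Mixup: every entry equals 1 - lam, so all entries are identical.
    print(linear_mixup(x_i, x_j, y_i, y_j, rng=rng)[0])
    # Per-dimension mixing: entries generally differ from one another.
    print(per_dimension_mixup(x_i, x_j, y_i, y_j, rng=rng)[0])
```

In a text-classification setting, `x_i` and `x_j` would typically be sentence embeddings or encoder hidden states and `y_i`, `y_j` one-hot label vectors; that pairing is likewise an assumption made only for this sketch.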
Pages: 410-421
Number of pages: 12
Related papers (50 in total)
  • [1] MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation
    Dong, Zeming
    Hu, Qiang
    Guo, Yuejun
    Cordy, Maxime
    Papadakis, Mike
    Zhang, Zhenya
    Le Traon, Yves
    Zhao, Jianjun
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 379 - 390
  • [2] Data Augmentation with Transformers for Text Classification
    Medardo Tapia-Tellez, Jose
    Jair Escalante, Hugo
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2020, PT II, 2020, 12469 : 247 - 259
  • [3] A Survey on Data Augmentation for Text Classification
    Bayer, Markus
    Kaufhold, Marc-Andre
    Reuter, Christian
    ACM COMPUTING SURVEYS, 2023, 55 (07)
  • [4] Hierarchical Data Augmentation and the Application in Text Classification
    Yu, Shujuan
    Yang, Jie
    Liu, Danlei
    Li, Runqi
    Zhang, Yun
    Zhao, Shengmei
    IEEE ACCESS, 2019, 7 : 185476 - 185485
  • [5] Semantic Segmentation with the Mixup Data Augmentation Method
    Arpaci, Saadet Aytac
    Varli, Songul
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022
  • [6] Data Augmentation via Latent Space Interpolation for Image Classification
    Liu, Xiaofeng
    Zou, Yang
    Kong, Lingsheng
    Diao, Zhihui
    Yan, Junliang
    Wang, Jun
    Li, Site
    Jia, Ping
    You, Jane
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 728 - 733
  • [7] Attention mechanism and mixup data augmentation for classification of COVID-19 Computed Tomography images
    Ozdemir, Ozgur
    Sonmez, Elena Battini
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (08) : 6199 - 6207
  • [8] DATA AUGMENTATION VIA SUBGROUP MIXUP FOR IMPROVING FAIRNESS
    Navarro, Madeline
    Little, Camille
    Allen, Genevera, I
    Segarra, Santiago
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, : 7350 - 7354
  • [9] Hybrid Model of Data Augmentation Methods for Text Classification Task
    Feng, Jia Hui
    Mohaghegh, Mahsa
    PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KMIS), VOL 3, 2021, : 194 - 197
  • [10] Data augmentation and adversary attack on limit resources text classification
    Sánchez-Vega, F.
    López-Monroy, A.P.
    Balderas-Paredes, A.
    Pellegrin, L.
    Rosales-Pérez, A.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2025, 84 (3) : 1317 - 1344