Probabilistic Interpolation with Mixup Data Augmentation for Text Classification

Times cited: 1

Authors:

Xu, Rongkang [1]

Zhang, Yongcheng [1]

Ren, Kai [2]

Huang, Yu [1]

Wei, Xiaomei [1]

Affiliations:

[1] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China

[2] South Cent Minzu Univ, Coll Comp Sci, Wuhan 430074, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024 | 2024 / Vol. 14878

Keywords:

Text Interpolation; Data Augmentation; Probabilistic Interpolation

DOI:

10.1007/978-981-97-5672-8_35

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Supervised deep learning models often suffer from insufficient training data. Mixup, a distinctive data augmentation technique, addresses this shortage by interpolating existing samples to generate new synthetic samples. However, most current Mixup methods use linear interpolation, which confines synthetic data to linear combinations of existing samples in the sample space and thereby restricts the diversity of the synthetic samples. To break this limitation, this study introduces PTMix, a novel non-linear interpolation technique. PTMix interpolates each dimension of the feature according to a random probability, significantly enhancing the data augmentation process. This approach not only expands the range of the synthetic sample space and increases sample diversity but also maintains high fidelity to the original data. In extensive experiments on five public text classification datasets, PTMix achieves the highest average accuracy reported to date: 86.64% under full-resource conditions and 63.84% under low-resource conditions.
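The abstract contrasts standard Mixup, which draws a single interpolation coefficient for an entire sample, with PTMix, which interpolates each feature dimension according to a random probability. The minimal sketch below shows the standard linear rule and, for contrast, a hypothetical per-dimension variant. The function names, the Beta(alpha, alpha) sampling of one coefficient per dimension, the mean-coefficient label rule, and the default alpha=0.2 are all assumptions made for illustration; this is not PTMix as specified in the paper.

```python
import random


def linear_mixup(x_i, x_j, y_i, y_j, alpha=0.2, rng=None):
    """Standard (linear) Mixup: one scalar lambda ~ Beta(alpha, alpha)
    is shared by every feature dimension and by the label."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def per_dimension_mixup(x_i, x_j, y_i, y_j, alpha=0.2, rng=None):
    """HYPOTHETICAL per-dimension variant (an assumption, not the paper's
    exact PTMix rule): each feature dimension d draws its own
    lambda_d ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lams = [rng.betavariate(alpha, alpha) for _ in x_i]
    x = [l * a + (1 - l) * b for l, a, b in zip(lams, x_i, x_j)]
    # Assumption: the label is mixed with the mean of the per-dimension
    # coefficients so that it still sums to 1 for one-hot inputs.
    lam_bar = sum(lams) / len(lams)
    y = [lam_bar * a + (1 - lam_bar) * b for a, b in zip(y_i, y_j)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(42)
    x_a, x_b = [1.0, 0.0, 2.0], [0.0, 1.0, 4.0]  # toy feature vectors
    y_a, y_b = [1.0, 0.0], [0.0, 1.0]            # one-hot labels
    print(linear_mixup(x_a, x_b, y_a, y_b, rng=rng))
    print(per_dimension_mixup(x_a, x_b, y_a, y_b, rng=rng))
```

Each coordinate in both functions remains between the two inputs. The linear version, however, always lands on the segment joining the pair, while the per-dimension sketch can land anywhere in the axis-aligned box the pair spans. This illustrates, under the stated assumptions, how per-dimension coefficients widen the region from which synthetic samples are drawn.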


Pages: 410-421

Number of pages: 12
