Robust scientific text classification using prompt tuning based on data augmentation with L2 regularization

Cited by: 8

Authors:

Shi, Shijun [1]

Hu, Kai [1]

Xie, Jie [2,3]

Guo, Ya [1]

Wu, Huayi [4]

Affiliations:

[1] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China

[2] Nanjing Normal Univ, Sch Comp & Elect Informat, Nanjing 210023, Peoples R China

[3] Nanjing Normal Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China

[4] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430079, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2024 / Vol. 61 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Scientific text classification; Pre-training model; Prompt tuning; Data augmentation; Pairwise training; L2 regularization;

DOI:

10.1016/j.ipm.2023.103531

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Recently, prompt tuning, which incorporates prompts into the input of a pre-trained language model (such as BERT or GPT), has shown promise in improving language model performance when annotated data is limited. However, semantically equivalent templates do not necessarily produce equivalent prompt effects, and prompt tuning often exhibits unstable performance, a problem that is more severe in the scientific domain. To address this challenge, we propose enhancing prompt tuning using data augmentation with L2 regularization. Specifically, pairwise training is performed on each pair of original and transformed data. Our experiments on two scientific text datasets (ACL-ARC and SciCite) demonstrate that the proposed method significantly improves both accuracy and robustness. Using 1000 of the 1688 samples in the ACL-ARC training set, our method achieved an F1 score 3.33% higher than the same model trained on all 1688 samples. On the SciCite dataset, our method surpassed the same model while using over 93% less labeled data. Our method also proved highly robust, reaching F1 scores 1% to 8% higher than models without our method after the Probability Weighted Word Saliency attack.
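
The abstract does not state the exact training objective, so the following is only a minimal sketch of one common way to combine pairwise training with an L2 penalty: sum the classification loss on the original and the augmented input, then add a squared L2 distance between the two predicted distributions. The function names, the weight `lam`, and the choice to apply the penalty to output probabilities (rather than to logits or model weights) are all assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(softmax(logits)[label])

def pairwise_l2_loss(logits_orig, logits_aug, label, lam=1.0):
    """Hypothetical pairwise objective (an assumption, not the paper's formula).

    logits_orig: model outputs for the original text
    logits_aug:  model outputs for the augmented (transformed) text
    lam:         assumed weight on the L2 consistency term
    """
    # Supervised loss on both members of the pair.
    ce = cross_entropy(logits_orig, label) + cross_entropy(logits_aug, label)
    # Squared L2 distance between the two predicted distributions.
    p, q = softmax(logits_orig), softmax(logits_aug)
    consistency = sum((a - b) ** 2 for a, b in zip(p, q))
    return ce + lam * consistency
```

In this sketch, identical predictions for the two inputs incur no penalty, while diverging predictions raise the loss in proportion to `lam`, which is one way an objective could encourage stable outputs under input perturbation.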

Pages: 19
