Towards Robust Learning with Noisy and Pseudo Labels for Text Classification

Cited by: 3
Authors
Murtadha Ahmed [1]
Wen, Bo [1]
Ao, Luo [1]
Pan, Shengfeng [1]
Su, Jianlin [1]
Cao, Xinxin [2]
Liu, Yunfeng [1]
Affiliations
[1] Zhuiyi AI Lab, Shenzhen, People's Republic of China
[2] Northwestern Polytechnical University, Xi'an, Shaanxi, People's Republic of China
Keywords
Natural language processing; Negative learning; Learning with noisy labels; Semi-supervised text classification
DOI
10.1016/j.ins.2024.120160
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unlike Positive Training (PT), Negative Training (NT) is an indirect learning technique that trains the model on a combination of clean and noisy data using complementary labels, which are randomly drawn from the label space excluding the actual label. Although clean samples follow the same distribution as the test samples, the complementary labeling of NT treats them with the same level of uncertainty as noisy samples. Consequently, their contribution to the overall performance is relatively low. We propose a Learning with Noisy and Pseudo Label (LNPL) framework, which jointly trains the model using PT on clean data and NT on noisy data. We aim to enable direct learning on clean samples while leveraging the robustness of NT against noise in a unified framework. To mitigate the abundance of noisy instances, we place a gradient reversal layer on top of LNPL as a regularization term that misleads the recognition of the source of an instance (i.e., clean or noisy). Moreover, we introduce a self-training LNPL that treats semi-supervised text classification as a learning-with-noisy-pseudo-labels problem. Extensive experiments on various textual benchmark datasets demonstrate that LNPL is robust and consistently outperforms the alternatives. The code is available on GitHub.
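The split between PT on clean samples and NT on noisy samples described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unweighted averaging of the two losses, and the use of the common negative-learning loss -log(1 - p) for a complementary label are assumptions made for illustration, and the gradient reversal regularizer is left out.

```python
import math
import random


def complementary_label(given_label, num_classes, rng):
    """Draw a label uniformly from the label space, excluding the given label."""
    candidates = [c for c in range(num_classes) if c != given_label]
    return rng.choice(candidates)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def positive_loss(probs, label, eps=1e-12):
    """PT: cross-entropy, pushes the probability of `label` towards 1."""
    return -math.log(probs[label] + eps)


def negative_loss(probs, comp_label, eps=1e-12):
    """NT: pushes the probability of the complementary label towards 0."""
    return -math.log(1.0 - probs[comp_label] + eps)


def joint_loss(batch, num_classes, rng):
    """Average PT loss on clean samples and NT loss on noisy samples.

    `batch` is a list of (logits, label, is_clean) tuples. The equal
    weighting of the two terms is an assumption of this sketch.
    """
    total = 0.0
    for logits, label, is_clean in batch:
        probs = softmax(logits)
        if is_clean:
            total += positive_loss(probs, label)
        else:
            comp = complementary_label(label, num_classes, rng)
            total += negative_loss(probs, comp)
    return total / len(batch)


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [
        ([2.0, 0.1, -1.0], 0, True),   # clean sample: trained directly (PT)
        ([0.3, 0.2, 0.1], 1, False),   # noisy sample: trained indirectly (NT)
    ]
    print(joint_loss(batch, num_classes=3, rng=rng))
```

A noisy label only tells the model which randomly chosen class the sample is not, so a wrong given label does far less damage under NT than it would as a direct PT target; clean samples skip that indirection and keep their full supervisory signal.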
Pages: 14