Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Cited by: 2
Authors
Zhu, Zhangchi [1 ,2 ]
Wang, Lu [2 ]
Zhao, Pu [2 ]
Du, Chao [2 ]
Zhang, Wei [1 ]
Dong, Hang [2 ]
Qiao, Bo [2 ]
Lin, Qingwei [2 ]
Rajmohan, Saravan [3 ]
Zhang, Dongmei [2 ]
Affiliations
[1] East China Normal Univ, Shanghai, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft 365, Seattle, WA USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
positive-unlabeled learning; curriculum learning;
DOI
10.1145/3580305.3599491
CLC classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors from misclassifying unlabeled positive samples as negatives inevitably appear and may even accumulate during the training process. These errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. A similar intuition underlies curriculum learning, which uses only easier cases in the early stages of training before introducing more complex ones. Specifically, we utilize a novel "hardness" measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy then refines the selection of negative samples over the course of training so that more "easy" samples are included in the early stages. Extensive experiments over a wide range of learning tasks show that this approach effectively improves the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU.
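The curriculum-style selection of pseudo-negatives described in the abstract can be illustrated with a small, self-contained sketch. The code below is not the authors' implementation (see the linked repository for that); it assumes, purely for illustration, that "hardness" is proxied by the current model's predicted positive probability and that the pseudo-negative pool grows linearly over training rounds, starting from the easiest (lowest-scoring) unlabeled samples.

```python
# Minimal sketch of curriculum-style pseudo-negative selection for PU learning.
# NOT the paper's method: the hardness proxy (predicted positive probability)
# and the linear growth schedule are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic PU data in 2-D: labeled positives around +1; the unlabeled set
# mixes hidden positives (around +1) with true negatives (around -1).
n_pos, n_unl = 200, 1000
X_pos = rng.normal(loc=1.0, scale=1.0, size=(n_pos, 2))
X_unl = np.vstack([
    rng.normal(loc=1.0, scale=1.0, size=(n_unl // 2, 2)),   # hidden positives
    rng.normal(loc=-1.0, scale=1.0, size=(n_unl // 2, 2)),  # true negatives
])

clf = LogisticRegression(max_iter=1000)
n_rounds = 5
for t in range(1, n_rounds + 1):
    if t == 1:
        # Cold start: the standard baseline of treating every unlabeled
        # sample as negative, just to obtain an initial scoring model.
        X_neg = X_unl
    else:
        # "Easy" pseudo-negatives are the unlabeled samples the current
        # model scores as least likely to be positive; the kept fraction
        # grows each round (simple linear curriculum schedule).
        scores = clf.predict_proba(X_unl)[:, 1]   # hardness proxy
        keep = int(len(X_unl) * t / n_rounds)
        X_neg = X_unl[np.argsort(scores)[:keep]]

    X_train = np.vstack([X_pos, X_neg])
    y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    clf.fit(X_train, y_train)

print("Final pseudo-negative pool size:", len(X_neg))
```

In this sketch the selection schedule is a fixed linear ramp and the hardness score is a raw model probability; the paper's actual hardness measure and iterative self-correction strategy differ and are described in the full text.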
Pages: 3663-3673
Number of pages: 11