Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Cited by: 2
Authors
Zhu, Zhangchi [1 ,2 ]
Wang, Lu [2 ]
Zhao, Pu [2 ]
Du, Chao [2 ]
Zhang, Wei [1 ]
Dong, Hang [2 ]
Qiao, Bo [2 ]
Lin, Qingwei [2 ]
Rajmohan, Saravan [3 ]
Zhang, Dongmei [2 ]
Affiliations
[1] East China Normal Univ, Shanghai, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft 365, Seattle, WA USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
positive-unlabeled learning; curriculum learning;
DOI
10.1145/3580305.3599491
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors of misclassifying unlabeled positive samples as negative samples inevitably appear and may even accumulate during the training process. Those errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. Similar intuition has been utilized in curriculum learning to use only easier cases in the early stage of training before introducing more complex cases. Specifically, we utilize a novel "hardness" measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples during training, including more "easy" samples in the early stages. Extensive experimental validations over a wide range of learning tasks show that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU.
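The abstract describes a curriculum-style selection of pseudo-negatives driven by a "hardness" measure that is relaxed over the course of training. The snippet below is a minimal illustrative sketch of that idea only, not the authors' implementation (see the linked repository for the actual method): it assumes the hardness of an unlabeled sample can be proxied by the current model's predicted positive probability and that the fraction of selected pseudo-negatives grows linearly over epochs; the function name select_pseudo_negatives and both pacing parameters (start_frac, end_frac) are hypothetical.

import numpy as np

def select_pseudo_negatives(pos_prob_unlabeled, epoch, total_epochs,
                            start_frac=0.3, end_frac=0.8):
    # Hardness proxy (assumption): the model's predicted positive probability.
    # Unlabeled samples with the lowest scores are treated as "easy" negatives,
    # and the selected fraction grows linearly as training progresses.
    frac = start_frac + (end_frac - start_frac) * epoch / max(total_epochs - 1, 1)
    k = int(frac * len(pos_prob_unlabeled))
    return np.argsort(pos_prob_unlabeled)[:k]

# Toy usage with scores from a hypothetical classifier over 10 unlabeled samples.
rng = np.random.default_rng(0)
scores = rng.random(10)
for epoch in range(3):
    neg_idx = select_pseudo_negatives(scores, epoch, total_epochs=3)
    print("epoch", epoch, "-> pseudo-negatives:", sorted(neg_idx.tolist()))

Under these assumptions, early epochs train only on the unlabeled samples most confidently scored as negative, and harder samples are admitted later, mirroring the curriculum-learning intuition described in the abstract.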
Pages: 3663-3673
Number of pages: 11