Analysis of Learning from Positive and Unlabeled Data

Authors
du Plessis, Marthinus C. [1 ]
Niu, Gang [2 ]
Sugiyama, Masashi [1 ]
Affiliations
[1] Univ Tokyo, Tokyo 1130033, Japan
[2] Baidu Inc, Beijing 100085, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but the problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2√2 times the fully supervised case. These theoretical findings are also validated through experiments.
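The abstract's claim about hinge versus ramp loss can be illustrated with a small numerical sketch. It is not taken from the record above. It assumes the PU risk is written as 2π·E_P[ℓ(g(x))] + E_U[ℓ(−g(x))] − π, which follows from the unlabeled density being the mixture π·p(x|+1) + (1−π)·p(x|−1). The function names, the 1/2 scaling of the hinge loss, and the toy scores are all illustrative assumptions.

```python
def ramp_loss(z):
    """Ramp loss 0.5 * max(0, min(2, 1 - z)); it satisfies l(z) + l(-z) = 1."""
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def hinge_loss(z):
    """Hinge loss scaled by 1/2 (an assumption made for comparability with the ramp loss)."""
    return 0.5 * max(0.0, 1.0 - z)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pu_risk(scores_pos, scores_unl, prior, loss):
    """Risk estimated from positive and unlabeled scores only (assumed form)."""
    risk_pos = mean(loss(s) for s in scores_pos)    # positives scored as positive
    risk_unl = mean(loss(-s) for s in scores_unl)   # unlabeled treated as negative
    return 2.0 * prior * risk_pos + risk_unl - prior


def pn_risk(scores_pos, scores_neg, prior, loss):
    """Ordinary supervised risk from labeled positive and negative scores."""
    risk_pos = mean(loss(s) for s in scores_pos)
    risk_neg = mean(loss(-s) for s in scores_neg)
    return prior * risk_pos + (1.0 - prior) * risk_neg


# Hypothetical classifier outputs g(x); the unlabeled set is the exact 50/50 mixture.
scores_pos = [2.0, 0.5, -1.0, 3.0]
scores_neg = [-2.0, 0.3, -0.5, -4.0]
scores_unl = scores_pos + scores_neg
prior = 0.5

for name, loss in [("ramp", ramp_loss), ("hinge", hinge_loss)]:
    gap = pu_risk(scores_pos, scores_unl, prior, loss) - pn_risk(
        scores_pos, scores_neg, prior, loss
    )
    print(f"{name}: PU risk minus supervised risk = {gap:.4f}")
```

On this toy data the ramp-loss PU risk coincides with the supervised risk, while the hinge-loss PU risk carries an extra positive term from scores with magnitude above 1. This is one way to read the "intrinsic bias" mentioned in the abstract.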
Pages: 9
Related papers
50 in total
  • [1] Learning from positive and unlabeled data: a survey
    Jessa Bekker
    Jesse Davis
    Machine Learning, 2020, 109 : 719 - 760
  • [2] Learning from positive and unlabeled data: a survey
    Bekker, Jessa
    Davis, Jesse
    MACHINE LEARNING, 2020, 109 (04) : 719 - 760
  • [3] Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
    Hammoudeh, Zayd
    Lowd, Daniel
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Learning from data streams with only positive and unlabeled data
    Qin, Xiangju
    Zhang, Yang
    Li, Chen
    Li, Xue
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2013, 40 (03) : 405 - 430
  • [5] Learning from data streams with only positive and unlabeled data
    Xiangju Qin
    Yang Zhang
    Chen Li
    Xue Li
    Journal of Intelligent Information Systems, 2013, 40 : 405 - 430
  • [6] Convex Formulation for Learning from Positive and Unlabeled Data
    du Plessis, Marthinus Christoffel
    Niu, Gang
    Sugiyama, Masashi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1386 - 1394
  • [7] Positive-Unlabeled Learning from Imbalanced Data
    Su, Guangxin
    Chen, Weitong
    Xu, Miao
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2995 - 3001
  • [8] Predictive Adversarial Learning from Positive and Unlabeled Data
    Hu, Wenpeng
    Le, Ran
    Liu, Bing
    Ji, Feng
    Ma, Jinwen
    Zhao, Dongyan
    Yan, Rui
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7806 - 7814
  • [9] Federated Learning with Positive and Unlabeled Data
    Lin, Xinyang
    Chen, Hanting
    Xu, Yixing
    Xu, Chao
    Gui, Xiaolin
    Deng, Yiping
    Wang, Yunhe
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [10] Positive and unlabeled learning in categorical data
    Ienco, Dino
    Pensa, Ruggero G.
    NEUROCOMPUTING, 2016, 196 : 113 - 124