Analysis of Learning from Positive and Unlabeled Data

Cited by: 0
Authors
du Plessis, Marthinus C. [1]
Niu, Gang [2]
Sugiyama, Masashi [1]
Affiliations
[1] University of Tokyo, Tokyo 1130033, Japan
[2] Baidu Inc., Beijing 100085, People's Republic of China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014, Vol. 27
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but the problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2√2 times the fully supervised case. These theoretical findings are also validated through experiments.
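The cost-sensitive formulation with the ramp loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the risk formula, so the expression below is an assumption: it is the form commonly given for this setting, 2π E_P[ℓ(f(x))] + E_U[ℓ(−f(x))] − π, where π is the class prior and ℓ is a loss satisfying ℓ(z) + ℓ(−z) = 1. The function names `ramp_loss` and `pu_risk` are hypothetical and chosen for illustration only.

```python
def ramp_loss(z):
    """Ramp loss 0.5 * max(0, min(2, 1 - z)).

    It is non-convex and satisfies ramp_loss(z) + ramp_loss(-z) == 1.
    """
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def pu_risk(scores_pos, scores_unl, prior):
    """Empirical positive-unlabeled risk (assumed form, see lead-in).

    scores_pos: classifier outputs f(x) on positive samples
    scores_unl: classifier outputs f(x) on unlabeled samples
    prior:      class prior pi = p(y = +1), assumed known or estimated
    """
    # Average loss for treating positive samples as positive.
    pos_term = sum(ramp_loss(s) for s in scores_pos) / len(scores_pos)
    # Average loss for treating unlabeled samples as negative.
    unl_term = sum(ramp_loss(-s) for s in scores_unl) / len(scores_unl)
    return 2.0 * prior * pos_term + unl_term - prior


# Toy check: a classifier that separates the classes with margin.
# The unlabeled set is half positive (score 2.0) and half negative (score -2.0).
risk = pu_risk([2.0, 2.0], [2.0, -2.0], prior=0.5)
print(risk)
```

In this toy case the unlabeled term equals the prior, so subtracting π leaves a risk of zero for the perfectly separating classifier. With a convex loss such as the hinge loss, the symmetry ℓ(z) + ℓ(−z) = 1 does not hold, which is the source of the bias the abstract refers to.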
Pages: 9
Related papers
50 in total
  • [41] Spectral signature analysis of false positive burned area detection from agricultural harvests using Sentinel-2 data
    van Dijk, Daan
    Shoaie, Sorosh
    van Leeuwen, Thijs
    Veraverbeke, Sander
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2021, 97
  • [42] Employing unlabeled data to improve the classification performance of SVM, and its application in audio event classification
    Leng, Yan
    Sun, Chengli
    Xu, Xinyan
    Yuan, Qi
    Xing, Shuning
    Wan, Honglin
    Wang, Jingjing
    Li, Dengwang
    KNOWLEDGE-BASED SYSTEMS, 2016, 98 : 117 - 129
  • [43] Active Learning From Imbalanced Data: A Solution of Online Weighted Extreme Learning Machine
    Yu, Hualong
    Yang, Xibei
    Zheng, Shang
    Sun, Changyin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (04) : 1088 - 1103
  • [44] Learning Transferred Weights From Co-Occurrence Data for Heterogeneous Transfer Learning
    Yang, Liu
    Jing, Liping
    Yu, Jian
    Ng, Michael K.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (11) : 2187 - 2200
  • [45] A Novel Meta-cognitive Extreme Learning Machine to Learning from Data Streams
    Pratama, Mahardhika
    Lu, Jie
    Zhang, Guangquan
    2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS, 2015, : 2792 - 2797
  • [46] Implicit data crimes: Machine learning bias arising from misuse of public data
    Shimron, Efrat
    Tamir, Jonathan I.
    Wang, Ke
    Lustig, Michael
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2022, 119 (13)
  • [47] Shifting machine learning for healthcare from development to deployment and from models to data
    Zhang, Angela
    Xing, Lei
    Zou, James
    Wu, Joseph C.
    NATURE BIOMEDICAL ENGINEERING, 2022, 6 (12) : 1330 - 1345
  • [48] Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach
    Liang, Muxuan
    Li, Zhizhong
    Chen, Ting
    Zeng, Jianyang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2015, 12 (04) : 928 - 937
  • [49] Learning from incomplete data in Bayesian networks with qualitative influences
    Masegosa, Andres R.
    Feelders, Ad J.
    van der Gaag, Linda C.
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2016, 69 : 18 - 34
  • [50] Learning from streaming data with unsupervised heterogeneous domain adaptation
    Moradi, Mona
    Rahmanimanesh, Mohammad
    Shahzadi, Ali
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2025, 19 (01) : 61 - 81