Analysis of Learning from Positive and Unlabeled Data

Authors
du Plessis, Marthinus C. [1]
Niu, Gang [2]
Sugiyama, Masashi [1]
Affiliations
[1] University of Tokyo, Tokyo 1130033, Japan
[2] Baidu Inc., Beijing 100085, People's Republic of China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but the problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than $2\sqrt{2}$ times the fully supervised case. These theoretical findings are also validated through experiments.
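
The contrast between the hinge loss and the ramp loss can be illustrated with a small numerical check. The abstract gives no formulas, so this is a minimal sketch under assumptions: the scaled definitions 0.5 * max(0, 1 - z) for the hinge loss and 0.5 * max(0, min(2, 1 - z)) for the ramp loss are assumed, and the helper name `symmetry_gap` is hypothetical. The sketch checks the identity loss(z) + loss(-z) = 1, which the 0-1 loss satisfies; a nonzero gap is one way to picture the extra penalty a convex surrogate can introduce.

```python
def hinge_loss(z: float) -> float:
    """Scaled hinge loss (assumed form): convex, unbounded as z -> -inf."""
    return 0.5 * max(0.0, 1.0 - z)


def ramp_loss(z: float) -> float:
    """Scaled ramp loss (assumed form): non-convex, bounded in [0, 1]."""
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def symmetry_gap(loss, z: float) -> float:
    """Hypothetical helper: loss(z) + loss(-z) - 1.

    A gap of zero means the loss satisfies the same identity as the
    0-1 loss at margin z; a positive gap is an extra penalty.
    """
    return loss(z) + loss(-z) - 1.0


# Margins z = y * f(x) at which to compare the two losses.
margins = [-3.0, -1.0, 0.0, 0.5, 2.0]
ramp_gaps = [symmetry_gap(ramp_loss, z) for z in margins]
hinge_gaps = [symmetry_gap(hinge_loss, z) for z in margins]

for z, r, h in zip(margins, ramp_gaps, hinge_gaps):
    print(f"z={z:+.1f}  ramp gap={r:.2f}  hinge gap={h:.2f}")
```

Under these assumed definitions the ramp loss has zero gap at every margin, while the hinge loss has a positive gap whenever |z| > 1, so confidently classified points contribute an extra term under the hinge loss.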
Pages: 9