Analysis of Learning from Positive and Unlabeled Data

Cited by: 0

Authors:

du Plessis, Marthinus C. [1]

Niu, Gang [2]

Sugiyama, Masashi [1]

Affiliations:

[1] Univ Tokyo, Tokyo 1130033, Japan

[2] Baidu Inc, Beijing 100085, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27

Keywords:

ALGORITHM;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but the problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than $2\sqrt{2}$ times the fully supervised case. These theoretical findings are also validated through experiments.
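The contrast the abstract draws between the hinge loss and the ramp loss can be illustrated with a small sketch. Everything below is an assumption for illustration and is not taken from this record: the scaled loss definitions, the cost-sensitive risk form 2π·E_P[ℓ(f(x))] + E_U[ℓ(−f(x))] − π, the helper names `ramp_loss`, `hinge_loss`, and `pu_risk`, and the toy scores. The sketch checks that the ramp loss satisfies ℓ(z) + ℓ(−z) = 1 while the hinge loss does not, which is one way to see where a bias term could enter.

```python
def ramp_loss(z):
    # Assumed scaled ramp loss: 0.5 * max(0, min(2, 1 - z)), bounded in [0, 1].
    return 0.5 * max(0.0, min(2.0, 1.0 - z))

def hinge_loss(z):
    # Assumed scaled hinge loss: 0.5 * max(0, 1 - z), unbounded as z decreases.
    return 0.5 * max(0.0, 1.0 - z)

def mean(values):
    return sum(values) / len(values)

def pu_risk(scores_pos, scores_unl, prior, loss):
    # Assumed cost-sensitive PU risk:
    #   2 * prior * mean(loss(f(x))) over positive samples
    #   + mean(loss(-f(x))) over unlabeled samples
    #   - prior
    risk_pos = mean([loss(s) for s in scores_pos])
    risk_unl = mean([loss(-s) for s in scores_unl])
    return 2.0 * prior * risk_pos + risk_unl - prior

# Symmetry check on a grid of margins from -3 to 3.
grid = [-3.0 + 0.5 * i for i in range(13)]
ramp_sums = [ramp_loss(z) + ramp_loss(-z) for z in grid]
hinge_sums = [hinge_loss(z) + hinge_loss(-z) for z in grid]

# Toy classifier scores: all positives scored +2; the unlabeled set is
# half positive (+2) and half negative (-2), so the class prior is 0.5.
scores_pos = [2.0, 2.0]
scores_unl = [2.0, 2.0, -2.0, -2.0]
ramp_risk = pu_risk(scores_pos, scores_unl, prior=0.5, loss=ramp_loss)
hinge_risk = pu_risk(scores_pos, scores_unl, prior=0.5, loss=hinge_loss)
```

On these toy scores, which separate the two classes perfectly, `ramp_risk` evaluates to 0 while `hinge_risk` evaluates to 0.25, because the hinge loss keeps growing on the confidently scored positive points inside the unlabeled set.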


Pages: 9
