Analysis of Learning from Positive and Unlabeled Data

Cited by: 0

Authors:

du Plessis, Marthinus C. [1]

Niu, Gang [2]

Sugiyama, Masashi [1]

Affiliations:

[1] Univ Tokyo, Tokyo 1130033, Japan

[2] Baidu Inc, Beijing 100085, People's Republic of China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27

Keywords:

ALGORITHM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but the problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than $2\sqrt{2}$ times the fully supervised case. These theoretical findings are also validated through experiments.
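The contrast between the hinge and ramp losses in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: the scaled loss definitions `ramp_loss(z) = 0.5 * max(0, min(2, 1 - z))` and `hinge_loss(z) = 0.5 * max(0, 1 - z)`, and a cost-sensitive objective of the form `2 * prior * E_pos[loss(g(x))] + E_unl[loss(-g(x))] - prior`. The function names (`pu_risk`, `symmetry_gap`) are hypothetical. Under these assumptions, a loss satisfying `loss(z) + loss(-z) == 1` adds no extra term to the objective, whereas a loss that violates the identity adds a penalty, which is one plausible reading of the "intrinsic bias" mentioned above.

```python
def ramp_loss(z):
    """Scaled ramp loss (assumed form): 0.5 * max(0, min(2, 1 - z))."""
    return 0.5 * max(0.0, min(2.0, 1.0 - z))


def hinge_loss(z):
    """Scaled hinge loss (assumed form): 0.5 * max(0, 1 - z)."""
    return 0.5 * max(0.0, 1.0 - z)


def symmetry_gap(loss, z):
    """Return loss(z) + loss(-z) - 1; zero means the loss is symmetric at z."""
    return loss(z) + loss(-z) - 1.0


def pu_risk(scores_pos, scores_unl, prior, loss):
    """Empirical cost-sensitive PU objective (assumed form).

    scores_pos: classifier outputs g(x) on positive samples
    scores_unl: classifier outputs g(x) on unlabeled samples
    prior:      class prior of the positive class, assumed known here
    """
    # Positives should receive positive scores: penalize with loss(g(x)).
    pos_term = sum(loss(s) for s in scores_pos) / len(scores_pos)
    # Unlabeled samples are treated as negatives: penalize with loss(-g(x)).
    unl_term = sum(loss(-s) for s in scores_unl) / len(scores_unl)
    return 2.0 * prior * pos_term + unl_term - prior


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(z, symmetry_gap(ramp_loss, z), symmetry_gap(hinge_loss, z))
```

The ramp loss gap is zero at every score, while the hinge loss gap grows once the absolute score exceeds 1, so confidently scored points inflate the hinge-based objective in this sketch.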


Pages: 9

