Beyond the Selected Completely at Random Assumption for Learning from Positive and Unlabeled Data

Cited by: 37
Authors
Bekker, Jessa [1]
Robberechts, Pieter [1]
Davis, Jesse [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, Leuven, Belgium
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II | 2020 / Vol. 11907
Keywords
PU learning; Unlabeled data; Classification
DOI
10.1007/978-3-030-46147-8_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
Pages: 71-85
Page count: 15
Related papers
37 in total
[1]   Learning from positive and unlabeled data: a survey [J].
Bekker, Jessa ;
Davis, Jesse .
MACHINE LEARNING, 2020, 109 (04) :719-760
[2]  
Bekker J, 2018, AAAI CONF ARTIF INTE, P2712
[3]   Building text classifiers using positive and unlabeled examples [J].
Liu, B ;
Dai, Y ;
Li, XL ;
Lee, WS ;
Yu, PS .
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, :179-186
[4]  
Blanchard G, 2010, J MACH LEARN RES, V11, P2973
[5]  
Blockeel H., 2017, ILP 2017 LATE BREAKI
[6]   Positive-Unlabeled Learning in Streaming Networks [J].
Chang, Shiyu ;
Zhang, Yang ;
Tang, Jiliang ;
Yin, Dawei ;
Chang, Yi ;
Hasegawa-Johnson, Mark A. ;
Huang, Thomas S. .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :755-764
[7]  
Chapelle Olivier, 2009, IEEE Transactions on Neural Networks, V20, P542, DOI 10.1109/TNN.2009.2015974
[8]   A robust ensemble approach to learn from positive and unlabeled data using SVM base models [J].
Claesen, Marc ;
De Smet, Frank ;
Suykens, Johan A. K. ;
De Moor, Bart .
NEUROCOMPUTING, 2015, 160 :73-84
[9]  
Demsar J, 2006, J MACH LEARN RES, V7, P1
[10]   Learning from positive and unlabeled examples [J].
Denis, F ;
Gilleron, R ;
Letouzey, F .
THEORETICAL COMPUTER SCIENCE, 2005, 348 (01) :70-83