Modeling PU learning using probabilistic logic programming

Cited: 0
Authors
Verreet, Victor [1]
De Raedt, Luc [1,2]
Bekker, Jessa [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, Leuven AI, Leuven, Belgium
[2] Orebro Univ, AASS, Orebro, Sweden
Keywords
Positive unlabeled learning; Weak supervision; Probabilistic logic programming; Modeling; Unidentifiability;
DOI
10.1007/s10994-023-06461-3
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The goal of learning from positive and unlabeled (PU) examples is to learn a classifier that predicts the posterior class probability. The challenge is that the available labels in the data are determined by (1) the true class, and (2) the labeling mechanism that selects which positive examples get labeled, where certain examples often have a higher probability of being selected than others. Incorrectly assuming an unbiased labeling mechanism leads to learning a biased classifier, yet this is what most existing methods do. A handful of methods make more realistic assumptions, but they are either so general that it is impossible to distinguish between the effects of the true classification and of the labeling mechanism, too restrictive to correctly model the real situation, or require knowledge that is typically unavailable. This paper studies how to formulate and integrate more realistic assumptions for learning better classifiers by exploiting the strengths of probabilistic logic programming (PLP). Concretely, (1) we propose PU ProbLog: a PLP-based general method that allows (partially) modeling the labeling mechanism. (2) We show that our method generalizes existing methods, in the sense that it can model the same assumptions. (3) Thanks to the use of PLP, our method also supports PU learning in relational domains. (4) Our empirical analysis shows that partially modeling the labeling bias improves the learned classifiers.
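The labeling mechanism described in the abstract can be illustrated with a small synthetic sketch (hypothetical code, not from the paper): under the common "selected completely at random" (SCAR) assumption, every positive example is labeled with the same constant propensity, while under a biased ("selected at random", SAR) mechanism the propensity depends on the example's features, so the labeled positives are no longer a representative sample of all positives.

```python
import random

random.seed(0)

def make_pu_data(n, labeling="scar", c=0.5):
    """Generate synthetic PU data as (feature x, true class y, observed label s).

    Only positives (y=1) can receive a label (s=1). Under SCAR the labeling
    propensity is the constant c; under a biased SAR mechanism it depends on
    the feature x, so some positives are more likely to be labeled than others.
    """
    data = []
    for _ in range(n):
        x = random.random()                      # single feature in [0, 1]
        y = 1 if x > 0.5 else 0                  # true class depends on x
        if y == 0:
            s = 0                                # negatives are never labeled
        elif labeling == "scar":
            s = 1 if random.random() < c else 0  # constant propensity
        else:                                    # "sar": propensity grows with x
            s = 1 if random.random() < x else 0
        data.append((x, y, s))
    return data

scar = make_pu_data(10000, "scar")
sar = make_pu_data(10000, "sar")

# Under SCAR the labeled positives are an unbiased sample of all positives;
# under SAR they are skewed toward high-x examples, which is exactly the bias
# a classifier inherits if it wrongly assumes SCAR.
def mean_x_of_labeled(data):
    labeled = [x for x, y, s in data if s == 1]
    return sum(labeled) / len(labeled)

print(round(mean_x_of_labeled(scar), 2), round(mean_x_of_labeled(sar), 2))
```

Here the feature `x`, threshold `0.5`, and propensity `c` are arbitrary assumptions chosen to keep the example small; the mean feature value of the labeled positives comes out higher under SAR than under SCAR, making the selection bias directly visible.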
Pages: 1351-1372
Page count: 22