Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning

Cited by: 53
Authors
Gong, Chen [1 ,2 ]
Liu, Tongliang [3 ,4 ]
Yang, Jian [5 ,6 ]
Tao, Dacheng [3 ,4 ]
Affiliations
[1] Nanjing Univ Sci & Technol, PCA Lab, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ,Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Shaanxi, Peoples R China
[3] Univ Sydney, UBTECH Sydney Artificial Intelligence Ctr, Darlington, NSW 2008, Australia
[4] Univ Sydney, Sch Informat Technol, Fac Engn & Informat Technol, Darlington, NSW 2008, Australia
[5] Nanjing Univ Sci & Technol, PCA Lab, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Jiangsu, Peoples R China
[6] Nanjing Univ Sci & Technol, Jiangsu Key Lab Image & Video Understanding Socia, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
Funding
Australian Research Council;
Keywords
Support vector machines; Calibration; Training; Training data; Data models; Learning systems; Intserv networks; Label calibration; large margin; positive and unlabeled learning (PU Learning); support vector machines (SVMs); CLASSIFICATION;
DOI
10.1109/TNNLS.2019.2892403
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Positive and unlabeled learning (PU learning) aims to train a binary classifier from only positive and unlabeled data. Existing methods usually cast PU learning as a label-noise learning problem or a cost-sensitive learning problem; however, none of them fully takes the data distribution into account when designing the model, which keeps them from achieving better performance. In this paper, we argue that the clusters formed by positive examples and potential negative examples in the feature space should be exploited when building a PU learning model, especially since negative data are not explicitly available. To this end, we introduce a hat loss to discover the margin between data clusters and a label calibration regularizer to amend the biased decision boundary toward the correct one, and we propose a novel discriminative PU classifier termed "large-margin label-calibrated support vector machines" (LLSVM). LLSVM works properly in the absence of negative training examples and effectively achieves the max-margin effect between the positive and negative classes. Theoretically, we derive a generalization error bound for LLSVM which reveals that incorporating unlabeled data does help to improve performance. Empirically, we compare LLSVM with state-of-the-art PU methods on various synthetic and practical data sets, and the results confirm that LLSVM handles PU learning tasks more effectively than the compared methods.
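The hat loss and the label calibration regularizer described in the abstract can be made concrete with a small sketch. The Python code below is a minimal, hypothetical illustration in the spirit of the abstract, not the paper's actual objective or optimizer: it combines a hinge loss on the positives, a hat loss max(0, 1 - |f(x)|) that pushes the decision boundary through low-density regions of the unlabeled data, and a squared calibration penalty that pulls the mean unlabeled score toward 2*pi - 1 under an assumed class prior pi. The function name llsvm_style_fit, the parameters c_hat, c_cal, and pi, and the subgradient-descent solver are all assumptions made for illustration.

import numpy as np

def llsvm_style_fit(X_pos, X_unl, pi=0.5, c_hat=1.0, c_cal=1.0,
                    lr=0.01, n_iter=500):
    # Sketch only: linear model f(x) = w.x + b trained by subgradient
    # descent on 0.5*||w||^2 + hinge(positives) + c_hat*hat(unlabeled)
    # + c_cal*(mean f_unl - (2*pi - 1))^2. Not the authors' algorithm.
    d = X_pos.shape[1]
    w, b = np.zeros(d), 0.0
    n_p, n_u = len(X_pos), len(X_unl)
    for _ in range(n_iter):
        f_pos = X_pos @ w + b
        f_unl = X_unl @ w + b
        # Hinge loss max(0, 1 - f) on positives; subgradient -x when f < 1.
        viol = f_pos < 1.0
        gw = w - X_pos[viol].sum(axis=0) / n_p
        gb = -viol.mean()
        # Hat loss max(0, 1 - |f|) on unlabeled points; subgradient is
        # -sign(f) * x for points strictly inside the margin band.
        inside = np.abs(f_unl) < 1.0
        s = -np.sign(f_unl[inside])
        gw += c_hat * (s @ X_unl[inside]) / n_u
        gb += c_hat * s.sum() / n_u
        # Calibration penalty: gradient of (mean f_unl - (2*pi - 1))^2.
        r = 2.0 * (f_unl.mean() - (2.0 * pi - 1.0))
        gw += c_cal * r * X_unl.mean(axis=0)
        gb += c_cal * r
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy usage: one labeled-positive blob plus an unlabeled two-cluster mixture.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, size=(50, 2))
X_unl = np.vstack([rng.normal(loc=2.0, size=(50, 2)),
                   rng.normal(loc=-2.0, size=(50, 2))])
w, b = llsvm_style_fit(X_pos, X_unl, pi=0.5)
pred = np.sign(X_unl @ w + b)   # +1: predicted positive, -1: inferred negative

In this sketch the hat loss is what lets the unlabeled data shape the margin: any unlabeled point scoring inside (-1, 1) is pushed toward the nearer cluster, while the calibration term keeps the boundary from collapsing all unlabeled points onto the positive side.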
Pages: 3471-3483
Page count: 13