Class Prior Estimation from Positive and Unlabeled Data

Cited by: 57
Authors
Du Plessis, Marthinus Christoffel [1]
Sugiyama, Masashi [1]
Affiliation
[1] Tokyo Inst Technol, Dept Comp Sci, Tokyo 1528552, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2014 / Vol. E97-D / No. 5
Keywords
class-prior change; outlier detection; positive and unlabeled learning; divergence estimation; Pearson divergence; ratio
DOI
10.1587/transinf.E97.D.1358
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.
Pages: 1358-1362
Page count: 5
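The abstract's idea can be illustrated with a minimal sketch. It is written under our own assumptions and is not the authors' exact estimator. Write the input density as p(x) = θ p(x|y=+1) + (1 - θ) p(x|y=-1) and partially match θ p(x|y=+1) to p(x) with the Pearson divergence PE(θ) = ½ ∫ (θ p(x|y=+1)/p(x) - 1)² p(x) dx. Using the convex conjugate of f(t) = ½(t - 1)², PE(θ) is bounded below by θ E₊[g(x)] - ½ E_p[g(x)²] - θ + ½ for any function g. This lets the bound be maximized directly from samples, with no density estimation. If g(x) = αᵀφ(x) is linear in basis functions φ, the maximizer is α = θ G⁻¹h, where h is the mean of φ over the positive samples, H is the mean of φφᵀ over the unlabeled samples, and G = H + λI. Substituting α back in gives a quadratic in θ, and its minimizer is analytic: θ = 1 / (2 hᵀG⁻¹h - hᵀG⁻¹HG⁻¹h). Several choices here are hypothetical: the Gaussian kernel basis, the kernel centres drawn from the positive samples, the fixed values of sigma and lam, and the function names. The sketch also omits model selection.

```python
import numpy as np


def gaussian_basis(x, centers, sigma):
    """Gaussian kernel features: (n, b) matrix of exp(-||x - c||^2 / (2 sigma^2))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def estimate_class_prior(x_pos, x_unl, sigma=1.0, lam=1e-3, n_centers=100, seed=0):
    """Hypothetical PE-based class-prior estimate from positive and unlabeled samples.

    Models the ratio p(x | y=+1) / p(x) as a linear combination of Gaussian kernels
    (an assumption of this sketch) and returns
    theta = 1 / (2 h^T a - a^T H a), with a = (H + lam I)^{-1} h.
    """
    rng = np.random.default_rng(seed)
    # Kernel centres drawn from the positive samples (assumption of this sketch).
    idx = rng.choice(len(x_pos), size=min(n_centers, len(x_pos)), replace=False)
    centers = x_pos[idx]

    phi_pos = gaussian_basis(x_pos, centers, sigma)  # (n_pos, b)
    phi_unl = gaussian_basis(x_unl, centers, sigma)  # (n_unl, b)

    h = phi_pos.mean(axis=0)                # empirical E_{p(x|y=+1)}[phi(x)]
    H = phi_unl.T @ phi_unl / len(x_unl)    # empirical E_{p(x)}[phi(x) phi(x)^T]
    # Ridge-regularized coefficients of the ratio model (up to the factor theta).
    a = np.linalg.solve(H + lam * np.eye(len(h)), h)

    # Analytic minimizer over theta of the plugged-in lower bound on PE(theta).
    # The denominator equals h^T a + lam * ||a||^2, so it is always positive.
    theta = 1.0 / (2.0 * h @ a - a @ H @ a)
    return float(np.clip(theta, 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: positives ~ N(0, 1); unlabeled = 30% N(0, 1) + 70% N(6, 1).
    x_pos = rng.normal(0.0, 1.0, size=(500, 1))
    x_unl = np.vstack([rng.normal(0.0, 1.0, size=(600, 1)),
                       rng.normal(6.0, 1.0, size=(1400, 1))])
    print(f"estimated class prior: {estimate_class_prior(x_pos, x_unl):.3f}")
```

In this formulation, the population minimizer is θ = 1 / ∫ p(x|y=+1)² / p(x) dx. Because p(x) ≥ π p(x|y=+1), this minimizer is never smaller than the true prior π, and it equals π only when the two class-conditional densities do not overlap. The toy data above is therefore deliberately well separated.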