Learning from data streams with only positive and unlabeled data

Cited by: 13
Authors
Qin, Xiangju [1 ]
Zhang, Yang [1 ,2 ]
Li, Chen [1 ]
Li, Xue [3 ]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China;
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves; DECISION TREES; CLASSIFICATION;
DOI
10.1007/s10844-012-0231-6
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled (PU) learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, both of which are easily obtainable in a streaming environment, making it widely applicable in practice. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve a competitive classification performance compared with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
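The abstract mentions estimating the percentage of positive examples in the training stream but does not reproduce PUVFDT's estimator here. As an illustration of how a positive-class fraction can be estimated from positive and unlabeled data alone, the sketch below uses the well-known Elkan-Noto correction: a probabilistic classifier g(x) trained to separate labeled positives from unlabeled examples satisfies g(x) ≈ c · P(y=1 | x), where c is the labeling frequency, so dividing out c recovers class probabilities. This is a minimal stand-in, not the method used in the paper; the function name and inputs are illustrative.

```python
def estimate_positive_fraction(positive_scores, unlabeled_scores):
    """Estimate the fraction of positives in an unlabeled pool.

    positive_scores: classifier scores g(x) on held-out labeled positives.
    unlabeled_scores: classifier scores g(x) on unlabeled examples.

    Elkan-Noto identity: c = E[g(x) | x is positive], and for any x,
    P(y=1 | x) ~= g(x) / c. Averaging the corrected scores over the
    unlabeled pool approximates the positive-class prior.
    """
    c = sum(positive_scores) / len(positive_scores)
    # Clip at 1.0 since g(x)/c is only an estimate of a probability.
    corrected = [min(s / c, 1.0) for s in unlabeled_scores]
    return sum(corrected) / len(corrected)
```

For example, if the classifier scores labeled positives at 0.8 on average and an unlabeled pool at [0.4, 0.4, 0.0, 0.0], the estimated positive fraction is 0.25. In a streaming setting the two averages would be maintained incrementally rather than over stored lists.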
Pages: 405-430
Page count: 26