Learning from data streams with only positive and unlabeled data

Cited by: 13
Authors
Qin, Xiangju [1]
Zhang, Yang [1, 2]
Li, Chen [1]
Li, Xue [3]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves; DECISION TREES; CLASSIFICATION;
DOI
10.1007/s10844-012-0231-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a positive and unlabeled learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, which is easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
Pages: 405-430
Page count: 26