Learning from data streams with only positive and unlabeled data

Cited: 13
Authors
Qin, Xiangju [1 ]
Zhang, Yang [1 ,2 ]
Li, Chen [1 ]
Li, Xue [3 ]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China;
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves; DECISION TREES; CLASSIFICATION;
DOI
10.1007/s10844-012-0231-6
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as Positive and Unlabeled (PU) learning problems, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, both of which are easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) achieves good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT still achieves classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
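The abstract's first enhancement, estimating the percentage of positive examples in the training stream, is not detailed in this record. As a rough illustration only, and not the PUVFDT estimator itself, the sketch below shows one common way such an estimate can be maintained incrementally under the SCAR assumption (positives are labeled with a constant probability c). The class name PUStreamCounts and the label_frequency parameter are hypothetical names introduced here for the example.

# A minimal, hypothetical sketch of positive-and-unlabeled (PU) stream statistics.
# It is NOT the PUVFDT algorithm; it only illustrates one standard idea: under the
# SCAR assumption (positives are labeled with a fixed probability c), the positive
# prior can be estimated from the fraction of labeled examples seen so far, and the
# unlabeled mass can then be apportioned between the two classes.

from dataclasses import dataclass

@dataclass
class PUStreamCounts:
    label_frequency: float          # assumed P(labeled | positive), the SCAR constant c
    n_seen: int = 0                 # total examples observed so far
    n_labeled_positive: int = 0     # examples that arrived with a positive label

    def update(self, is_labeled_positive: bool) -> None:
        """Consume one example from the stream (positive-labeled or unlabeled)."""
        self.n_seen += 1
        if is_labeled_positive:
            self.n_labeled_positive += 1

    def positive_prior(self) -> float:
        """Estimate P(y = +) as (fraction labeled) / c, clipped to [0, 1]."""
        if self.n_seen == 0:
            return 0.0
        return min(1.0, (self.n_labeled_positive / self.n_seen) / self.label_frequency)

    def expected_class_counts(self) -> tuple:
        """Split the unlabeled mass into expected positive / negative counts."""
        n_unlabeled = self.n_seen - self.n_labeled_positive
        pi = self.positive_prior()
        # Expected positives hiding among the unlabeled examples under SCAR.
        hidden_pos = max(0.0, pi * self.n_seen - self.n_labeled_positive)
        hidden_pos = min(hidden_pos, float(n_unlabeled))
        return self.n_labeled_positive + hidden_pos, n_unlabeled - hidden_pos


if __name__ == "__main__":
    # Toy usage: 30% of examples are truly positive, and 40% of positives
    # arrive labeled (so c = 0.4); the estimated prior should be close to 0.30.
    import random
    random.seed(0)
    counts = PUStreamCounts(label_frequency=0.4)
    for _ in range(10_000):
        y = random.random() < 0.3               # hidden true class
        labeled = y and random.random() < 0.4   # positive and actually labeled
        counts.update(labeled)
    print(f"estimated positive prior: {counts.positive_prior():.3f}")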
Pages: 405-430
Page count: 26
Related papers
50 records in total
  • [21] MULTI TASK LEARNING WITH POSITIVE AND UNLABELED DATA AND ITS APPLICATION TO MENTAL STATE PREDICTION
    Kaji, Hirotaka
    Yamaguchi, Hayato
    Sugiyama, Masashi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2301 - 2305
  • [22] Learning Only from Relevant Keywords and Unlabeled Documents
    Charoenphakdee, Nontawat
    Lee, Jongyeong
    Jin, Yiping
    Wanvarie, Dittaya
    Sugiyama, Masashi
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 3993 - 4002
  • [23] Learning from evolving data streams through ensembles of random patches
    Gomes, Heitor Murilo
    Read, Jesse
    Bifet, Albert
    Durrant, Robert J.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (07) : 1597 - 1625
  • [24] Classifying networked text data with positive and unlabeled examples
    Li, Mei
    Pan, Shirui
    Zhang, Yang
    Cai, Xiaoyan
    PATTERN RECOGNITION LETTERS, 2016, 77 : 1 - 7
  • [25] Tensor decision trees for continual learning from drifting data streams
    Krawczyk, Bartosz
    MACHINE LEARNING, 2021, 110 (11-12) : 3015 - 3035
  • [26] Learning to Integrate Unlabeled Data in Text Classification
    Jiang, Eric P.
    ICCSIT 2010 - 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 4, 2010, : 82 - 86
  • [27] Learning with Augmented Class by Exploiting Unlabeled Data
    Da, Qing
    Yu, Yang
    Zhou, Zhi-Hua
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 1760 - 1766
  • [28] Learning Instance Weighted Naive Bayes from labeled and unlabeled data
    Jiang, Liangxiao
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012, 38 (01) : 257 - 268
  • [29] Online Active Learning for Drifting Data Streams
    Liu, Sanmin
    Xue, Shan
    Wu, Jia
    Zhou, Chuan
    Yang, Jian
    Li, Zhao
    Cao, Jie
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (01) : 186 - 200
  • [30] An evolutionary multi-objective approach to learn from positive and unlabeled data
    Qiu, Jianfeng
    Cai, Xiaoqiang
    Zhang, Xingyi
    Cheng, Fan
    Yuan, Shenzhi
    Fu, Guanglong
    APPLIED SOFT COMPUTING, 2021, 101