STDS: self-training data streams for mining limited labeled data in non-stationary environment

Cited by: 0
Authors
Shirin Khezri
Jafar Tanha
Ali Ahmadi
Arash Sharifi
Affiliations
[1] Islamic Azad University, Department of Computer Engineering, Science and Research Branch
[2] University of Tabriz, Electrical and Computer Engineering Department
[3] School of Computer Science, Faculty of Computer Engineering
[4] Institute for Research in Fundamental Sciences (IPM)
[5] K.N. Toosi University of Technology
Source
Applied Intelligence | 2020, Vol. 50
Keywords
Semi-supervised learning; Self-training; Data streams; Concept drift; Clustering algorithm
DOI
Not available
Abstract
In this article, we focus on the classification problem in semi-supervised learning in a non-stationary environment. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. Several approaches to semi-supervised learning exist for stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and handles concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric that identifies a set of high-confidence predictions, together with a proper underlying base learner. We therefore propose an ensemble approach that finds a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution difference between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is trained on the labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data of the next chunk to update the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms supervised learning and most of the other semi-supervised learning methods.
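To make the chunk-wise procedure concrete, the following is a minimal Python sketch of one processing step, assuming a histogram-based KL test for drift and a plain prediction-confidence filter in place of the paper's ensemble selection metric. The names (kl_divergence, chunk_histogram, process_chunk, drift_threshold, keep_ratio) and the use of GaussianNB as the incremental base learner are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one STDS-style step over a data-stream chunk (assumptions noted above).
import numpy as np
from sklearn.naive_bayes import GaussianNB  # stand-in for the incremental base learner


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete distributions given as histograms."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def chunk_histogram(X, bins=20, value_range=(-5.0, 5.0)):
    """Summarize a chunk's feature distribution as a 1-D histogram (crude, illustrative)."""
    hist, _ = np.histogram(X.mean(axis=1), bins=bins, range=value_range)
    return hist


def process_chunk(clf, classes, X_lab, y_lab, X_unlab, prev_hist,
                  drift_threshold=0.5, keep_ratio=0.2):
    """One step: detect drift with KL divergence, then either rebuild or self-train."""
    cur_hist = chunk_histogram(np.vstack([X_lab, X_unlab]))
    drifted = prev_hist is not None and kl_divergence(cur_hist, prev_hist) > drift_threshold

    if drifted:
        # Drift detected: build a fresh classifier from the current chunk's labeled data only.
        clf = GaussianNB().partial_fit(X_lab, y_lab, classes=classes)
    else:
        # No drift: update incrementally, then add a fraction of the most confident
        # pseudo-labeled points from the unlabeled part of the chunk.
        clf.partial_fit(X_lab, y_lab, classes=classes)
        proba = clf.predict_proba(X_unlab)
        conf = proba.max(axis=1)
        pseudo = clf.classes_[proba.argmax(axis=1)]          # pseudo-labels
        keep = np.argsort(conf)[-max(1, int(keep_ratio * len(X_unlab))):]
        clf.partial_fit(X_unlab[keep], pseudo[keep])
    return clf, cur_hist
```

In STDS the high-confidence set is selected by combining clustering results with the classifier's predictions, so the simple confidence ranking above only stands in for that ensemble selection metric.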
Pages: 1448 - 1467
Page count: 19