STDS: self-training data streams for mining limited labeled data in non-stationary environment

Cited: 0
Authors
Shirin Khezri
Jafar Tanha
Ali Ahmadi
Arash Sharifi
Affiliations
[1] Islamic Azad University, Department of Computer Engineering, Science and Research Branch
[2] University of Tabriz, Department of Electrical and Computer Engineering
[3] School of Computer Science, Faculty of Computer Engineering
[4] Institute for Research in Fundamental Sciences (IPM)
[5] K. N. Toosi University of Technology
Source
Applied Intelligence | 2020 / Volume 50
Keywords
Semi-supervised learning; Self-training; Data streams; Concept drift; Clustering algorithm;
DOI
Not available
Abstract
In this article, we focus on the classification problem in semi-supervised learning in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. There are several approaches to semi-supervised learning in stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and employs a mechanism to handle concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric that yields a set of high-confidence predictions, together with a proper underlying base learner. We therefore propose an ensemble approach that finds a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is trained from the labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data in the next chunk to update the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms the supervised baseline and most of the other semi-supervised learning methods.
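As a rough illustration of the workflow the abstract describes, the Python sketch below processes a stream chunk by chunk: a histogram-based KL divergence between consecutive chunks serves as the drift test, a fresh classifier is trained on the current chunk's labels when drift is detected, and otherwise a fraction of high-confidence pseudo-labels (classifier probability combined with k-means cluster agreement, an illustrative stand-in for the paper's ensemble selection metric) is carried into the next chunk. The base learner (SGDClassifier), all thresholds, and the helper names (chunk_kl_divergence, stds_like_stream) are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch, assuming each chunk contains at least some labeled points.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

def chunk_kl_divergence(prev_chunk, curr_chunk, bins=10, eps=1e-9):
    # Average per-feature KL divergence between two chunks, estimated from histograms.
    total = 0.0
    for j in range(prev_chunk.shape[1]):
        lo = min(prev_chunk[:, j].min(), curr_chunk[:, j].min())
        hi = max(prev_chunk[:, j].max(), curr_chunk[:, j].max())
        p, _ = np.histogram(prev_chunk[:, j], bins=bins, range=(lo, hi))
        q, _ = np.histogram(curr_chunk[:, j], bins=bins, range=(lo, hi))
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        total += np.sum(p * np.log(p / q))
    return total / prev_chunk.shape[1]

def stds_like_stream(chunks, classes, drift_threshold=0.5,
                     conf_threshold=0.9, add_fraction=0.2, seed=0):
    # chunks: iterable of (X, y) arrays; y == -1 marks unlabeled points.
    clf = SGDClassifier(loss="log_loss", random_state=seed)
    prev_X, carried_X, carried_y = None, None, None
    for X, y in chunks:
        labeled = y != -1
        X_lab, y_lab = X[labeled], y[labeled]
        drift = prev_X is not None and chunk_kl_divergence(prev_X, X) > drift_threshold
        if drift or prev_X is None:
            # Drift (or first chunk): start a fresh classifier from this chunk's labels only.
            clf = SGDClassifier(loss="log_loss", random_state=seed)
        elif carried_X is not None:
            # No drift: add the pseudo-labeled points carried over from the previous chunk.
            X_lab = np.vstack([X_lab, carried_X])
            y_lab = np.concatenate([y_lab, carried_y])
        if len(X_lab) > 0:
            clf.partial_fit(X_lab, y_lab, classes=classes)
        # Selection metric (illustrative): keep unlabeled points whose classifier
        # confidence is high AND whose cluster's majority prediction agrees with them.
        carried_X, carried_y = None, None
        X_unl = X[~labeled]
        if len(X_unl) >= len(classes):
            proba = clf.predict_proba(X_unl)
            pred = proba.argmax(axis=1)
            km = KMeans(n_clusters=len(classes), n_init=10, random_state=seed).fit(X_unl)
            majority = {c: np.bincount(pred[km.labels_ == c]).argmax()
                        for c in np.unique(km.labels_)}
            agree = np.array([majority[c] for c in km.labels_]) == pred
            confident = np.flatnonzero((proba.max(axis=1) >= conf_threshold) & agree)
            top = confident[np.argsort(-proba.max(axis=1)[confident])]
            top = top[:max(1, int(add_fraction * len(X_unl)))]
            if len(top) > 0:
                carried_X, carried_y = X_unl[top], clf.classes_[pred[top]]
        prev_X = X
    return clf
```

In practice the drift threshold, confidence threshold, and the fraction of pseudo-labels added per chunk are dataset-dependent and would need tuning.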
Pages: 1448-1467
Page count: 19
Related Papers
50 records in total
  • [31] Online Machine Learning from Non-stationary Data Streams in the Presence of Concept Drift and Class Imbalance: A Systematic Review
    Palli, Abdul Sattar
    Jaafar, Jafreezal
    Gilal, Abdul Rehman
    Alsughayyir, Aeshah
    Gomes, Heitor Murilo
    Alshanqiti, Abdullah
    Omar, Mazni
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2024, 23 (01): 105 - 139
  • [32] A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data
    Zhang, Lei
    Yang, Lin
    Ma, Tianwu
    Shen, Feixue
    Cai, Yanyan
    Zhou, Chenghu
    GEODERMA, 2021, 384
  • [33] Spike sorting: Bayesian clustering of non-stationary data
    Bar-Hillel, Aharon
    Spiro, Adam
    Stark, Eran
    JOURNAL OF NEUROSCIENCE METHODS, 2006, 157 (02) : 303 - 316
  • [34] Mining Recurring Concept Drifts with Limited Labeled Streaming Data
    Li, Peipei
    Wu, Xindong
    Hu, Xuegang
    PROCEEDINGS OF 2ND ASIAN CONFERENCE ON MACHINE LEARNING (ACML2010), 2010, 13 : 241 - 252
  • [35] Mining Recurring Concept Drifts with Limited Labeled Streaming Data
    Li, Peipei
    Wu, Xindong
    Hu, Xuegang
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2012, 3 (02)
  • [36] Fast semi-supervised self-training algorithm based on data editing
    Li, Bing
    Wang, Jikui
    Yang, Zhengguo
    Yi, Jihai
    Nie, Feiping
    INFORMATION SCIENCES, 2023, 626 : 293 - 314
  • [37] RDIS: Random Drop Imputation With Self-Training for Incomplete Time Series Data
    Choi, Tae-Min
    Kang, Ji-Su
    Kim, Jong-Hwan
    IEEE ACCESS, 2023, 11 : 100720 - 100728
  • [38] FEDERATED SELF-TRAINING FOR DATA-EFFICIENT AUDIO RECOGNITION
    Tsouvalas, Vasileios
    Saeed, Aaqib
    Ozcelebi, Tanir
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 476 - 480
  • [39] A New Ensemble Method for Multi-label Data Stream Classification in Non-stationary Environment
    Song, Ge
    Ye, Yunming
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 1776 - 1783
  • [40] Neural networks for online learning of non-stationary data streams: a review and application for smart grids flexibility improvement
    Hammami, Zeineb
    Sayed-Mouchaweh, Moamar
    Mouelhi, Wiem
    Ben Said, Lamjed
    ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (08) : 6111 - 6154