STDS: self-training data streams for mining limited labeled data in non-stationary environment

Cited by: 0
Authors
Shirin Khezri
Jafar Tanha
Ali Ahmadi
Arash Sharifi
Affiliations
[1] Islamic Azad University, Department of Computer Engineering, Science and Research Branch
[2] University of Tabriz, Electrical and Computer Engineering Department
[3] School of Computer Science, Faculty of Computer Engineering
[4] Institute for Research in Fundamental Sciences (IPM)
[5] K.N.Toosi University of Technology
Source
Applied Intelligence | 2020 / Vol. 50
Keywords
Semi-supervised learning; Self-training; Data streams; Concept drift; Clustering algorithm
DOI
Not available
Abstract
In this article, we focus on semi-supervised classification in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. Several approaches to semi-supervised learning exist for stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and employs a mechanism to handle concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric for identifying a set of high-confidence predictions, together with a proper underlying base learner. We therefore propose an ensemble approach that finds a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is trained on the new set of labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, selected by the proposed metric, is added to the labeled data in the next chunk for updating the incremental classifier. Experiments on a number of classification benchmark datasets show that STDS outperforms the supervised methods and most of the other semi-supervised learning methods.
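To make the chunk-to-chunk drift check concrete, the following is a minimal sketch of KL-divergence-based drift detection between two sequential chunks. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how chunk distributions are estimated, so the sketch assumes a single numeric feature, a smoothed histogram estimate, and a fixed threshold. The names `histogram`, `kl_divergence`, `drift_detected`, and the parameters `bins` and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Estimate a discrete distribution over [lo, hi] with add-one smoothing."""
    counts = [1.0] * bins  # smoothing keeps every bin probability above zero
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp the maximum value into the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def drift_detected(prev_chunk: Sequence[float], curr_chunk: Sequence[float],
                   bins: int = 10, threshold: float = 0.1) -> bool:
    """Flag drift when the current chunk diverges from the previous one.

    Both `bins` and `threshold` are assumed values chosen for illustration.
    """
    lo = min(min(prev_chunk), min(curr_chunk))
    hi = max(max(prev_chunk), max(curr_chunk))
    p = histogram(curr_chunk, lo, hi, bins)
    q = histogram(prev_chunk, lo, hi, bins)
    return kl_divergence(p, q) > threshold


if __name__ == "__main__":
    stable = [i / 100 for i in range(100)]
    shifted = [5.0 + i / 100 for i in range(100)]
    print(drift_detected(stable, stable))   # same distribution
    print(drift_detected(stable, shifted))  # shifted distribution
```

In a stream setting along the lines of the abstract, a positive result would trigger training a new classifier on the labeled data of the current chunk, while a negative result would let the high-confidence pseudo-labeled points carry over to the next chunk. A multi-feature stream would need a per-feature or joint density estimate, which this sketch does not cover.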
Pages: 1448-1467
Page count: 19
Related papers
50 in total
  • [41] Semi-supervised Concept Preserving Hashing for Image Retrieval in Non-stationary Data Environment
    Tian, Xing
    Zhu, Dezhong
    Li, Qihua
    Ng, Wing W. Y.
    Xu, Chunlin
    PROCEEDINGS OF 2024 ACM ICMR WORKSHOP ON MULTIMODAL VIDEO RETRIEVAL, ICMR-MVR 2024, 2024, : 14 - 19
  • [42] Neural networks for online learning of non-stationary data streams: a review and application for smart grids flexibility improvement
    Zeineb Hammami
    Moamar Sayed-Mouchaweh
    Wiem Mouelhi
    Lamjed Ben Said
    Artificial Intelligence Review, 2020, 53 : 6111 - 6154
  • [43] Adaptive Incremental Gaussian Mixture Network for Non-Stationary Data Stream Classification
    Chamby-Diaz, Jorge C.
    Recamonde-Mendoza, Mariana
    Bazzan, Ana L. C.
    Grunitzki, Ricardo
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [44] Adaptive Drift Detection Mechanism for Non-Stationary Data Stream
    Nagendhiran, Nalini
    Kuppusamy, Lakshmanan
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2021, 20 (01)
  • [45] Combining instance selection and self-training to improve data stream quantification
    Maletzke A.G.
    dos Reis D.M.
    Batista G.E.A.P.A.
    Journal of the Brazilian Computer Society, 2018, 24 (01)
  • [46] Boosting Aspect Sentiment Quad Prediction by Data Augmentation and Self-Training
    Yu, Yongxin
    Zhao, Minyi
    Zhou, Shuigeng
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [47] An ensemble-based semi-supervised learning approach for non-stationary imbalanced data streams with label scarcity
    Abdi, Yousef
    Asadpour, Mohammad
    Feizi-Derakhshi, Mohammad-Reza
    APPLIED SOFT COMPUTING, 2024, 167
  • [48] A Semi-supervised Based Framework for Data Stream Classification in Non-Stationary Environments
    Gorgonio, Arthur Costa
    Canuto, Anne Magaly de P.
    Vale, Karliane M. O.
    Gorgonio, Flavius L.
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [49] Analysis of training data using clustering to improve semi-supervised self-training
    Piroonsup, N.
    Sinthupinyo, S.
    KNOWLEDGE-BASED SYSTEMS, 2018, 143 : 65 - 80
  • [50] An Exploration of Online Missing Value Imputation in Non-stationary Data Stream
    Dong W.
    Gao S.
    Yang X.
    Yu H.
    SN Computer Science, 2021, 2 (2)