Dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream

Cited by: 8
Authors
Han, Meng [1]
Zhang, Xilong [1]
Chen, Zhiqiang [1]
Wu, Hongxin [1]
Li, Muhang [1]
Affiliations
[1] North Minzu Univ, Sch Comp Sci & Engn, Yinchuan, Ningxia, Peoples R China
Keywords
Data stream; Imbalance data; Concept drift; Window sampling; Ensemble classification;
DOI
10.1007/s10115-022-01791-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Data stream classification is an important research direction in data mining. In many practical applications, however, the complete training set cannot be collected at one time, and the data may be imbalanced and interspersed with concept drift, both of which greatly affect classification performance. To this end, an online dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream (DESW-ID) is proposed. The algorithm employs several balancing measures: it first resamples the data stream using a Poisson distribution and, if the stream is highly imbalanced, performs secondary sampling using a window that stores minority class instances, so that the current data reach a balanced state. To improve processing efficiency, a classifier selection ensemble is proposed that dynamically adjusts the number of classifiers, and the algorithm runs with an ADWIN detector to detect concept drift. The experimental results show that the proposed algorithm ranks first on average in all five classification performance metrics compared with state-of-the-art methods. The proposed algorithm therefore achieves better classification performance on imbalanced data streams with concept drift and also improves operating efficiency.
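The two-stage balancing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' DESW-ID implementation: the class name `WindowedPoissonResampler`, the binary 0/1 labels, the rate rule (minority instances get a Poisson rate equal to the running class ratio, capped at 10), the 0.2 imbalance cutoff, and the choice to replay one windowed minority instance per step are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class WindowedPoissonResampler:
    """Hypothetical helper: Poisson resampling plus a minority-class window."""

    def __init__(self, window_size=50, imbalance_threshold=0.2, seed=0):
        self.counts = {0: 0, 1: 0}  # running class counts (binary labels assumed)
        self.minority_window = deque(maxlen=window_size)
        self.imbalance_threshold = imbalance_threshold  # assumed cutoff
        self.rng = random.Random(seed)

    def process(self, x, y):
        """Return the list of (x, y) pairs to train on for this instance."""
        self.counts[y] += 1
        minority = min(self.counts, key=self.counts.get)
        majority = 1 - minority
        # Stage 1: class-dependent Poisson rate (assumed rule, capped at 10).
        ratio = self.counts[majority] / max(self.counts[minority], 1)
        lam = min(ratio, 10.0) if y == minority else 1.0
        batch = [(x, y)] * poisson(lam, self.rng)
        if y == minority:
            self.minority_window.append((x, y))
        # Stage 2: when the stream is highly imbalanced, replay one
        # stored minority instance from the window.
        minority_share = self.counts[minority] / sum(self.counts.values())
        if minority_share < self.imbalance_threshold and self.minority_window:
            batch.append(self.rng.choice(list(self.minority_window)))
        return batch
```

Each call to `process` returns the copies an online learner would be trained on for that arriving instance. A full system along the lines of the abstract would also need the classifier selection ensemble and an ADWIN drift detector, which this sketch leaves out.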
Pages: 1105-1128
Page count: 24