Two-Stage Cost-Sensitive Learning for Data Streams With Concept Drift and Class Imbalance

Cited by: 22
Authors
Sun, Yange [1, 2]
Sun, Yi [3]
Dai, Honghua [4, 5]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China
[2] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[3] Zhengzhou Informat Sci & Technol Inst, Zhengzhou 450004, Peoples R China
[4] Deakin Univ, Inst Intelligent Syst & Innovat, Melbourne, Vic 3125, Australia
[5] Zhengzhou Univ, Cooperat Innovat Ctr Internet Healthcare, Zhengzhou 450000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Classification algorithms; Data mining; Data models; Adaptation models; Vegetation; Feature extraction; Heuristic algorithms; Data streams; classification; class imbalance; concept drift; cost-sensitive; ensemble learning; STATISTICAL COMPARISONS; ONLINE; CLASSIFICATION;
DOI
10.1109/ACCESS.2020.3031603
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most methods for classifying data streams assume that the class distribution is balanced. In practice, class imbalance is widespread in real-world applications. In addition, the underlying concept of a data stream may change over time, and attacks increase the difficulty of data stream mining. Motivated by this challenge, a Two-Stage Cost-Sensitive (TSCS) classification method is proposed to address class imbalance in non-stationary data streams. The proposed two-stage framework uses cost information in both the feature selection stage and the classification stage. Moreover, a window adaptation and drift detection mechanism, which guarantees that the ensemble can adapt promptly to concept drift, is embedded in the method. The algorithm is compared with competitive algorithms on different kinds of datasets. The results demonstrate that TSCS achieves significant improvement on metrics for class-imbalanced data streams.
Pages: 191942-191955
Page count: 14