Active Learning with Abstaining Classifiers for Imbalanced Drifting Data Streams

Cited by: 0
Authors
Korycki, Lukasz [1 ]
Cano, Alberto [1 ]
Krawczyk, Bartosz [1 ]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2019
Keywords
machine learning; data stream mining; imbalanced data; active learning; ensemble learning; RESAMPLING ENSEMBLE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning from data streams is one of the most promising and challenging domains in modern machine learning. Proliferating online data sources give us access to real-time knowledge we have never had before. At the same time, new obstacles emerge, and we have to overcome them in order to fully and effectively utilize the potential of the data. Prohibitive time and memory constraints and non-stationary distributions are only some of the problems. When dealing with classification tasks, one has to remember that effective adaptation has to be achieved on the weak foundations of partially labeled and often imbalanced data. In our work, we propose an online framework for binary classification that aims to handle the complex problem of working with dynamic, sparsely labeled, and imbalanced streams. Its main component is a novel active learning strategy (MD-OAL) that prioritizes the labeling of minority instances and, as a result, improves the balance of the learning process. We combine the strategy with a dynamic ensemble of base learners that can abstain from making decisions if they are very uncertain. We adjust the abstaining mechanism in favor of minority instances, providing an effective method for handling the remaining imbalance and concept drift simultaneously. The conducted evaluation shows that in challenging and realistic scenarios our framework outperforms state-of-the-art algorithms, providing higher resilience to the combined effect of limited labeling and imbalance.
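The two ideas named in the abstract, a labeling strategy biased toward minority instances and base learners that abstain when uncertain, can be illustrated with a minimal sketch. This is not the authors' MD-OAL strategy or their ensemble: the function names, thresholds, and the margin-based query rule below are all assumptions chosen only to show the general shape of such a mechanism for binary classification, with label 1 taken as the minority class.

```python
from typing import Optional, Sequence


def should_query_label(
    minority_prob: float,
    base_margin: float = 0.2,      # hypothetical uncertainty band
    minority_boost: float = 2.0,   # hypothetical widening for minority predictions
) -> bool:
    """Decide whether to request the true label of an incoming instance.

    The margin is 0 when the model is fully uncertain and 1 when fully
    certain. Instances predicted as minority (probability >= 0.5) get a
    wider band, so they are queried more often.
    """
    margin = abs(minority_prob - 0.5) * 2.0
    allowed = base_margin * minority_boost if minority_prob >= 0.5 else base_margin
    return margin <= allowed


def abstaining_vote(
    minority_probs: Sequence[float],
    majority_threshold: float = 0.8,  # hypothetical: stricter for the majority class
    minority_threshold: float = 0.6,  # hypothetical: looser for the minority class
) -> Optional[int]:
    """Combine base-learner outputs, letting uncertain learners abstain.

    minority_probs[i] is learner i's estimated probability of the minority
    class. A learner votes only if its confidence in its own predicted
    class reaches that class's threshold. Returns None if all abstain.
    """
    votes = {0: 0.0, 1: 0.0}
    for p in minority_probs:
        label = 1 if p >= 0.5 else 0
        confidence = p if label == 1 else 1.0 - p
        threshold = minority_threshold if label == 1 else majority_threshold
        if confidence >= threshold:
            votes[label] += confidence
    if votes[0] == 0.0 and votes[1] == 0.0:
        return None
    return 1 if votes[1] >= votes[0] else 0
```

In this sketch the lower minority threshold means a learner that weakly favors the minority class still votes, while a learner that weakly favors the majority class abstains. That is one simple way to tilt an abstaining mechanism toward minority instances; the paper's actual mechanism may differ.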
Pages: 2334-2343
Number of pages: 10