ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams

Cited by: 74
Authors
Cano, Alberto [1]
Krawczyk, Bartosz [1]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, 401 Main St ERB2314, Richmond, VA 23284 USA
Keywords
Data streams; Concept drift; Online learning; Continual learning; Imbalanced data; Dynamic weighted majority; Experience replay; Classifiers; Selection
DOI
10.1007/s10994-022-06168-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data streams are potentially unbounded sequences of instances arriving over time to a classifier. Designing algorithms that are capable of dealing with massive, rapidly arriving information is one of the most dynamically developing areas of machine learning. Such learners must be able to deal with a phenomenon known as concept drift, where the data stream may be subject to various changes in its characteristics over time. Furthermore, distributions of classes may evolve over time, leading to a highly difficult non-stationary class imbalance. In this work we introduce Robust Online Self-Adjusting Ensemble (ROSE), a novel online ensemble classifier capable of dealing with all of the mentioned challenges. The main features of ROSE are: (1) online training of base classifiers on variable-size random subsets of features; (2) online detection of concept drift and creation of a background ensemble for faster adaptation to changes; (3) a sliding window per class to create skew-insensitive classifiers regardless of the current imbalance ratio; and (4) self-adjusting bagging to enhance the exposure of difficult instances from minority classes. The interplay among these features leads to improved performance on various data stream mining benchmarks. An extensive experimental study comparing ROSE with 30 ensemble classifiers shows that it is a robust and well-rounded classifier for drifting imbalanced data streams, especially in the presence of noise and class imbalance drift, while maintaining competitive time complexity and memory consumption. Results are supported by a thorough non-parametric statistical analysis.
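Features (3) and (4) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `PerClassWindows`, the functions `poisson` and `bagging_rate`, and in particular the rule that scales the Poisson rate by the ratio of the largest class count to the current class count are hypothetical choices made here for illustration. The abstract only states that the bagging is self-adjusting and that each class has its own sliding window.

```python
import math
import random
from collections import Counter, deque


class PerClassWindows:
    """Keep the most recent `size` instances of each class in separate windows."""

    def __init__(self, size):
        self.size = size
        self.windows = {}  # label -> deque of recent feature vectors

    def add(self, x, y):
        # A bounded deque discards the oldest instance of that class when full.
        self.windows.setdefault(y, deque(maxlen=self.size)).append(x)

    def replay(self):
        """Return stored (x, y) pairs; no class contributes more than `size`."""
        return [(x, y) for y, win in self.windows.items() for x in win]


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def bagging_rate(label, class_counts, base_rate=1.0):
    """Hypothetical rule: rarer classes get a proportionally larger Poisson rate."""
    largest = max(class_counts.values())
    return base_rate * largest / max(class_counts[label], 1)


rng = random.Random(0)
counts = Counter()
windows = PerClassWindows(size=2)
stream = [([0.1], 0)] * 9 + [([0.9], 1)]  # toy stream with a 9:1 imbalance

for x, y in stream:
    counts[y] += 1
    windows.add(x, y)
    # In online bagging, a base learner would be trained k times on (x, y).
    k = poisson(bagging_rate(y, counts), rng)
```

In this toy run the majority class keeps only its two most recent instances while the single minority instance is retained, so a classifier trained from `windows.replay()` sees a far less skewed sample than the raw stream. The minority instance is also drawn with a larger Poisson rate, which on average repeats it more often during training.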
Pages: 2561-2599
Page count: 39
Related papers
84 in total