EMRIL: Ensemble Method based on ReInforcement Learning for binary classification in imbalanced drifting data streams

Cited by: 1
Authors
Usman, Muhammad [1]
Chen, Huanhuan [1]
Affiliation
[1] Univ Sci & Technol China, 96 JinZhai Rd, Hefei 230026, Anhui, Peoples R China
Keywords
Data stream classification; Imbalance; Concept drift; Ensemble learning; Reinforcement Learning
DOI
10.1016/j.neucom.2024.128259
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The co-occurrence of evolving concepts and imbalanced data degrades the learning performance of classifiers in data streams. Recent studies do not account for the data difficulty factors associated with class imbalance, i.e., imbalance complexity, which further complicates imbalance learning in a drifting data environment. This paper proposes EMRIL, a novel batch-based ensemble method, to address this challenge. Within EMRIL, the Imbalance Complexity Redressing Component (EMRIL_ICRC), a data-level balancing module, resolves imbalance complexity to increase minority class visibility for the base classifiers of the ensemble. Additionally, a novel ensemble pool management technique (EMRIL_EPM) is designed using Reinforcement Learning (RL). EMRIL_EPM regularly updates the ensemble pool and, through effective training and evaluation policies, constructs an optimal subset of base classifiers for prediction. Handling imbalance complexity and managing the ensemble pool with RL help EMRIL to perform binary classification effectively in imbalanced and evolving data streams. A comprehensive experimental evaluation is conducted on 104 data streams that contain a variety of concept drifts and imbalance ratios, categorized by various data difficulty factors. The results are compared against 15 state-of-the-art methods and show the superiority of the proposed method.
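The abstract describes a batch-based loop: each incoming batch is rebalanced at the data level before a new base classifier is trained, and a learned policy maintains the pool and picks which members vote. The following is only a minimal sketch of that general shape in standard-library Python, under loudly stated assumptions. Plain random oversampling stands in for EMRIL_ICRC (it does not model imbalance complexity). An epsilon-greedy value estimate per classifier, the simplest bandit form of RL, stands in for EMRIL_EPM. A nearest-centroid learner serves as the base classifier. All names (`random_oversample`, `EpsilonGreedyPool`, `process_batch`, etc.) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter
from math import prod

# HYPOTHETICAL sketch: none of these names or policies come from the EMRIL paper.

def imbalance_ratio(y):
    """Majority-to-minority count ratio for binary labels (1.0 means balanced)."""
    counts = Counter(y)
    if len(counts) < 2:
        return float("inf")
    return max(counts.values()) / min(counts.values())

def random_oversample(X, y, rng):
    """Duplicate random minority samples until both classes are equally frequent.
    Stand-in only: this does NOT model imbalance complexity as EMRIL_ICRC does."""
    counts = Counter(y)
    if len(counts) < 2:
        return list(X), list(y)
    minority = min(counts, key=counts.get)
    gap = max(counts.values()) - counts[minority]
    pool = [x for x, label in zip(X, y) if label == minority]
    return list(X) + [rng.choice(pool) for _ in range(gap)], list(y) + [minority] * gap

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls, so the majority class cannot dominate."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return prod(recalls) ** (1 / len(recalls))

class NearestCentroid:
    """Tiny base learner: predicts the class whose mean vector is closest."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)

class EpsilonGreedyPool:
    """Bandit-style pool manager (a simple stand-in for RL-based EMRIL_EPM):
    keeps a value estimate per member, votes with the top-k, prunes the worst."""
    def __init__(self, max_size=5, k=3, epsilon=0.1, alpha=0.5, seed=0):
        self.members, self.values = [], []
        self.max_size, self.k, self.eps, self.alpha = max_size, k, epsilon, alpha
        self.rng = random.Random(seed)

    def select(self):
        """Exploit the k highest-valued members; with probability epsilon, explore randomly."""
        order = sorted(range(len(self.members)), key=lambda i: -self.values[i])
        if self.rng.random() < self.eps:
            self.rng.shuffle(order)
        return order[: self.k]

    def process_batch(self, X, y):
        """Test-then-train on one labeled batch; returns predictions made before training."""
        preds = None
        if self.members:
            chosen = self.select()
            preds = [Counter(self.members[i].predict(x) for i in chosen).most_common(1)[0][0]
                     for x in X]
            for i, member in enumerate(self.members):
                # Incremental value update toward this batch's reward (G-mean).
                reward = g_mean(y, [member.predict(x) for x in X])
                self.values[i] += self.alpha * (reward - self.values[i])
        Xb, yb = random_oversample(X, y, self.rng)
        self.members.append(NearestCentroid().fit(Xb, yb))
        self.values.append(max(self.values, default=0.5))  # optimistic start for the newcomer
        if len(self.members) > self.max_size:
            worst = min(range(len(self.members)), key=self.values.__getitem__)
            del self.members[worst], self.values[worst]
        return preds

if __name__ == "__main__":
    rng = random.Random(1)

    def make_batch(n, shift):
        """Synthetic 2-D batch with ~10% minority (label 1); `shift` moves both classes."""
        X, y = [], []
        for _ in range(n):
            label = 1 if rng.random() < 0.1 else 0
            X.append([rng.gauss(shift + 2.0 * label, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
        return X, y

    pool = EpsilonGreedyPool()
    for t in range(10):
        X, y = make_batch(200, shift=0.0 if t < 5 else 3.0)  # abrupt drift at batch 5
        preds = pool.process_batch(X, y)
        if preds is not None:
            print(t, round(imbalance_ratio(y), 1), round(g_mean(y, preds), 3))
```

Run as a script, the demo simulates an abrupt drift at batch 5 with roughly 10% minority samples. For each test-then-train step it prints the batch index, the imbalance ratio and the ensemble's G-mean. G-mean is used as the reward here because plain accuracy is dominated by the majority class. The abstract does not state which reward or metric EMRIL_EPM actually uses.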
Pages: 22