Incremental Learning of Concept Drift from Streaming Imbalanced Data

Cited by: 293
Authors
Ditzler, Gregory [1]
Polikar, Robi [2]
Affiliations
[1] Drexel Univ, Dept Elect & Comp Engn, Philadelphia, PA 19104 USA
[2] Rowan Univ, Dept Elect & Comp Engn, Glassboro, NJ 08028 USA
Funding
National Science Foundation (USA)
Keywords
Incremental learning; concept drift; class imbalance; multiple classifier systems; TIME ADAPTIVE CLASSIFIERS; ENSEMBLE; MODELS
DOI
10.1109/TKDE.2012.136
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning in nonstationary environments, also known as learning concept drift, is concerned with learning from data whose statistical characteristics change over time. Concept drift is further complicated if the data set is class imbalanced. While these two issues have been independently addressed, their joint treatment has been mostly underexplored. We describe two ensemble-based approaches for learning concept drift from imbalanced data. Our first approach is a logical combination of our previously introduced Learn++.NSE algorithm for concept drift with the well-established SMOTE for learning from imbalanced data. Our second approach makes two major modifications to the Learn++.NSE-SMOTE integration: it replaces SMOTE with a subensemble that makes strategic use of minority class data, and it replaces Learn++.NSE and its class-independent error weighting mechanism with a penalty constraint that forces the algorithm to balance accuracy on all classes. The primary novelty of this approach is in determining the voting weights for combining ensemble members, based on each classifier's time and imbalance-adjusted accuracy on current and past environments. Favorable results in comparison to other approaches indicate that both approaches are able to address this challenging problem, each with its own specific areas of strength. We also release all experimental data as a resource and benchmark for future research.
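The abstract refers to SMOTE as the oversampling step in the first approach. As a rough illustration only, the sketch below shows the core idea usually associated with SMOTE: a synthetic minority example is placed at a random position on the line segment between a minority example and one of its nearest minority neighbors. This is not the authors' implementation and does not include Learn++.NSE; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper, not the paper's code).

    minority    -- list of equal-length numeric tuples (minority-class samples)
    n_synthetic -- number of synthetic samples to generate
    k           -- number of nearest minority neighbors to choose from
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Interpolate at a random position between the base point and the neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_synthetic=5, k=2)
```

Because every synthetic point is an interpolation between two existing minority points, all generated points in this toy case stay inside the unit square. In a streaming setting such a step would be applied to each incoming batch before training a new ensemble member, although the exact integration is described in the paper itself rather than here.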
Pages: 2283-2301
Page count: 19