Evolutionary under-sampling based bagging ensemble method for imbalanced data classification

Cited by: 52
Authors
Sun, Bo [1, 2]
Chen, Haiyan [1, 2]
Wang, Jiandong [1]
Xie, Hua [2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Natl Key Lab ATFM, Nanjing 211106, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
class imbalanced problem; under-sampling; bagging; evolutionary under-sampling; ensemble learning; machine learning; data mining; support vector machines; data sets; SMOTE; classifiers; strategies
DOI
10.1007/s11704-016-5306-z
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
In class-imbalanced learning, traditional machine learning algorithms that optimize overall accuracy tend to perform poorly, especially on the minority class, which is usually the class of greatest interest. Many effective approaches have been proposed to address this problem. Among them, bagging ensemble methods that integrate under-sampling techniques have demonstrated better performance than several alternatives, including bagging ensembles integrated with over-sampling techniques and cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measures to ensure the classification performance of the individual classifiers, which limits the achievable ensemble performance. Meanwhile, evolutionary under-sampling (EUS), a novel under-sampling technique, has been successfully applied to search for the best majority-class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method, EUS-Bag, by designing a new fitness function that considers three factors to make EUS better suited to the framework. With this fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean, and AUC all demonstrate its superior performance.
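As a rough illustration of the under-sampling bagging framework that the abstract builds on, the sketch below trains each base classifier on all minority points plus an equally sized random sample of the majority class, then combines them by majority vote. It is not EUS-Bag: the evolutionary search and the three-factor fitness function are not specified in the abstract, so plain random under-sampling is used instead. The function names, the 1-NN base classifier, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter

def one_nn_predict(train, x):
    """Label of the training point nearest to x (squared Euclidean distance)."""
    nearest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return nearest[1]

def build_balanced_subsets(majority, minority, n_estimators, rng):
    """One training subset per base classifier: every minority point plus an
    equally sized random sample (without replacement) of the majority class.
    Assumption: random sampling stands in for the evolutionary search."""
    return [rng.sample(majority, len(minority)) + minority
            for _ in range(n_estimators)]

def ensemble_predict(subsets, x):
    """Majority vote over the 1-NN base classifiers (odd count avoids ties)."""
    votes = Counter(one_nn_predict(s, x) for s in subsets)
    return votes.most_common(1)[0][0]

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall (positive class) and specificity (negative class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0  # undefined without both classes present; report 0 here
    return ((tp / (tp + fn)) * (tn / (tn + fp))) ** 0.5

# Hypothetical toy data: 20 majority points near the origin, 3 minority points near (5, 5).
majority = [((0.1 * i, 0.1 * j), 0) for i in range(5) for j in range(4)]
minority = [((5.0, 5.0), 1), ((5.2, 4.9), 1), ((4.8, 5.1), 1)]

rng = random.Random(0)
subsets = build_balanced_subsets(majority, minority, n_estimators=5, rng=rng)

test_points = [(5.1, 5.0), (0.1, 0.1)]
y_true = [1, 0]
y_pred = [ensemble_predict(subsets, x) for x in test_points]
print(y_pred, g_mean(y_true, y_pred))
```

Because every subset is drawn independently at random, the base classifiers differ from one another, but nothing in this sketch checks how well each one performs individually; that is the gap the abstract says EUS-Bag targets.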
Pages: 331-350
Number of pages: 20
Related papers
50 in total
  • [41] Classification of pulsar signals using ensemble gradient boosting algorithms based on asymmetric under-sampling method
    Tariq, I.
    Qiao, M.
    Wei, L.
    Yao, S.
    Zhou, C.
    Ali, Z.
    Azeem, S. W.
    Spanakis-Misirlis, A.
    JOURNAL OF INSTRUMENTATION, 2022, 17 (03):
  • [42] Similarity Majority Under-Sampling Technique for Easing Imbalanced Classification Problem
    Li, Jinyan
    Fong, Simon
    Hu, Shimin
    Wong, Raymond K.
    Mohammed, Sabah
    DATA MINING, AUSDM 2017, 2018, 845 : 3 - 23
  • [43] A Meta-Learning Method to Select Under-Sampling Algorithms for Imbalanced Data Sets
    de Morais, Romero F. A. B.
    Miranda, Pericles B. C.
    Silva, Ricardo M. A.
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 385 - 390
  • [44] Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification
    Du, Jie
    Vong, Chi-Man
    Chang, Yajie
    Jiao, Yang
    PROCEEDINGS OF ELM-2016, 2018, 9 : 229 - 239
  • [45] Neighbourhood sampling in bagging for imbalanced data
    Blaszczynski, Jerzy
    Stefanowski, Jerzy
    NEUROCOMPUTING, 2015, 150 : 529 - 542
  • [46] Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets
    Aridas, Christos K.
    Karlos, Stamatis
    Kanas, Vasileios G.
    Fazakis, Nikos
    Kotsiantis, Sotiris B.
    IEEE ACCESS, 2020, 8 : 2122 - 2133
  • [47] Feature Selection and Ensemble Hierarchical Cluster-based Under-sampling Approach for Extremely Imbalanced Datasets
    Soltani, Sima
    Sadri, Javad
    Torshizi, Hassan Ahmadi
    2011 1ST INTERNATIONAL ECONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2011, : 166 - 171
  • [48] Cluster-based Under-sampling with Random Forest for Multi-Class Imbalanced Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Farid, Dewan Md.
    2017 11TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2017,
  • [49] A Hybrid MultiLayer Perceptron Under-Sampling with Bagging Dealing with a Real-Life Imbalanced Rice Dataset
    Diallo, Moussa
    Xiong, Shengwu
    Emiru, Eshete Derb
    Fesseha, Awet
    Abdulsalami, Aminu Onimisi
    Abd Elaziz, Mohamed
    INFORMATION, 2021, 12 (08)
  • [50] Multi-granularity relabeled under-sampling algorithm for imbalanced data
    Dai, Qi
    Liu, Jian-wei
    Liu, Yang
    APPLIED SOFT COMPUTING, 2022, 124