Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy

Cited by: 282
Authors
Garcia, Salvador [1]
Herrera, Francisco [1]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
Keywords
Classification; class imbalance problem; undersampling; prototype selection; evolutionary algorithms; FEATURE-SELECTION; ALGORITHMS; REDUCTION; SYSTEMS; MODELS; RULES; SETS;
DOI
10.1162/evco.2009.17.3.275
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed in order to find a treatment for this problem, such as modifying methods or the application of a preprocessing stage. Within the preprocessing focused on balancing data, two tendencies exist: reduce the set of examples (undersampling) or replicate minority class examples (oversampling). Undersampling with imbalanced datasets could be considered as a prototype selection procedure with the purpose of balancing datasets to achieve a high classification rate, avoiding the bias toward majority class examples. Evolutionary algorithms have been used for classical prototype selection showing good results, where the fitness function is associated with the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions for getting a good trade-off between balance of distribution of classes and performance. The study includes a taxonomy of the approaches and an overall comparison among our models and state-of-the-art undersampling methods. The results have been contrasted by using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models when the degree of imbalance is increased.
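The idea in the abstract, undersampling treated as prototype selection driven by a fitness function that trades class balance against classification performance, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' exact method: a plain elitist genetic algorithm with uniform crossover and bit-flip mutation, a 1-NN geometric-mean score, a hypothetical `penalty` weight of 0.2, the label convention 0 = majority and 1 = minority, and the function names themselves. All minority examples are kept, and only the majority examples are encoded in the binary chromosome.

```python
import math
import random


def g_mean_1nn(X, y, selected):
    """Geometric mean of per-class 1-NN accuracy over all training points,
    using only `selected` indices as prototypes (a point never matches itself)."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for i, (xi, yi) in enumerate(zip(X, y)):
        best_d, best_label = math.inf, None
        for j in selected:
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, X[j]))
            if d < best_d:
                best_d, best_label = d, y[j]
        totals[yi] += 1
        hits[yi] += int(best_label == yi)
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))


def fitness(mask, X, y, majority_idx, minority_idx, penalty=0.2):
    """Performance minus a penalty for residual imbalance (assumed form)."""
    kept_major = [i for i, bit in zip(majority_idx, mask) if bit]
    if not kept_major:
        return -math.inf  # a subset with no majority examples is invalid
    balance_gap = abs(1 - len(minority_idx) / len(kept_major))
    return g_mean_1nn(X, y, kept_major + minority_idx) - penalty * balance_gap


def evolutionary_undersample(X, y, generations=30, pop_size=20, seed=0):
    """Return sorted indices of the retained examples (labels: 0 majority, 1 minority)."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == 0]
    minority_idx = [i for i, label in enumerate(y) if label == 1]
    n = len(majority_idx)

    def score(mask):
        return fitness(mask, X, y, majority_idx, minority_idx)

    # Each chromosome marks which majority examples are kept.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            k = rng.randrange(n)
            child[k] = not child[k]  # single bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return sorted([i for i, bit in zip(majority_idx, best) if bit] + minority_idx)


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
    X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(8)]
    y = [0] * 40 + [1] * 8
    kept = evolutionary_undersample(X, y)
    print(len(kept), "of", len(X), "examples retained")
```

Raising `penalty` in this sketch pushes the search toward a balanced subset, while lowering it favors the classification score alone; this is the trade-off the abstract refers to.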
Pages: 275 - 306
Page count: 32
Related papers
50 in total
  • [11] Anomaly detection-based undersampling for imbalanced classification problems
    Park, You-Jin
    Brito, Paula
    Ma, Yun-Chen
    ENGINEERING OPTIMIZATION, 2024, 56 (12) : 2565 - 2578
  • [12] Improvement of Bagging performance for classification of imbalanced datasets using evolutionary multi-objective optimization
    Roshan, Seyed Ehsan
    Asadi, Shahrokh
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2020, 87
  • [13] Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
    Luengo, Julian
    Fernandez, Alberto
    Garcia, Salvador
    Herrera, Francisco
    SOFT COMPUTING, 2011, 15 (10) : 1909 - 1936
  • [14] An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    IEEE ACCESS, 2023, 11 : 136782 - 136792
  • [16] Relevant information undersampling to support imbalanced data classification
    Hoyos-Osorio, J.
    Alvarez-Meza, A.
    Daza-Santacoloma, G.
    Orozco-Gutierrez, A.
    Castellanos-Dominguez, G.
    NEUROCOMPUTING, 2021, 436 : 136 - 146
  • [17] A First Attempt on Global Evolutionary Undersampling for Imbalanced Big Data
    Triguero, I.
    Galar, M.
    Bustince, H.
    Herrera, F.
    2017 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2017 : 2054 - 2061
  • [18] RETRACTED: The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets (Retracted Article)
    Al-Shamaa, Zina Z. R.
    Kurnaz, Sefer
    Duru, Adil Deniz
    Peppa, Nadia
    Mirnezami, Alex H.
    Hamady, Zaed Z. R.
    APPLIED BIONICS AND BIOMECHANICS, 2020, 2020
  • [19] Undersampling Instance Selection for Hybrid and Incomplete Imbalanced Data
    Camacho-Nieto, Oscar
    Yanez-Marquez, Cornelio
    Villuendas-Rey, Yenny
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2020, 26 (06) : 698 - 719
  • [20] A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification
    Guan, Hongjiao
    Zhang, Yingtao
    Ma, Bin
    Li, Jian
    Wang, Chunpeng
    2021 IEEE 8TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2021