Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy

Cited by: 280
Authors
Garcia, Salvador [1]
Herrera, Francisco [1]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
Keywords
Classification; class imbalance problem; undersampling; prototype selection; evolutionary algorithms; FEATURE-SELECTION; ALGORITHMS; REDUCTION; SYSTEMS; MODELS; RULES; SETS;
DOI
10.1162/evco.2009.17.3.275
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed to address this problem, such as modifying learning methods or applying a preprocessing stage. Within preprocessing focused on balancing data, two tendencies exist: reducing the set of examples (undersampling) or replicating minority class examples (oversampling). Undersampling with imbalanced datasets can be considered a prototype selection procedure whose purpose is to balance datasets and achieve a high classification rate while avoiding bias toward majority class examples. Evolutionary algorithms have been used for classical prototype selection with good results, where the fitness function is associated with the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions to obtain a good trade-off between balance of the class distribution and performance. The study includes a taxonomy of the approaches and an overall comparison between our models and state-of-the-art undersampling methods. The results have been contrasted using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models when the degree of imbalance is increased.
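The idea described in the abstract, treating undersampling as an evolutionary search over which majority class examples to keep, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`nearest_label`, `g_mean`, `fitness`, `evolve`), the 1-NN evaluator, the geometric-mean score with a linear imbalance penalty, the `penalty` weight of 0.2, and the plain generational genetic algorithm with elitism. The paper's actual models, fitness functions, and search algorithm may differ.

```python
import math
import random


def nearest_label(data, selected, i):
    """Label of the selected instance closest to data[i], excluding i itself."""
    xi = data[i][0]
    best, best_d = None, float("inf")
    for j in selected:
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(xi, data[j][0]))
        if d < best_d:
            best, best_d = data[j][1], d
    return best


def g_mean(data, selected):
    """Geometric mean of per-class 1-NN accuracy, measured on the full dataset."""
    hits, totals = {}, {}
    for i, (_, y) in enumerate(data):
        totals[y] = totals.get(y, 0) + 1
        if nearest_label(data, selected, i) == y:
            hits[y] = hits.get(y, 0) + 1
    return math.prod(hits.get(c, 0) / totals[c] for c in totals) ** (1 / len(totals))


def fitness(mask, data, maj_idx, min_idx, penalty=0.2):
    """Illustrative fitness (an assumption): performance minus an imbalance penalty."""
    kept = [j for j, bit in zip(maj_idx, mask) if bit]
    if not kept:
        return -penalty  # no majority examples kept: worst case
    imbalance = abs(1 - len(min_idx) / len(kept))
    return g_mean(data, min_idx + kept) - penalty * imbalance


def evolve(data, minority_label, pop_size=20, generations=20, p_mut=0.05, seed=0):
    """Search for a binary mask over majority examples; minority examples are always kept."""
    rng = random.Random(seed)
    maj_idx = [i for i, (_, y) in enumerate(data) if y != minority_label]
    min_idx = [i for i, (_, y) in enumerate(data) if y == minority_label]

    def score(mask):
        return fitness(mask, data, maj_idx, min_idx)

    # Start from "no undersampling" plus random masks.
    pop = [[1] * len(maj_idx)]
    pop += [[rng.randint(0, 1) for _ in maj_idx] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(((score(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        nxt = [m for _, m in ranked[:2]]  # elitism: carry over the two best masks
        while len(nxt) < pop_size:
            # Tournament selection of two parents, then uniform crossover.
            a = max(rng.sample(ranked, 3), key=lambda t: t[0])[1]
            b = max(rng.sample(ranked, 3), key=lambda t: t[0])[1]
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Bit-flip mutation.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)
```

Here `data` is a list of `(feature_tuple, label)` pairs, and `evolve(data, minority_label=1)` returns the best mask found together with its fitness. Because the all-ones mask is in the initial population and the best masks are carried over each generation, the returned fitness is never worse than that of the unsampled dataset under this particular fitness definition.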
Pages: 275-306
Page count: 32
Related papers
50 in total
  • [1] Evolutionary Undersampling for Imbalanced Big Data Classification
    Triguero, I.
    Galar, M.
    Vluymans, S.
    Cornelis, C.
    Bustince, H.
    Herrera, F.
    Saeys, Y.
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015, : 715 - 722
  • [2] A Hybrid Surrogate Model for Evolutionary Undersampling in Imbalanced Classification
    Le, Hoang Lam
    Landa-Silva, Dario
    Galar, Mikel
    Garcia, Salvador
    Triguero, I
    2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020,
  • [3] GUM: A Guided Undersampling Method to Preprocess Imbalanced Datasets for Classification
    Sung, Kisuk
    Brown, W. Eric
    Moreno-Centeno, Erick
    Ding, Yu
    2022 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATION SCIENCE AND ENGINEERING (CASE), 2022, : 1086 - 1091
  • [4] Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy
    Krawczyk, Bartosz
    Galar, Mikel
    Jelen, Lukasz
    Herrera, Francisco
    APPLIED SOFT COMPUTING, 2016, 38 : 714 - 726
  • [5] Exploiting Prototypical Explanations for Undersampling Imbalanced Datasets
    Arslan, Yusuf
    Allix, Kevin
    Lefebvre, Clement
    Boytsov, Andrey
    Bissyandé, Tegawendé F.
    Klein, Jacques
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1449 - 1454
  • [6] Combination of Oversampling and Undersampling Techniques on Imbalanced Datasets
    Bansal, Ankita
    Verma, Ayush
    Singh, Sarabjot
    Jain, Yashonam
    INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, ICICC 2022, VOL 3, 2023, 492 : 647 - 656
  • [7] Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark
    Triguero, I.
    Galar, M.
    Merino, D.
    Maillo, J.
    Bustince, H.
    Herrera, F.
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016, : 640 - 647
  • [8] The Proposal of Undersampling Method for Learning from Imbalanced Datasets
    Bach, Malgorzata
    Werner, Aleksandra
    Palt, Mateusz
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES 2019), 2019, 159 : 125 - 134
  • [9] EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification
    Hoang Lam Le
    Landa-Silva, Dario
    Galar, Mikel
    Garcia, Salvador
    Triguero, Isaac
    APPLIED SOFT COMPUTING, 2021, 101
  • [10] Quartiles based UnderSampling(QUS): A Simple and Novel Method to increase the Classification rate of positives in Imbalanced Datasets
    Veni, C. V. Krishna
    Rani, T. Sobha
    2017 NINTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2017, : 121 - 126