No Free Lunch in imbalanced learning

Cited by: 18
Authors
Moniz, Nuno [1,2]
Monteiro, Hugo [3,4]
Affiliations
[1] INESC TEC, Porto, Portugal
[2] Univ Porto, Fac Sci, Dept Comp Sci, Porto, Portugal
[3] Univ Porto, Inst Philosophy, Porto, Portugal
[4] Ctr Res & Innovat Educ, Porto, Portugal
Keywords
Supervised learning; Imbalanced domain learning; No Free Lunch; A-priori distinctions; Classification; SMOTE
DOI
10.1016/j.knosys.2021.107222
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The No Free Lunch (NFL) theorems have sparked intense debate since their publication, from both theoretical and practical perspectives. To date, however, no discussion has addressed their impact on the established field of imbalanced domain learning (IDL), known for its challenges in both learning and evaluation processes. Most importantly, understanding the effect of commonly used solutions in this field would prove very useful for future research. In this paper, we study the impact of data preprocessing methods, also known as resampling strategies, under the framework of the NFL theorems. Focusing on binary classification tasks, we claim that in IDL settings, given a learning algorithm and a uniformly distributed set of target functions, the core conclusions of the NFL theorems extend to resampling strategies. As such, absent any a priori knowledge or assumptions concerning data domains, any two resampling strategies are equivalent in their expected impact on the performance of predictive models. We provide a theoretical analysis and discussion of the intersection between IDL and the NFL theorems to support this claim. We also collect empirical evidence via a thorough experimental study spanning 98 data sets from multiple real-world knowledge domains. (C) 2021 Elsevier B.V. All rights reserved.
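For context, the core NFL result for supervised learning (Wolpert, 1996) is usually stated as follows; the notation below is the standard one from that literature and is assumed here, not quoted from this paper. For any two learning algorithms a_1 and a_2, uniformly averaged over all target functions f,

    \sum_{f} P(c \mid f, m, a_1) = \sum_{f} P(c \mid f, m, a_2)

where m is the training-set size and c the off-training-set error. Roughly, the abstract's claim is that this equivalence persists when each learner is composed with a resampling preprocessor.

To make "resampling strategy" concrete, below is a minimal, illustrative Python sketch of one such strategy (random oversampling of the minority class). The function name and interface are hypothetical, not taken from the paper's experimental code.

    import numpy as np

    def random_oversample(X, y, seed=None):
        # Illustrative only, not the paper's method: replicate
        # minority-class rows (with replacement) until both classes
        # have the same number of examples.
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        n_extra = counts.max() - counts.min()
        minority_idx = np.flatnonzero(y == minority)
        extra = rng.choice(minority_idx, size=n_extra, replace=True)
        return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])

    # Example: a 90/10 imbalanced binary data set becomes 90/90.
    X = np.random.default_rng(0).normal(size=(100, 3))
    y = np.array([0] * 90 + [1] * 10)
    X_res, y_res = random_oversample(X, y, seed=0)
    print(np.unique(y_res, return_counts=True))  # -> (array([0, 1]), array([90, 90]))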
Pages: 9