No Free Lunch in imbalanced learning

Cited by: 18
Authors
Moniz, Nuno [1,2]
Monteiro, Hugo [3,4]
Affiliations
[1] INESC TEC, Porto, Portugal
[2] Univ Porto, Fac Sci, Dept Comp Sci, Porto, Portugal
[3] Univ Porto, Inst Philosophy, Porto, Portugal
[4] Ctr Res & Innovat Educ, Porto, Portugal
Keywords
Supervised learning; Imbalanced domain learning; No Free Lunch; A-priori distinctions; Classification; SMOTE
DOI
10.1016/j.knosys.2021.107222
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The No Free Lunch (NFL) theorems have sparked intense debate since their publication, from both theoretical and practical perspectives. To date, however, no discussion has addressed their impact on the established field of imbalanced domain learning (IDL), known for its challenges in both learning and evaluation processes. Most importantly, understanding the effect of commonly used solutions in this field would prove very useful for future research. In this paper, we study the impact of data preprocessing methods, also known as resampling strategies, under the framework of the NFL theorems. Focusing on binary classification tasks, we claim that in IDL settings, given a learning algorithm and a uniformly distributed set of target functions, the core conclusions of the NFL theorems extend to resampling strategies. As such, absent any a priori knowledge or assumptions concerning data domains, any two resampling strategies are equivalent in their expected impact on the performance of predictive models. We provide a theoretical analysis and discussion of the intersection between IDL and the NFL theorems to support this claim. We also collect empirical evidence via a thorough experimental study spanning 98 data sets from multiple real-world knowledge domains. (C) 2021 Elsevier B.V. All rights reserved.
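For context, the core NFL result for supervised learning (Wolpert, 1996) is usually stated as follows; the notation below is the standard one from that literature and is assumed here, not quoted from this paper. For any two learning algorithms a_1 and a_2, uniformly averaged over all target functions f,

    \sum_{f} P(c \mid f, m, a_1) = \sum_{f} P(c \mid f, m, a_2)

where m is the training-set size and c the off-training-set error. Roughly, the abstract's claim is that this equivalence persists when each learner is composed with a resampling preprocessor.

To make "resampling strategy" concrete, below is a minimal, illustrative Python sketch of one such strategy (random oversampling of the minority class). The function name and interface are hypothetical, not taken from the paper's experimental code.

    import numpy as np

    def random_oversample(X, y, seed=None):
        # Illustrative only, not the paper's method: replicate
        # minority-class rows (with replacement) until both classes
        # have the same number of examples.
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        n_extra = counts.max() - counts.min()
        minority_idx = np.flatnonzero(y == minority)
        extra = rng.choice(minority_idx, size=n_extra, replace=True)
        return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])

    # Example: a 90/10 imbalanced binary data set becomes 90/90.
    X = np.random.default_rng(0).normal(size=(100, 3))
    y = np.array([0] * 90 + [1] * 10)
    X_res, y_res = random_oversample(X, y, seed=0)
    print(np.unique(y_res, return_counts=True))  # -> (array([0, 1]), array([90, 90]))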
Pages: 9