SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

Cited by: 1198
Authors
Fernandez, Alberto [1]
Garcia, Salvador [1]
Herrera, Francisco [1]
Chawla, Nitesh V. [2,3]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[3] Univ Notre Dame, Interdisciplinary Ctr Network Sci & Applicat, Notre Dame, IN 46556 USA
Funding
National Science Foundation (USA);
Keywords
OVER-SAMPLING APPROACH; FEATURE-SELECTION; BIG DATA; DATA-SETS; SVM CLASSIFICATION; DATA GENERATION; MINORITY CLASS; ALGORITHM; FRAMEWORK; PERFORMANCE;
DOI
10.1613/jair.1.11192
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its procedure, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. It has also inspired several approaches to counter the issue of class imbalance and has contributed significantly to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data and is featured in a number of software packages, from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges to extend SMOTE for Big Data problems.
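The abstract names SMOTE but does not restate the procedure. As a reminder of the core mechanism from the original 2002 paper (interpolating between a minority-class sample and one of its k nearest minority-class neighbours), the following is a minimal, illustrative sketch; the function name smote_sketch and its parameters are ours, not from this paper or any particular library, and real implementations (e.g. imbalanced-learn) add many refinements.

import numpy as np

def smote_sketch(X_minority, n_synthetic, k=5, rng=None):
    """Illustrative SMOTE-style interpolation over continuous features.

    X_minority  : (n, d) array of minority-class samples.
    n_synthetic : number of synthetic samples to generate.
    k           : number of minority-class nearest neighbours to draw from.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude each sample itself
    neighbours = np.argsort(d2, axis=1)[:, :k]   # k nearest minority neighbours

    synthetic = np.empty((n_synthetic, X.shape[1]))
    for j in range(n_synthetic):
        i = rng.integers(n)                      # pick a minority sample
        nn = neighbours[i, rng.integers(k)]      # pick one of its k neighbours
        lam = rng.random()                       # interpolation factor in [0, 1]
        synthetic[j] = X[i] + lam * (X[nn] - X[i])
    return synthetic

# Tiny usage example with a hypothetical 2-D minority class.
if __name__ == "__main__":
    X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
    print(smote_sketch(X_min, n_synthetic=6, k=3, rng=0))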
Pages: 863-905
Number of pages: 43