PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets

Times cited: 41
Authors
Chen, Qiong [1 ]
Zhang, Zhong-Liang [1 ,2 ,3 ]
Huang, Wen-Po [1 ]
Wu, Jian [1 ,3 ]
Luo, Xing-Gang [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Management, Hangzhou 310018, Peoples R China
[2] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200030, Peoples R China
[3] Hangzhou Dianzi Univ, Res Ctr Youth Publ Opin Zhejiang, Hangzhou 310018, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Imbalanced datasets; Data preprocessing; SMOTE; Gaussian process; Oversampling; OVERSAMPLING TECHNIQUE; SAMPLING APPROACH; DATA-SETS; CLASSIFICATION; NOISY; TREES;
DOI
10.1016/j.neucom.2022.05.017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance learning is one of the most important topics in machine learning and data mining, and the Synthetic Minority Oversampling Technique (SMOTE) is a common method for handling it. The main shortcoming of classic SMOTE and its variants is that they may generate synthetic examples by interpolating potentially noisy and unrepresentative examples. This paper proposes a novel parameter-free SMOTE mechanism that produces sufficient representative synthetic examples while avoiding the interpolation of noisy examples. Specifically, two types of minority class examples are defined: boundary minority examples and safe minority examples. The synthetic example generation procedure fully reflects the characteristics of the minority class examples by filling the region dominated by the minority class and expanding the margin of the minority class. To verify the effectiveness and robustness of the proposed method, a thorough experimental study is carried out on forty datasets selected from real-world applications. The experimental results indicate that the proposed method is competitive with classic SMOTE and its state-of-the-art variants. (c) 2022 Elsevier B.V. All rights reserved.
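The abstract does not spell out PF-SMOTE's procedure, so the following is only a minimal sketch, under stated assumptions, of the two ingredients it builds on: classic SMOTE interpolation and a split of minority examples into safe, boundary, and noisy ones. The categorization rule is borrowed from Borderline-SMOTE as a hypothetical stand-in; PF-SMOTE's own definitions of boundary and safe examples, and its Gaussian-process component (listed among the keywords), are not reproduced. The function names `categorize_minority` and `smote` are hypothetical, and the neighbourhood size `k` is an illustrative parameter that a parameter-free method is designed not to expose.

```python
import numpy as np


def categorize_minority(X_min, X_maj, k=5):
    """Label each minority example 'safe', 'boundary' or 'noise'.

    ASSUMED rule (borrowed from Borderline-SMOTE, NOT PF-SMOTE's definition):
    count how many of an example's k nearest neighbours (over all data) are
    majority examples; all k -> noise, at least k/2 -> boundary, else safe.
    """
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)]
    labels = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        d[i] = np.inf  # minority example i sits at row i of X_all; skip itself
        n_maj = int(is_maj[np.argsort(d)[:k]].sum())
        if n_maj == k:
            labels.append("noise")
        elif n_maj >= k / 2:
            labels.append("boundary")
        else:
            labels.append("safe")
    return labels


def smote(X_min, n_new, k=5, rng=None):
    """Classic SMOTE: x_new = x + gap * (x_nn - x), with gap ~ U[0, 1) and
    x_nn drawn from the k nearest minority neighbours of a random seed x."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    D = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)        # an example is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]  # k nearest minority neighbours per row
    base = rng.integers(0, n, size=n_new)           # random seed examples
    nbr = nn[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])
```

One way to combine them, in the spirit of the abstract's noise avoidance, is to drop the examples labelled `noise` before calling `smote`; PF-SMOTE's actual choice of seeds and interpolation targets may differ.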
Pages: 75-88
Page count: 14