PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets

Cited by: 41
Authors
Chen, Qiong [1 ]
Zhang, Zhong-Liang [1 ,2 ,3 ]
Huang, Wen-Po [1 ]
Wu, Jian [1 ,3 ]
Luo, Xing-Gang [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Management, Hangzhou 310018, Peoples R China
[2] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200030, Peoples R China
[3] Hangzhou Dianzi Univ, Res Ctr Youth Publ Opin Zhejiang, Hangzhou 310018, Peoples R China
Funding
US National Science Foundation;
Keywords
Imbalanced datasets; Data preprocessing; SMOTE; Gaussian process; Oversampling; OVERSAMPLING TECHNIQUE; SAMPLING APPROACH; DATA-SETS; CLASSIFICATION; NOISY; TREES;
DOI
10.1016/j.neucom.2022.05.017
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Class imbalance learning is one of the most important topics in the field of machine learning and data mining, and the Synthetic Minority Oversampling Technique (SMOTE) is a common method for handling this issue. The main shortcoming of the classic SMOTE and its variants is the interpolation of potentially noisy and unrepresentative examples. This paper proposes a novel parameter-free SMOTE mechanism that produces sufficiently representative synthetic examples while avoiding the interpolation of noisy examples. Specifically, two types of minority class examples are defined, namely boundary and safe minority examples. The synthetic example generation procedure fully reflects the characteristics of the minority class examples by filling the region dominated by the minority class and expanding the margin of the minority class. To verify the effectiveness and robustness of the proposed method, a thorough experimental study on forty datasets selected from real-world applications is carried out. The experimental results indicate that the proposed method is competitive with the classic SMOTE and its state-of-the-art variants. (c) 2022 Elsevier B.V. All rights reserved.
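For readers unfamiliar with the mechanics the abstract refers to, the sketch below illustrates the two ingredients in the spirit of classic SMOTE and Borderline-SMOTE, which this paper builds on: splitting minority examples into safe, boundary, and noise groups by the class composition of their nearest neighbours, and generating synthetic examples by linear interpolation between minority examples. This is a minimal illustration only; the neighbour count k, the thresholds, and the helper names (categorize_minority, smote_interpolate) are assumptions for exposition, not the parameter-free mechanism proposed in the paper.

# Minimal SMOTE-style oversampling sketch (NOT the authors' PF-SMOTE).
# Grouping rule and interpolation follow the classic SMOTE / Borderline-SMOTE
# ideas; all thresholds below are illustrative assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def categorize_minority(X, y, minority_label, k=5):
    """Label each minority example as 'safe', 'boundary', or 'noise'
    based on how many of its k nearest neighbours belong to the majority class."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    categories = {}
    for i in np.where(y == minority_label)[0]:
        # drop the first neighbour, which is the point itself
        neighbours = nn.kneighbors(X[i].reshape(1, -1), return_distance=False)[0][1:]
        n_majority = np.sum(y[neighbours] != minority_label)
        if n_majority == k:
            categories[i] = "noise"      # surrounded by the majority class
        elif n_majority >= k // 2:
            categories[i] = "boundary"   # close to the decision boundary
        else:
            categories[i] = "safe"       # inside the minority region
    return categories

def smote_interpolate(X, y, minority_label, categories, n_synthetic, k=5, seed=0):
    """Generate synthetic minority examples by interpolating between a
    non-noise minority example and one of its minority-class neighbours."""
    rng = np.random.default_rng(seed)
    seeds = np.array([i for i, c in categories.items() if c != "noise"])
    X_min = X[seeds]
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(seeds))).fit(X_min)
    synthetic = []
    for _ in range(n_synthetic):
        j = rng.integers(len(seeds))
        neighbours = nn.kneighbors(X_min[j].reshape(1, -1), return_distance=False)[0][1:]
        partner = X_min[rng.choice(neighbours)]
        gap = rng.random()               # interpolation coefficient in [0, 1)
        synthetic.append(X_min[j] + gap * (partner - X_min[j]))
    return np.vstack(synthetic)

Note that this sketch still depends on the neighbourhood size k, the noise/boundary thresholds, and the number of synthetic examples to generate, whereas the method described in the abstract is parameter-free.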
Pages: 75-88
Number of pages: 14