A novel SMOTE-based resampling technique trough noise detection and the boosting procedure

Cited by: 48
Authors
Saglam, Fatih [1 ]
Cengiz, Mehmet Ali [1 ]
Affiliations
[1] Ondokuz Mayıs Univ, Fac Art & Sci, Dept Stat, Samsun, Turkey
Keywords
Oversampling; SMOTE; Class imbalance; Noisy data; SAMPLING METHOD; CLASSIFIERS; PREDICTION; MAJORITY;
DOI
10.1016/j.eswa.2022.117023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most classification methods assume that the classes contain balanced numbers of observations. When this assumption does not hold, models are fitted with a bias toward the class with more observations. As a result, classifiers tend to ignore the class with fewer observations and make predictions biased toward the majority class. Several performance measures are advised for imbalanced datasets, and several approaches are recommended for solving the class imbalance problem. One of the most widely used approaches is resampling. This study discusses the difficulties associated with two oversampling methods: random oversampling (ROS) and the synthetic minority oversampling technique (SMOTE). It proposes a combination of a new noise detection method and SMOTE to overcome those difficulties. Noise detection is made possible by the boosting procedure used in ensemble algorithms, and the proposed SMOTE with boosting (SMOTEWB) method uses this information to determine the appropriate number of neighbors for each observation within the SMOTE algorithm.
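The interpolation step that SMOTE relies on can be sketched as follows. This is a minimal illustration of plain SMOTE-style oversampling, not the authors' SMOTEWB implementation: the function name smote_sketch and its parameters are hypothetical, and the per-observation neighbor counts are passed in by hand here, whereas the abstract says SMOTEWB derives them from boosting-based noise detection, which this sketch does not attempt.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    X_min : array of shape (n, d) with n >= 2 minority observations.
    k     : int, or array of length n with one neighbor count per observation
            (an assumption made here to mirror the per-observation idea).
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k_per_obs = np.broadcast_to(np.asarray(k, dtype=int), (n,))
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances among minority observations.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)    # an observation is not its own neighbor
    order = np.argsort(dist, axis=1)  # neighbor indices, nearest first

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                       # pick a minority point
        k_i = min(max(k_per_obs[base], 1), n - 1)    # clamp to a valid range
        neighbor = order[base, rng.integers(k_i)]    # one of its k_i nearest
        gap = rng.random()                           # position on the segment
        synthetic[i] = X_min[base] + gap * (X_min[neighbor] - X_min[base])
    return synthetic


X_min = [[0, 0], [1, 0], [0, 1], [1, 1]]
S = smote_sketch(X_min, n_new=10, k=[1, 2, 3, 3])
print(S.shape)
```

Each synthetic point lies on the line segment between a minority observation and one of its nearest minority neighbors, so a smaller neighbor count keeps new points closer to the original observation.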
Pages: 12
Related papers
59 in total
[1]  
Alcalá-Fdez J, 2011, J MULT-VALUED LOG S, V17, P255
[2]  
[Anonymous], 2013, Journal Information Engineering Applications
[3]  
Blake C., 1998, UCI repository of machine learning databases
[4]   DBMUTE: density-based majority under-sampling technique [J].
Bunkhumpornpat, Chumphol ;
Sinapiromsaran, Krung .
KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 50 (03) :827-850
[5]   DBSMOTE: Density-Based Synthetic Minority Over-sampling TEchnique [J].
Bunkhumpornpat, Chumphol ;
Sinapiromsaran, Krung ;
Lursinsap, Chidchanok .
APPLIED INTELLIGENCE, 2012, 36 (03) :664-684
[6]  
Bunkhumpornpat C, 2009, LECT NOTES ARTIF INT, V5476, P475, DOI 10.1007/978-3-642-01307-2_43
[7]   Dealing with difficult minority labels in imbalanced mutilabel data sets [J].
Charte, Francisco ;
Rivera, Antonio J. ;
del Jesus, Maria J. ;
Herrera, Francisco .
NEUROCOMPUTING, 2019, 326 :39-53
[8]   Automatically countering imbalance and its empirical relationship to cost [J].
Chawla, Nitesh V. ;
Cieslak, David A. ;
Hall, Lawrence O. ;
Joshi, Ajay .
DATA MINING AND KNOWLEDGE DISCOVERY, 2008, 17 (02) :225-252
[9]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
2002, American Association for Artificial Intelligence (16)
[10]   SMOTEBoost: Improving prediction of the minority class in boosting [J].
Chawla, NV ;
Lazarevic, A ;
Hall, LO ;
Bowyer, KW .
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS, 2003, 2838 :107-119