KSIPF: an effective noise filtering oversampling method based on k-means and iterative-partitioning filter

Citations: 0
Authors
Sun, Pengfei [1]
Wang, Zhiping [1]
Jia, Liyan [1]
Wang, Xiaoxi [2]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Liaoning, Peoples R China
[2] Dalian Med Univ, Affiliated Hosp 1, Dept Clin Lab Med, Dalian 116011, Liaoning, Peoples R China
Keywords
Imbalanced classification; Noise filter; Iterative partition filtering; SMOTE; Re-sampling method; Borderline examples; Imbalanced problems; Classification; Prediction; Machine
DOI
10.1007/s11227-025-07081-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The Synthetic Minority Oversampling TEchnique (SMOTE) is a benchmark method for class-imbalanced learning. Since SMOTE was proposed, many variants have emerged, which fall into two types: pre-processing and post-processing. However, most pre-processing methods do not filter noisy samples, while post-processing methods pay no attention to the data in the focus area. In this paper, we present KSIPF, an oversampling method based on kmeans-SMOTE and the Iterative-Partitioning Filter (IPF), which overcomes the shortcomings of both types. KSIPF first uses k-means to cluster the data and selects the clusters to oversample; IPF is then used to remove noisy samples from the data. KSIPF is compared with SMOTE and its variants on 30 synthetic and 20 real-world imbalanced data sets, and the balanced data sets are used to train SVM and AdaBoost classifiers to assess its effectiveness. The experimental results demonstrate that KSIPF outperforms the compared methods in terms of area under the curve, F1-measure, and statistical testing.
Pages: 35
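The two-stage pipeline outlined in the abstract (cluster, oversample selected clusters, then filter noise) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the cluster-selection rule (minority share of at least 0.5) and the round-robin allocation of synthetic points are assumed, interpolation uses a random pair of minority points instead of SMOTE's k-nearest-neighbour choice, and a 1-nearest-neighbour vote stands in for whatever base classifier the filter uses in the paper.

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    """Lloyd's algorithm with farthest-point initialisation; returns cluster labels."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def oversample_clusters(X, y, labels, rng, minority=1, min_ratio=0.5):
    """Interpolate new minority points inside clusters dominated by the minority class.
    The min_ratio threshold is an assumption made for this sketch."""
    is_min = (y == minority)
    n_needed = int((~is_min).sum() - is_min.sum())
    chosen = []
    for j in np.unique(labels):
        in_cluster = is_min[labels == j]
        if in_cluster.sum() >= 2 and in_cluster.mean() >= min_ratio:
            chosen.append(j)
    if n_needed <= 0 or not chosen:
        return X, y
    synth = []
    for i in range(n_needed):
        # Round-robin over the selected clusters (assumed allocation rule).
        pts = X[(labels == chosen[i % len(chosen)]) & is_min]
        a, b = pts[rng.choice(len(pts), size=2, replace=False)]
        synth.append(a + rng.random() * (b - a))
    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(len(synth), minority)])
    return X_new, y_new

def ipf(X, y, rng, n_parts=5, max_rounds=10, stop_frac=0.01):
    """Iterative partition filtering with a 1-NN stand-in classifier per partition.
    A sample is dropped when most partition models misclassify it."""
    for _ in range(max_rounds):
        parts = np.array_split(rng.permutation(len(X)), n_parts)
        wrong = np.zeros(len(X), dtype=int)
        for idx in parts:
            if len(idx) == 0:
                continue
            d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
            pred = y[idx][d.argmin(axis=1)]
            wrong += (pred != y)
        noisy = wrong > n_parts / 2
        X, y = X[~noisy], y[~noisy]
        # Stop once a round removes at most stop_frac of the samples.
        if noisy.sum() <= stop_frac * len(noisy):
            break
    return X, y

def ksipf_sketch(X, y, k=3, seed=0):
    """Cluster, oversample the selected clusters, then filter noisy samples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = kmeans(X, k, rng)
    X_bal, y_bal = oversample_clusters(X, y, labels, rng)
    return ipf(X_bal, y_bal, rng)

# Toy data: a majority blob, a minority blob, and one mislabelled point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(60, 2)),
               rng.normal(5.0, 0.5, size=(15, 2))])
y = np.array([0] * 60 + [1] * 15)
y[0] = 1  # inject label noise inside the majority blob
X_out, y_out = ksipf_sketch(X, y, k=2)
print(np.bincount(y), "->", np.bincount(y_out))
```

On this toy data the mislabelled point sits alone in the majority cluster, so it is not used as a seed for synthetic samples, and the filtering step is then expected to discard it. In practice a library clusterer and a stronger base classifier would likely replace the hand-written pieces.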