KSIPF: an effective noise filtering oversampling method based on k-means and iterative-partitioning filter

Times cited: 0

Authors:

Sun, Pengfei [1]

Wang, Zhiping [1]

Jia, Liyan [1]

Wang, Xiaoxi [2]

Affiliations:

[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Liaoning, Peoples R China

[2] Dalian Med Univ, Affiliated Hosp 1, Dept Clin Lab Med, Dalian 116011, Liaoning, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 04

Keywords:

Imbalanced classification; Noise filter; Iterative partition filtering; SMOTE; RE-SAMPLING METHOD; BORDERLINE EXAMPLES; IMBALANCED PROBLEMS; CLASSIFICATION; PREDICTION; MACHINE

DOI:

10.1007/s11227-025-07081-5

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

The Synthetic Minority Oversampling TEchnique (SMOTE) is widely regarded as the benchmark method for class-imbalance learning. Since SMOTE was proposed, many variants have emerged, which fall into two types: pre-processing and post-processing. However, most pre-processing methods do not filter out noisy samples, while post-processing methods do not pay attention to the data in the focus area. In this paper, we present KSIPF, an oversampling method based on k-means SMOTE and the Iterative-Partitioning Filter (IPF), which overcomes these shortcomings. First, KSIPF clusters the data with k-means and selects the clusters to oversample; then, IPF removes noisy samples from the data. KSIPF is compared with SMOTE and its variants on 30 synthetic and 20 real-world imbalanced data sets, and the balanced data sets are used to train SVM and AdaBoost classifiers to assess its effectiveness. The experimental results show that KSIPF outperforms the compared methods in terms of area under the curve (AUC) and F1-measure, a result supported by statistical tests.
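The abstract describes KSIPF as two stages: k-means clustering with SMOTE oversampling restricted to selected clusters, followed by IPF noise removal. The following is a minimal pure-Python sketch of that pipeline under stated assumptions, not the authors' implementation. Assumptions: a binary problem; k-means uses farthest-first seeding; a cluster is selected when its majority-to-minority count ratio is below `ir_threshold`, and synthetic samples are shared in proportion to each selected cluster's minority count (the original k-means SMOTE weights clusters by sparsity instead); the IPF stage uses 3-nearest-neighbour models in place of the C4.5 trees IPF is usually run with, and assumes 5 partitions, majority voting, and stopping after 3 consecutive rounds that each remove fewer than 1% of the original samples. All function names (`kmeans`, `smote`, `kmeans_smote`, `knn_vote`, `ipf`, `ksipf`) are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(X, k, rng, iters=20):
    """Lloyd's k-means with farthest-first seeding (an assumption); returns one label per point."""
    centers = [rng.choice(X)]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(math.dist(x, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def smote(points, n_new, rng, k_nn=5):
    """SMOTE interpolation: each new point lies between a seed and one of its k nearest minority neighbours."""
    new = []
    for _ in range(n_new):
        seed = rng.choice(points)
        others = sorted((p for p in points if p is not seed), key=lambda p: math.dist(seed, p))
        nb = rng.choice(others[:k_nn])
        gap = rng.random()
        new.append(tuple(s + gap * (b - s) for s, b in zip(seed, nb)))
    return new


def kmeans_smote(X, y, minority, k, rng, ir_threshold=1.0):
    """Oversample only inside clusters where the minority class is not outnumbered."""
    labels = kmeans(X, k, rng)
    n_needed = sum(t != minority for t in y) - sum(t == minority for t in y)
    chosen = []
    for c in range(k):
        mins = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority]
        majs = sum(1 for t, lab in zip(y, labels) if lab == c and t != minority)
        if len(mins) >= 2 and majs / len(mins) < ir_threshold:
            chosen.append(mins)
    X_out, y_out = list(X), list(y)
    total = sum(len(m) for m in chosen)
    for mins in chosen:
        # Hypothetical simplification: share proportional to minority count, not cluster sparsity.
        share = round(n_needed * len(mins) / total)
        X_out += smote(mins, share, rng)
        y_out += [minority] * share
    return X_out, y_out


def knn_vote(part, x, k=3):
    """Label predicted for x by a k-NN model fitted on one partition."""
    nearest = sorted(part, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def ipf(X, y, rng, n_parts=5, p=0.01, stop_rounds=3):
    """Iterative-partitioning filter with majority voting (k-NN stands in for C4.5)."""
    data = list(zip(X, y))
    original = len(data)
    quiet = 0
    while quiet < stop_rounds and len(data) >= n_parts:
        rng.shuffle(data)
        parts = [data[i::n_parts] for i in range(n_parts)]
        # A sample is noisy when more than half of the partition models mislabel it.
        noisy = {i for i, (x, t) in enumerate(data)
                 if sum(knn_vote(part, x) != t for part in parts) > n_parts // 2}
        data = [d for i, d in enumerate(data) if i not in noisy]
        quiet = quiet + 1 if len(noisy) < p * original else 0
    return [x for x, _ in data], [t for _, t in data]


def ksipf(X, y, minority, k=4, seed=0):
    """Hypothetical KSIPF pipeline: k-means SMOTE oversampling, then IPF noise removal."""
    rng = random.Random(seed)
    X_os, y_os = kmeans_smote(X, y, minority, k, rng)
    return ipf(X_os, y_os, rng)


# Toy data: an 8x8 majority grid, a distant minority blob, and one mislabelled minority point.
X = [(0.5 * i, 0.5 * j) for i in range(8) for j in range(8)]
X += [(10 + 0.3 * i, 10 + 0.3 * j) for i in range(4) for j in range(3)]
X += [(1.75, 1.75)]
y = [0] * 64 + [1] * 12 + [1]
Xb, yb = ksipf(X, y, minority=1)
print(Counter(yb))
```

On this toy input the isolated minority point at (1.75, 1.75) falls in a majority-dominated cluster, so it is not used as a SMOTE seed, and it is expected to be voted out by the filter; the distant minority blob is oversampled until the classes are roughly balanced. For real data, scikit-learn's `KMeans` and `DecisionTreeClassifier` would be a more practical basis than these hand-written helpers.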

Pages: 35

References: 52