A novel instance density-based hybrid resampling for imbalanced classification problems

Authors
Park, You-Jin [1]
Ma, Chung-Kang [1]
Affiliations
[1] Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei
Keywords
Class imbalance; Classification; Hybrid resampling; Instance density; Machine learning
DOI
10.1007/s00500-025-10499-x
Abstract
The class imbalance problem is one of the challenging issues in machine learning applications. It occurs when the number of instances of one class is much smaller (or larger) than that of the other classes. Many approaches have been developed to handle imbalanced classification problems, for example, the synthetic minority oversampling technique (SMOTE). However, SMOTE is often sensitive to the predetermined k value, i.e., the number of nearest neighbors used to generate the synthetic instances. For example, if the k value is moderately large, some of the synthetic instances generated by SMOTE may lie near a decision boundary or even within the majority class area, and can therefore be regarded as unnecessary noisy instances. In this study, we propose an efficient hybrid resampling method based on instance density, called IDHR (Instance Density-based Hybrid Resampling), which improves classification performance by generating instances that are closer to the minority class than to the majority class while avoiding the generation of noisy instances. We first apply the instance density-based oversampling (IDO) technique to generate new synthetic instances. We then eliminate some of the synthetic instances that are close to the decision boundary, and determine how many of the retained synthetic instances can be further eliminated based on the maximum of the distances from all the synthetic instances to the minority class instances, the minimum of the distances from all the synthetic instances to the majority class instances, and classification performance. To demonstrate the effectiveness of the proposed resampling method, comprehensive experiments are conducted on sixteen imbalanced datasets with three classifiers: the C4.5 decision tree algorithm, support vector machine (SVM), and multi-layer perceptron neural network (MLP-NN). The experimental analysis shows that the proposed resampling method outperforms the traditional oversampling methods with respect to AUC and F-measure for most of the imbalanced datasets, regardless of the classifier. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
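The general oversample-then-filter idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' IDHR or IDO procedure, whose details the abstract does not give: the oversampling step below is a plain SMOTE-style interpolation, and the filter is a simple nearest-distance rule chosen for illustration. The function names `smote_like` and `filter_synthetic` and the toy data are hypothetical.

```python
import numpy as np


def smote_like(X_min, n_new, k, rng):
    """SMOTE-style oversampling (illustrative stand-in, not IDO).

    Each synthetic point is a random interpolation between a minority
    instance and one of its k nearest minority neighbours.
    """
    # Pairwise Euclidean distances among minority instances.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbours
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def filter_synthetic(X_syn, X_min, X_maj):
    """Assumed filter rule: keep a synthetic point only if its nearest
    minority instance is closer than its nearest majority instance."""
    d_min = np.linalg.norm(X_syn[:, None, :] - X_min[None, :, :], axis=2).min(axis=1)
    d_maj = np.linalg.norm(X_syn[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    return X_syn[d_min < d_maj]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_maj = rng.normal(0.0, 1.0, size=(200, 2))   # toy majority class
    X_min = rng.normal(2.5, 1.0, size=(20, 2))    # toy minority class
    X_syn = smote_like(X_min, n_new=180, k=5, rng=rng)
    X_kept = filter_synthetic(X_syn, X_min, X_maj)
    print(f"generated {len(X_syn)} synthetic points, kept {len(X_kept)}")
```

Increasing `k` in this sketch lets interpolation reach more distant minority neighbours, which is one way synthetic points can end up in majority-class territory, the sensitivity the abstract attributes to SMOTE. The filter then discards such points before training.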
Pages: 2031-2045
Number of pages: 14