Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE

Cited by: 1
Authors
Chen, Junfeng [1 ]
Zheng, Zhongtuan [1 ]
Affiliations
[1] School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai
Keywords
clustering consistency index; clustering ensembles; clusters matching; feature weighting; imbalanced data classification; over-sampling;
DOI
10.3778/j.issn.1002-8331.2011-0215
Abstract
A new over-sampling method for imbalanced data sets, WKMeans-SMOTE, is proposed based on feature weighting and clustering ensembles. It aims to address the problem that the SMOTE method synthesizes new samples from all minority samples without any guidance. First, since different feature weights affect the clustering results to different degrees, a feature-weighted clustering algorithm is adopted, and the initial cluster centers are varied repeatedly to generate multiple clustering results. Then, the clustering results are aligned using a cluster-matching algorithm, and minority samples near cluster boundaries are picked out by introducing a clustering consistency index. Finally, SMOTE is applied only to the picked minority samples, and the CART algorithm is used as the base classifier trained on the balanced data set. Experimental results show that, compared with SMOTE, Borderline-SMOTE, ADASYN and other over-sampling methods, the proposed method achieves better classification quality in terms of F-value and G-mean. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
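The abstract outlines a multi-step pipeline (feature-weighted k-means runs, cluster matching, a consistency index, boundary-restricted SMOTE, CART). The Python sketch below illustrates one way these steps could fit together; it is a minimal illustration under stated assumptions, not the authors' implementation. In particular, the variance-based feature weights, the ten clustering runs, the Hungarian-algorithm cluster matching, the 0.8 consistency threshold and the scikit-learn DecisionTreeClassifier standing in for CART are all choices made for this example.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Imbalanced toy data, roughly 9:1 majority (class 0) to minority (class 1).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# Step 1: feature weighting. A simple variance-based weight per feature is an
# assumption standing in for the paper's weighted-k-means weight update; scaling
# the data by sqrt(w) makes plain Euclidean k-means act as weighted k-means.
w = X.var(axis=0)
w = w / w.sum()
Xw = X * np.sqrt(w)

# Step 2: clustering ensemble from repeated k-means with different initial centers.
n_runs, k = 10, 4
labelings = [KMeans(n_clusters=k, n_init=1, random_state=r).fit_predict(Xw)
             for r in range(n_runs)]

# Step 3: cluster matching. Align every run's labels to the first run by
# maximizing cluster overlap (Hungarian algorithm).
def align(ref, lab, k):
    overlap = np.array([[np.sum((ref == i) & (lab == j)) for j in range(k)]
                        for i in range(k)])
    rows, cols = linear_sum_assignment(-overlap)
    mapping = {int(c): int(r) for r, c in zip(rows, cols)}
    return np.array([mapping[int(l)] for l in lab])

aligned = np.stack([align(labelings[0], lab, k) for lab in labelings])

# Step 4: clustering consistency index = fraction of runs that agree with the
# per-sample majority-vote cluster; low consistency suggests a boundary sample.
votes = np.apply_along_axis(lambda col: np.bincount(col, minlength=k).argmax(),
                            0, aligned)
consistency = (aligned == votes).mean(axis=0)

# Step 5: pick boundary minority samples (threshold 0.8 is an illustrative choice).
minority = np.flatnonzero(y == 1)
boundary = minority[consistency[minority] < 0.8]
if boundary.size < 2:
    boundary = minority  # fall back to all minority samples

# Step 6: SMOTE-style interpolation restricted to the picked boundary samples.
n_new = int((y == 0).sum() - (y == 1).sum())
nn = NearestNeighbors(n_neighbors=min(6, boundary.size)).fit(X[boundary])
_, idx = nn.kneighbors(X[boundary])
synth = []
for _ in range(n_new):
    i = rng.integers(boundary.size)
    j = rng.choice(idx[i][1:])  # a random neighbor other than the sample itself
    synth.append(X[boundary[i]] + rng.random() * (X[boundary][j] - X[boundary[i]]))
synth = np.asarray(synth)

# Step 7: train CART (DecisionTreeClassifier) on the balanced data.
X_bal = np.vstack([X, synth])
y_bal = np.concatenate([y, np.ones(len(synth), dtype=int)])
cart = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)

The key design point mirrored here is that synthetic samples are interpolated only among the low-consistency (boundary) minority samples rather than among all minority samples, which is what distinguishes the approach from plain SMOTE.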
Pages: 106-112
Number of pages: 6
References
22 in total
[1] HE H, GARCIA E A. Learning from imbalanced data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
[2] CHAN P K, STOLFO S J. Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection[C]. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998: 164-168.
[3] BATISTA G E, PRATI R C, MONARD M C. A study of the behavior of several methods for balancing machine learning training data[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 20-29.
[4] CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[5] SHI H B, CHEN Y W, CHEN X. Summary of research on SMOTE over-sampling and its improved algorithms[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1073-1083.
[6] HAN H, WANG W Y, MAO B H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning[C]. International Conference on Intelligent Computing, 2005: 878-887.
[7] HE H, BAI Y, GARCIA E A, et al. ADASYN: adaptive synthetic sampling approach for imbalanced learning[C]. IEEE International Joint Conference on Neural Networks, 2008.
[8] CHAWLA N V, LAZAREVIC A, HALL L O, et al. SMOTEBoost: improving prediction of the minority class in boosting[C]. European Conference on Principles of Data Mining and Knowledge Discovery, 2003: 107-119.
[9] GALAR M, FERNANDEZ A, BARRENECHEA E, et al. A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2011, 42(4): 463-484.
[10] SEIFFERT C, KHOSHGOFTAAR T M, VAN HULSE J, et al. RUSBoost: a hybrid approach to alleviating class imbalance[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2009, 40(1): 185-197.