An Improved Oversampling Method for Imbalanced Data-SMOTE Based on Canopy and K-means

Cited by: 0
Authors
Guo, Chaoyou [1]
Ma, Yankun [1]
Xu, Zhe [1]
Cao, Mengmeng [1]
Yao, Qian [1]
Affiliations
[1] Naval Univ Engn, Coll Power Engn, Wuhan, Peoples R China
Source
2019 CHINESE AUTOMATION CONGRESS (CAC2019) | 2019
Keywords
SMOTE; imbalanced data; oversampling; Canopy; K-means;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The Synthetic Minority Oversampling Technique (SMOTE) is a widely used method for addressing imbalanced data classification. However, its effectiveness in classifying minority samples still needs to be improved. To retain its strengths while offsetting its shortcomings, we designed an improved algorithm called "C-K-SMOTE", which combines SMOTE with a hybrid clustering scheme built on Canopy and K-means. To obtain an approximately balanced data set, we first use Canopy to produce a coarse clustering, then use K-means to refine it into an accurate clustering, and finally apply SMOTE to increase the number of minority samples. The benchmark imbalanced data sets used in the article are taken from KEEL (Knowledge Extraction based on Evolutionary Learning). Experiments with a random forest classification model verify the effectiveness of this SMOTE-based approach in balancing the imbalanced data sets.
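The three stages named in the abstract (Canopy, then K-means, then SMOTE) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not say how the clusters feed into SMOTE, so the sketch assumes that clustering is applied to the minority class only and that synthetic samples are interpolated between neighbours inside the same cluster. The function names, the single tight Canopy threshold `t2`, and the toy data are all hypothetical.

```python
import math
import random


def canopy_centers(points, t2, rng):
    """Coarse Canopy pass that returns one seed center per canopy.

    Assumption: only the tight threshold t2 is used, since the loose
    threshold affects canopy membership but not the choice of centers.
    """
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        # Points closer than t2 to the new center cannot start another canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm seeded with the Canopy centers; returns non-empty clusters."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return [members for members in clusters if members]


def smote_in_clusters(clusters, n_new, k, rng):
    """Interpolate n_new synthetic points between same-cluster nearest neighbours."""
    usable = [c for c in clusters if len(c) >= 2]
    if not usable:
        return []
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choice(usable)
        base = rng.choice(cluster)
        others = [p for p in cluster if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def ck_smote(minority, n_majority, t2, k=5, seed=0):
    """Hypothetical driver: Canopy seeds -> K-means clusters -> in-cluster SMOTE."""
    rng = random.Random(seed)
    centers = canopy_centers(minority, t2, rng)
    clusters = kmeans(minority, centers)
    n_new = max(0, n_majority - len(minority))
    return smote_in_clusters(clusters, n_new, k, rng)


# Toy minority class with two well-separated groups (illustrative data only).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (4.9, 5.3)]
new_points = ck_smote(minority, n_majority=20, t2=1.0)
print(len(new_points))  # 14 synthetic points, i.e. 20 - 6
```

Seeding K-means from the Canopy centers means the number of clusters does not have to be chosen by hand, which is the usual motivation for pairing the two methods. Restricting interpolation to a single cluster keeps synthetic samples from landing in the empty space between distant minority groups.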
Pages: 1467-1469
Page count: 3