An Improved Oversampling Method for Imbalanced Data-SMOTE Based on Canopy and K-means

Cited by: 0
Authors
Guo, Chaoyou [1]
Ma, Yankun [1]
Xu, Zhe [1]
Cao, Mengmeng [1]
Yao, Qian [1]
Affiliations
[1] Naval Univ Engn, Coll Power Engn, Wuhan, Peoples R China
Source
2019 CHINESE AUTOMATION CONGRESS (CAC2019) | 2019
Keywords
SMOTE; imbalanced data; oversampling; Canopy; K-means;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The Synthetic Minority Oversampling Technique (SMOTE) is a preferred method for handling imbalanced data classification problems. However, its effectiveness in classifying minority samples still needs to be improved. To retain its strengths while addressing its shortcomings, we designed an improved algorithm called "C-K-SMOTE", which incorporates a hybrid clustering algorithm built from Canopy and K-means. To obtain an approximately balanced data set, we first use Canopy to perform a coarse clustering, then use K-means to obtain a precise clustering, and finally apply SMOTE to increase the number of minority samples. The benchmark imbalanced data sets used in the article are selected from KEEL (Knowledge Extraction based on Evolutionary Learning). Experiments with a random forest classification model verify the effectiveness of the SMOTE-based approach in balancing the imbalanced data sets.
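The three stages named in the abstract (Canopy, then K-means, then SMOTE) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the clusters feed into SMOTE, so the sketch assumes that only the minority class is clustered and that synthetic samples are interpolated between nearest neighbours within the same cluster. The function names, the single Canopy threshold `t2` (the looser T1 threshold is omitted because only the canopy centers are used), `k_neighbors`, and the toy data are all hypothetical, and the sketch assumes a binary problem.

```python
import numpy as np

def canopy_centers(X, t2, rng):
    """Coarse pass: pick a center, drop every point within t2 of it, repeat."""
    remaining = list(rng.permutation(len(X)))
    centers = []
    while remaining:
        c = X[remaining[0]]
        centers.append(c)
        remaining = [i for i in remaining if np.linalg.norm(X[i] - c) > t2]
    return np.array(centers)

def kmeans(X, centers, n_iter=50):
    """Fine pass: Lloyd's K-means seeded with the canopy centers."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def smote_within_clusters(X_min, labels, n_new, rng, k_neighbors=5):
    """SMOTE-style interpolation restricted to same-cluster neighbours (assumption)."""
    usable = [k for k in np.unique(labels) if np.sum(labels == k) >= 2]
    if not usable:
        raise ValueError("no minority cluster has two or more samples")
    synthetic = []
    for _ in range(n_new):
        members = X_min[labels == rng.choice(usable)]
        i = rng.integers(len(members))
        dist = np.linalg.norm(members - members[i], axis=1)
        neighbors = np.argsort(dist)[1:k_neighbors + 1]  # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(members[i] + gap * (members[j] - members[i]))
    return np.array(synthetic)

def ck_smote(X, y, minority_label, t2, seed=0):
    """Oversample the minority class until both classes have the same size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)
    if n_new <= 0:
        return X, y
    labels = kmeans(X_min, canopy_centers(X_min, t2, rng))
    X_syn = smote_within_clusters(X_min, labels, n_new, rng)
    return (np.vstack([X, X_syn]),
            np.concatenate([y, np.full(len(X_syn), minority_label)]))

# Toy data: 100 majority samples and two small minority clusters of 10 each.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(100, 2))
X_min = np.vstack([rng.normal(5.0, 0.3, size=(10, 2)),
                   rng.normal(-5.0, 0.3, size=(10, 2))])
X = np.vstack([X_maj, X_min])
y = np.array([0] * 100 + [1] * 20)

X_bal, y_bal = ck_smote(X, y, minority_label=1, t2=3.0)
print(np.bincount(y_bal))  # [100 100]
```

Seeding K-means with the canopy centers means the number of clusters does not have to be chosen by hand, which is the usual motivation for pairing the two algorithms. Restricting interpolation to a single cluster keeps synthetic samples from landing in the gap between separate minority regions.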
Pages: 1467-1469
Page count: 3