An Improved Oversampling Method for imbalanced Data-SMOTE Based on Canopy and K-means

Cited by: 0
Authors
Guo, Chaoyou [1]
Ma, Yankun [1]
Xu, Zhe [1]
Cao, Mengmeng [1]
Yao, Qian [1]
Affiliations
[1] Naval Univ Engn, Coll Power Engn, Wuhan, Peoples R China
Source
2019 CHINESE AUTOMATION CONGRESS (CAC2019) | 2019
Keywords
SMOTE; imbalanced data; oversampling; Canopy; K-means;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The Synthetic Minority Oversampling Technique (SMOTE) is a preferred method for addressing imbalanced data classification problems. However, its effectiveness in classifying minority samples still needs to be improved. To keep its strengths while reducing its shortcomings, we designed an improved algorithm called "C-K-SMOTE", which adds a hybrid clustering step built from Canopy and K-means. To obtain an approximately balanced data set, we first use Canopy for coarse clustering, then use K-means for refined clustering, and finally apply SMOTE to increase the number of minority samples. The reference imbalanced data sets used in the article are selected from KEEL (Knowledge Extraction based on Evolutionary Learning). Experiments with a random forest classification model verify the effectiveness of SMOTE in balancing the imbalanced data sets.
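The three stages named in the abstract (Canopy, then K-means, then SMOTE) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract gives no parameters, so the function names, the single tight threshold `t2` (classic Canopy also uses a looser `T1`), clustering only the minority class, and allocating synthetic samples in proportion to cluster size are all assumptions made here for illustration.

```python
import math
import random


def canopy_centers(points, t2, rng):
    """Coarse pass: pick canopy centers so that no two are within t2."""
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        # Points within the tight threshold t2 cannot start a new canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def assign(points, centers):
    """Group each point with its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centers, iters=20):
    """Refining pass: plain Lloyd iterations seeded with the canopy centers."""
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return [g for g in assign(points, centers) if g]


def smote_point(cluster, k, rng):
    """Interpolate between a random sample and one of its k nearest mates."""
    base = rng.choice(cluster)
    mates = sorted((p for p in cluster if p is not base),
                   key=lambda p: math.dist(p, base))[:k]
    if not mates:  # singleton cluster: nothing to interpolate with
        return base
    mate = rng.choice(mates)
    gap = rng.random()
    return tuple(b + gap * (m - b) for b, m in zip(base, mate))


def c_k_smote(minority, n_new, t2, k=5, seed=0):
    """Hypothetical pipeline: Canopy seeds -> K-means clusters -> SMOTE."""
    rng = random.Random(seed)
    clusters = kmeans(minority, canopy_centers(minority, t2, rng))
    sizes = [len(c) for c in clusters]  # assumed: allocate by cluster size
    return [smote_point(rng.choices(clusters, weights=sizes)[0], k, rng)
            for _ in range(n_new)]
```

Here `minority` is a list of equal-length numeric tuples. Because each synthetic point is interpolated between two members of the same cluster, new samples stay inside existing minority regions rather than falling in the gaps between them, which is the usual motivation for clustering before SMOTE.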
Pages: 1467-1469
Page count: 3