CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA

Cited by: 0
Authors
Hu, Xiao-Sheng [1]
Zhang, Run-Jing [2]
Affiliations
[1] Foshan Univ, Coll Elect & Informat Engn, Foshan 528000, Peoples R China
[2] Foshan Univ, Informat & Educ Technol Ctr, Foshan 528000, Peoples R China
Keywords
Imbalanced data; Classification; Clustering; Ensemble learning
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Classification of imbalanced datasets has received considerable attention in recent research. Most classification algorithms tend to assign incoming data to the majority class, which results in poor classification performance on minority-class instances, usually the instances of greatest interest. In this paper we propose a clustering-based subset ensemble learning method for handling the class imbalance problem. In the proposed approach, new balanced training sets are first produced by clustering-based under-sampling; these training sets are then used to train four base algorithms, Decision Tree, Naive Bayes, KNN and SVM, which are combined in a combined-bagging scheme. An experimental analysis is carried out over a wide range of highly imbalanced data sets. The results show that the method stably and effectively improves classification performance on both the rare and the normal classes.
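The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the abstract does not specify the clustering algorithm, the sampling rule, or how combined-bagging aggregates the base learners. The sketch assumes k-means on the majority class, proportional sampling from each cluster so that every subset is roughly balanced against the full minority class, and a plain majority vote over all trained models. For brevity only two of the four base learners (k-NN and Gaussian naive Bayes) are implemented; a decision tree or SVM would plug into the same `fit`/`predict_one` interface. All names (`kmeans`, `balanced_subsets`, `fit_ensemble`, `predict`, `KNN`, `GaussianNB`) are hypothetical.

```python
import math
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means. Returns non-empty clusters as lists of indices into `points`."""
    centers = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(i)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(points[i][d] for i in members) / len(members)
                              for d in range(len(points[0]))]
    return [members for members in clusters if members]


def balanced_subsets(X, y, minority, n_subsets, k, seed=0):
    """Assumed clustering-based under-sampling: cluster the majority class once,
    then build each subset from all minority samples plus a draw from every
    majority cluster proportional to its size (about as many as the minority)."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    clusters = kmeans([X[i] for i in maj_idx], k, rng)  # indices local to maj_idx
    subsets = []
    for _ in range(n_subsets):
        chosen = []
        for members in clusters:
            quota = max(1, round(len(min_idx) * len(members) / len(maj_idx)))
            chosen += rng.sample(members, min(quota, len(members)))
        idx = min_idx + [maj_idx[i] for i in chosen]
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


class KNN:
    """k-nearest-neighbour classifier (majority label among the k closest points)."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        nearest = sorted(range(len(self.X)), key=lambda i: dist2(x, self.X[i]))[: self.k]
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]


class GaussianNB:
    """Gaussian naive Bayes: per-class feature means/variances, highest log-posterior wins."""
    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            means = [sum(col) / len(rows) for col in zip(*rows)]
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                         for col, m in zip(zip(*rows), means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            return prior - sum(0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))
        return max(self.stats, key=log_posterior)


def fit_ensemble(subsets, learners=(KNN, GaussianNB)):
    """Train every base learner type on every balanced subset."""
    return [make().fit(Xs, ys) for Xs, ys in subsets for make in learners]


def predict(models, x):
    """Combine all models by simple majority vote (assumed combination rule)."""
    return Counter(m.predict_one(x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Synthetic 2-D data: 200 majority points around (0, 0), 20 minority points around (3, 3).
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
    X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(20)]
    y = [0] * 200 + [1] * 20
    models = fit_ensemble(balanced_subsets(X, y, minority=1, n_subsets=5, k=5))
    print(predict(models, [3.0, 3.0]), predict(models, [0.0, 0.0]))
```

Sampling every subset from the same majority clusters keeps each region of the majority class represented while still giving the ensemble members different training data, which is the usual motivation for clustering-based rather than purely random under-sampling.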
Pages: 35-39
Number of pages: 5