CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA

Cited by: 0
Authors
Hu, Xiao-Sheng [1]
Zhang, Run-Jing [2]
Affiliations
[1] Foshan Univ, Coll Elect & Informat Engn, Foshan 528000, Peoples R China
[2] Foshan Univ, Informat & Educ Technol Ctr, Foshan 528000, Peoples R China
Source
PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4 | 2013
Keywords
Imbalanced data; Classification; Clustering; Ensemble learning;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent research, classification of imbalanced datasets has received considerable attention. Most classification algorithms tend to predict that most incoming data belongs to the majority class, resulting in poor classification performance on minority class instances, which are usually of much greater interest. In this paper we propose a clustering-based subset ensemble learning method for handling the class imbalance problem. In the proposed approach, new balanced training datasets are first produced using clustering-based under-sampling; the new training sets are then classified by applying four algorithms (Decision Tree, Naive Bayes, KNN and SVM) as the base algorithms in combined bagging. An experimental analysis is carried out over a wide range of highly imbalanced datasets. The results show that our method can stably and effectively improve classification performance on both rare and normal classes.
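The abstract names the two stages (clustering-based under-sampling, then an ensemble over the balanced subsets) but gives no implementation detail. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes plain k-means for the clustering step, sampling from each majority cluster in proportion to its size, a single 3-nearest-neighbour learner as a stand-in for the four base algorithms, and simple majority voting. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
from collections import Counter

MAJORITY, MINORITY = 0, 1

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns k clusters as lists of points."""
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep the old center if empty.
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

def balanced_subset(clusters, minority, rng):
    """Under-sample the majority class cluster by cluster (assumed scheme):
    each cluster contributes in proportion to its size, so the sampled
    majority points roughly match the minority count."""
    total = sum(len(c) for c in clusters)
    sampled = []
    for c in clusters:
        quota = min(len(c), max(1, round(len(minority) * len(c) / total))) if c else 0
        sampled.extend(rng.sample(c, quota))
    return [(p, MAJORITY) for p in sampled] + [(p, MINORITY) for p in minority]

def knn_predict(train, x, k=3):
    """Stand-in base learner: label of the majority among the k nearest points."""
    nearest = sorted(train, key=lambda item: dist2(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def ensemble_predict(subsets, x):
    """One learner per balanced subset, combined by majority vote."""
    votes = Counter(knn_predict(s, x) for s in subsets)
    return votes.most_common(1)[0][0]

# Synthetic, highly imbalanced data: 200 majority points vs 10 minority points.
rng = random.Random(0)
majority = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(0, 0), (4, 4)] for _ in range(100)]
minority = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(10)]

clusters = kmeans(majority, k=2, rng=rng)
subsets = [balanced_subset(clusters, minority, rng) for _ in range(5)]

print(ensemble_predict(subsets, (10.0, 10.0)))  # query near the minority blob
print(ensemble_predict(subsets, (0.0, 0.0)))    # query near a majority blob
```

Sampling per cluster rather than uniformly is meant to keep every region of the majority class represented in each balanced subset. An odd number of subsets avoids tied votes in the binary case.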
Pages: 35-39
Page count: 5