CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA

被引:0
|
作者
Hu, Xiao-Sheng [1 ]
Zhang, Run-Jing [2 ]
机构
[1] Foshan Univ, Coll Elect & Informat Engn, Foshan 528000, Peoples R China
[2] Foshan Univ, Informat & Educ Technol Ctr, Foshan 528000, Peoples R China
来源
PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4 | 2013年
关键词
Imbalanced data; Classification; Clustering; Ensemble learning;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In recent research, classification involving imbalanced datasets has received considerable attention. Most classification algorithms tend to predict that most of the incoming data belongs to the majority class, resulting in the poor classification performance in minority class instances, which are usually of much more interest. In this paper we propose a clustering-based subset ensemble learning method for handling class imbalanced problem. In the proposed approach, first, new balanced training datasets are produced using clustering-based under-sampling, then, further classification of new training sets are performed by applying four algorithms: Decision Tree, Naive Bayes, KNN and SVM, as the base algorithms in combined-bagging. An experimental analysis is carried out over a wide range of highly imbalanced data sets. The results obtained show that our method can improve imbalance classification performance of rare and normal classes stably and effectively.
引用
收藏
页码:35 / 39
页数:5
相关论文
共 50 条
  • [31] An Imbalanced Multi-Label Data Ensemble Learning Method Based on Safe Under-Sampling
    Sun, Zhong-Bin
    Diao, Yu-Xuan
    Ma, Su-Yang
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (10): : 3392 - 3408
  • [32] Ensemble learning based predictive modelling on a highly imbalanced multiclass data
    Vasti, Manka
    Dev, Amita
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45 (08) : 2141 - 2164
  • [33] imDC: an ensemble learning method for imbalanced classification with miRNA data
    Wang, C. Y.
    Hu, L. L.
    Guo, M. Z.
    Liu, X. Y.
    Zou, Q.
    GENETICS AND MOLECULAR RESEARCH, 2015, 14 (01): : 123 - 133
  • [34] Sample Subset Optimization Techniques for Imbalanced and Ensemble Learning Problems in Bioinformatics Applications
    Yang, Pengyi
    Yoo, Paul D.
    Fernando, Juanita
    Zhou, Bing B.
    Zhang, Zili
    Zomaya, Albert Y.
    IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (03) : 445 - 455
  • [35] Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning
    Li, Jiaxi
    Wang, Zhelong
    Wu, Lina
    Qiu, Sen
    Zhao, Hongyu
    Lin, Fang
    Zhang, Ke
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (05) : 3102 - 3113
  • [36] Ensemble Learning on Large Scale Financial Imbalanced Data
    Sanabila, H. R.
    Jatmiko, Wisnu
    2018 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS), 2018, : 93 - 98
  • [37] Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
    Lv, Zhenzhe
    Liu, Qicheng
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (09) : 1528 - 1536
  • [38] Ensemble based adaptive over-sampling method for imbalanced data learning in computer aided detection of microaneurysm
    Ren, Fulong
    Cao, Peng
    Li, Wei
    Zhao, Dazhe
    Zaiane, Osmar
    COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2017, 55 : 54 - 67
  • [39] An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data
    Yu, Hualong
    Ni, Jun
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 657 - 666
  • [40] Spark-based ensemble learning for imbalanced data classification
    Ding J.
    Wang S.
    Jia L.
    You J.
    Jiang Y.
    International Journal of Performability Engineering, 2018, 14 (05) : 945 - 964