Improving undersampling-based ensemble with rotation forest for imbalanced problem

Cited by: 8
Authors
Guo, Huaping [1]
Diao, Xiaoyu [1]
Liu, Hongbing [1]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Undersampling; ensemble; rotation forest; imbalanced problem; SMOTE; algorithms
DOI
10.3906/elk-1805-159
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As one of the most challenging and attractive issues in pattern recognition and machine learning, the class-imbalance problem has attracted increasing attention. In two-class data, imbalance means that one class (the majority class) is much larger than the other (the minority class), which makes the constructed models focus on the majority class and ignore or even misclassify examples of the minority class. The undersampling-based ensemble, which learns individual classifiers from undersampled balanced data, is an effective method for coping with class-imbalanced data. The problem with this method is that the dataset used to train each classifier is notably small; thus, generating high-performance individual classifiers from the limited data is key to the success of the method. In this paper, rotation forest (an ensemble method) is used to improve the performance of the undersampling-based ensemble on the imbalanced problem because rotation forest has higher performance than other ensemble methods such as bagging, boosting, and random forest, particularly for small-sized data. In addition, rotation forest is more sensitive to the sampling technique than some robust methods including SVM and neural networks; thus, it is easier to create diverse individual classifiers using rotation forest. Two versions of the improved undersampling-based ensemble are implemented: 1) undersampling subsets from the majority class and learning each classifier with rotation forest on the data obtained by combining each subset with the minority class, and 2) the same procedure as the first, except that the majority class examples that are correctly classified with high confidence are removed from further consideration after each classifier is learned.
The experimental results show that the proposed methods perform significantly better on recall, g-mean, f-measure, and AUC than other state-of-the-art methods on 30 datasets with various data distributions and different imbalance ratios.
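The two variants described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The base learner `centroid_learner` is a hypothetical stand-in for rotation forest, chosen only so the sketch runs with the Python standard library. The subset size (equal to the minority class size), the `prune_confidence` threshold, the use of only the most recent classifier for pruning, and the averaging in `predict_proba` are likewise assumptions, since the abstract does not specify them.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Sample = Tuple[Features, int]        # label 1 = minority, 0 = majority
Model = Callable[[Features], float]  # returns an estimate of P(minority | x)


def centroid_learner(data: List[Sample]) -> Model:
    """Hypothetical stand-in base learner (NOT rotation forest)."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in data if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def dist(a: Features, b: Features) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    def predict(x: Features) -> float:
        d0, d1 = dist(x, c0), dist(x, c1)
        # Far from the majority centroid means a high minority score.
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return predict


def undersampling_ensemble(
    majority: List[Features],
    minority: List[Features],
    n_models: int,
    learner: Callable[[List[Sample]], Model] = centroid_learner,
    prune_confidence: Optional[float] = None,  # None: variant 1, float: variant 2
    seed: int = 0,
) -> List[Model]:
    rng = random.Random(seed)
    pool = list(majority)
    models: List[Model] = []
    for _ in range(n_models):
        k = min(len(minority), len(pool))
        if k == 0:
            break  # every majority example has been pruned
        subset = rng.sample(pool, k)  # balanced undersampled subset
        train = [(x, 0) for x in subset] + [(x, 1) for x in minority]
        model = learner(train)
        models.append(model)
        if prune_confidence is not None:
            # Variant 2 (assumed rule): drop majority examples that the newest
            # model labels as majority with confidence >= prune_confidence.
            pool = [x for x in pool if (1.0 - model(x)) < prune_confidence]
    return models


def predict_proba(models: List[Model], x: Features) -> float:
    """Average the minority-class scores of all ensemble members."""
    return sum(m(x) for m in models) / len(models)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of minority recall and majority recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return ((tp / pos) * (tn / neg)) ** 0.5


# Toy data with a 10:1 imbalance ratio (synthetic, for illustration only).
_rng = random.Random(42)
majority = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(200)]
minority = [(_rng.gauss(3, 1), _rng.gauss(3, 1)) for _ in range(20)]

models_v1 = undersampling_ensemble(majority, minority, n_models=5)
models_v2 = undersampling_ensemble(majority, minority, n_models=5,
                                   prune_confidence=0.9)
```

A point can then be scored with, for example, `predict_proba(models_v2, (3.0, 3.0))`. Swapping `centroid_learner` for an actual rotation forest implementation would bring the sketch closer to the approach the abstract describes.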
Pages: 1371-1386
Page count: 16