A binary PSO-based ensemble under-sampling model for rebalancing imbalanced training data

Cited by: 21
Authors
Li, Jinyan [1]
Wu, Yaoyang [1]
Fong, Simon [1]
Tallon-Ballesteros, Antonio J. [3]
Yang, Xin-she [4]
Mohammed, Sabah [5]
Wu, Feng [2]
Affiliations
[1] Univ Macau, Dept Comp & Informat Sci, Taipa, Macau SAR, Peoples R China
[2] Chinese Acad Sci, Zhuhai Inst Adv Technol, Zhuhai, Peoples R China
[3] Univ Seville, Dept Languages & Comp Syst, Seville, Spain
[4] Middlesex Univ, Sch Sci & Technol, Design Engn & Math, London, England
[5] Lakehead Univ, Dept Comp Sci, Thunder Bay, ON, Canada
Keywords
Imbalanced classification; Ensemble; Under-sampling; Binary PSO; Multi-objective; Integrity; Classification; Optimization; Algorithms; Selection
DOI
10.1007/s11227-021-04177-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Ensemble techniques and under-sampling techniques are both effective tools for imbalanced dataset classification problems. In this paper, we propose a novel ensemble method that combines the advantages of ensemble learning for biasing classifiers with a new under-sampling method. The under-sampling method, named Binary PSO instance selection, works together with ensemble classifiers to find the most suitable size and combination of majority class samples, which are then joined with the minority class samples to build a new dataset. The proposed method adopts a multi-objective strategy; its contribution is a notable improvement in imbalanced classification performance while preserving the integrity of the original dataset as far as possible. We tested the proposed method and compared its performance on imbalanced datasets with that of several conventional basic ensemble methods. Experiments were also conducted on these imbalanced datasets using an improved version in which ensemble classifiers are wrapped in the Binary PSO instance selection. According to the experimental results, the proposed methods outperform single ensemble methods, state-of-the-art under-sampling methods, and combinations of these methods with the traditional PSO instance selection algorithm.
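The instance-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fitness function, the PSO parameters, or the classifiers used. The sketch assumes a standard binary PSO with a sigmoid transfer function, swaps the ensemble classifier for a simple nearest-centroid classifier, and assumes a weighted sum of G-mean and an "integrity" term (the fraction of majority samples retained) as the multi-objective fitness. All function names, the weight `w_integrity`, and the constants 0.7, 1.5, and 4.0 are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean vector of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def g_mean(majority_kept, majority_all, minority):
    """G-mean of a nearest-centroid classifier (a stand-in for the ensemble)."""
    c_maj, c_min = centroid(majority_kept), centroid(minority)

    def predicts_minority(p):
        return math.dist(p, c_min) < math.dist(p, c_maj)

    tpr = sum(predicts_minority(p) for p in minority) / len(minority)
    tnr = sum(not predicts_minority(p) for p in majority_all) / len(majority_all)
    return math.sqrt(tpr * tnr)


def fitness(mask, majority, minority, w_integrity=0.2):
    """Assumed objective: weighted sum of G-mean and retained-majority fraction."""
    kept = [p for p, bit in zip(majority, mask) if bit]
    if not kept:
        return 0.0  # an empty majority subset cannot train a classifier
    integrity = len(kept) / len(majority)
    return (1 - w_integrity) * g_mean(kept, majority, minority) + w_integrity * integrity


def binary_pso_select(majority, minority, n_particles=10, n_iter=30, seed=0):
    """Return (mask, fitness); mask[i] == 1 keeps majority sample i."""
    rng = random.Random(seed)
    dim = len(majority)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, majority, minority) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i], majority, minority)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

The returned mask marks which majority samples to keep; those samples together with all minority samples would form the rebalanced training set. Because the particle length equals the number of majority samples, the search decides both how many and which samples to retain, which is one plausible reading of "size and combination" in the abstract.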
Pages: 7428-7463
Page count: 36