ACO Resampling: Enhancing the performance of oversampling methods for class imbalance classification

Cited by: 43
Authors
Li, Min [1 ]
Xiong, An [1 ,2 ]
Wang, Lei [1 ,2 ]
Deng, Shaobo [1 ,2 ]
Ye, Jun [1 ,2 ]
Affiliations
[1] Nanchang Inst Technol, Sch Informat Engn, Nanchang 330099, Jiangxi, Peoples R China
[2] Jiangxi Prov Key Lab Water Informat Cooperat Sens, Nanchang 330099, Jiangxi, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Machine learning; Imbalanced learning; Oversampling; Ant colony optimization resampling; Classifiers; Prediction; Selection; Datasets; Cancer; Tumor; SMOTE
DOI
10.1016/j.knosys.2020.105818
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many sampling-based preprocessing methods have been proposed to address the problem of imbalanced dataset classification. The fundamental principle of these methods is to rebalance an imbalanced dataset by a concrete strategy. Herein, we introduce a novel hybrid proposal named ant colony optimization resampling (ACOR) to overcome class imbalance. ACOR comprises two steps: first, it rebalances an imbalanced dataset with a specific oversampling algorithm; next, it finds a (sub)optimal subset of the balanced dataset by ant colony optimization. Unlike other oversampling techniques, ACOR does not focus on the mechanics of generating new samples. Its main advantage is that existing oversampling algorithms can be fully utilized while an ideal training set is obtained by ant colony optimization; ACOR can therefore enhance the performance of existing oversampling algorithms. Experimental results on 18 real imbalanced datasets show that ACOR yields significantly better results than four popular oversampling methods on various assessment metrics, including AUC, G-mean, and BACC. (C) 2020 Elsevier B.V. All rights reserved.
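The two-step scheme described in the abstract (rebalance by oversampling, then select a good training subset with ant colony optimization) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual ACOR implementation: the SMOTE-style interpolation, the leave-one-out 1-NN balanced-accuracy fitness, the pheromone-based inclusion rule, and all parameter values are assumptions chosen for brevity.

```python
import random

def smote_like(minority, n_new, k=3):
    """SMOTE-style oversampling: interpolate between a random minority
    sample and one of its k nearest minority neighbours."""
    out = []
    for _ in range(n_new):
        a = random.choice(minority)
        neigh = sorted(minority,
                       key=lambda p: (p[0] - a[0]) ** 2 + (p[1] - a[1]) ** 2)[1:k + 1]
        b = random.choice(neigh)
        t = random.random()
        out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def fitness(X, y):
    """Leave-one-out 1-NN balanced accuracy (BACC) of a candidate training set."""
    correct, total = {}, {}
    for i, (p, lab) in enumerate(zip(X, y)):
        j = min((j for j in range(len(X)) if j != i),
                key=lambda j: (X[j][0] - p[0]) ** 2 + (X[j][1] - p[1]) ** 2)
        total[lab] = total.get(lab, 0) + 1
        if y[j] == lab:
            correct[lab] = correct.get(lab, 0) + 1
    return sum(correct.get(c, 0) / total[c] for c in total) / len(total)

def aco_select(X, y, n_ants=10, n_iter=20, rho=0.1):
    """ACO-style subset selection: each sample carries a pheromone value;
    ants sample candidate subsets, and the best subset found so far
    reinforces the pheromones of its members."""
    n = len(X)
    tau = [1.0] * n
    best_sub, best_fit = list(range(n)), fitness(X, y)  # start from the full set
    for _ in range(n_iter):
        for _ in range(n_ants):
            # include sample i with probability tau_i / (1 + tau_i)
            sub = [i for i in range(n) if random.random() < tau[i] / (1.0 + tau[i])]
            if len({y[i] for i in sub}) < 2:   # need both classes to evaluate
                continue
            f = fitness([X[i] for i in sub], [y[i] for i in sub])
            if f > best_fit:
                best_fit, best_sub = f, sub
        tau = [t * (1.0 - rho) for t in tau]   # pheromone evaporation
        for i in best_sub:                     # deposit on the best subset so far
            tau[i] += rho * best_fit
    return best_sub, best_fit

# Step 1: rebalance a toy imbalanced dataset; Step 2: ACO subset selection.
random.seed(0)
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
minority = [(random.gauss(3, 0.5), random.gauss(3, 0.5)) for _ in range(8)]
X = majority + minority + smote_like(minority, len(majority) - len(minority))
y = [0] * 40 + [1] * 40
subset, fit = aco_select(X, y)
```

Because the search starts from the full balanced set and only replaces it when an ant finds a strictly better subset, the returned fitness can never fall below that of the oversampled dataset itself, which mirrors the abstract's claim that the method enhances rather than replaces the underlying oversampler.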
Pages: 17