ACO Resampling: Enhancing the performance of oversampling methods for class imbalance classification

Cited by: 44
Authors
Li, Min [1 ]
Xiong, An [1 ,2 ]
Wang, Lei [1 ,2 ]
Deng, Shaobo [1 ,2 ]
Ye, Jun [1 ,2 ]
Affiliations
[1] Nanchang Inst Technol, Sch Informat Engn, Nanchang 330099, Jiangxi, Peoples R China
[2] Jiangxi Prov Key Lab Water Informat Cooperat Sens, Nanchang 330099, Jiangxi, Peoples R China
Funding
US National Science Foundation;
Keywords
Machine learning; Imbalanced learning; Oversampling; Ant colony optimization resampling; CLASSIFIERS; PREDICTION; SELECTION; DATASETS; CANCER; TUMOR; SMOTE;
DOI
10.1016/j.knosys.2020.105818
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many sampling-based preprocessing methods have been proposed to address the classification of imbalanced datasets. The fundamental principle of these methods is to rebalance an imbalanced dataset through a concrete strategy. Herein, we introduce a novel hybrid proposal named ant colony optimization resampling (ACOR) to overcome class-imbalance classification. ACOR comprises two steps: first, it rebalances an imbalanced dataset with a specific oversampling algorithm; next, it finds a (sub)optimal subset of the balanced dataset by ant colony optimization. Unlike other oversampling techniques, ACOR does not focus on the mechanics of generating new samples. Its main advantage is that existing oversampling algorithms can be fully utilized and an ideal training set can be obtained by ant colony optimization; ACOR can therefore enhance the performance of existing oversampling algorithms. Experimental results on 18 real imbalanced datasets show that ACOR yields significantly better results than four popular oversampling methods across various assessment metrics, such as AUC, G-mean, and BACC. (C) 2020 Elsevier B.V. All rights reserved.
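The abstract only outlines ACOR's two-step scheme (oversample, then select a training subset with ant colony optimization). The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: plain random duplication stands in for the paper's unspecified oversampling algorithm, a simplified one-pheromone-per-sample ACO performs the subset search, and 1-nearest-neighbour balanced accuracy on a held-out set serves as the (assumed) fitness function. All names and parameters are illustrative.

```python
import random

def random_oversample(X, y, rng):
    """Duplicate minority samples until all classes reach the majority count
    (a stand-in for SMOTE-style oversamplers used in the paper)."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    Xb, yb = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:
            Xb.append(xi)
            yb.append(label)
    return Xb, yb

def one_nn_bacc(X_tr, y_tr, X_va, y_va):
    """Balanced accuracy of a 1-nearest-neighbour classifier (1-D features)."""
    correct, total = {}, {}
    for xv, yv in zip(X_va, y_va):
        pred = min(zip(X_tr, y_tr), key=lambda p: abs(p[0] - xv))[1]
        total[yv] = total.get(yv, 0) + 1
        correct[yv] = correct.get(yv, 0) + (pred == yv)
    return sum(correct[c] / total[c] for c in total) / len(total)

def aco_select(Xb, yb, fitness, n_ants=8, n_iter=15, rho=0.2, seed=0):
    """Simplified ACO subset selection: each sample carries one pheromone
    value interpreted as its inclusion probability; the best ant's subset
    reinforces the trail after evaporation."""
    rng = random.Random(seed)
    n = len(Xb)
    tau = [0.5] * n                       # per-sample pheromone
    best_mask = [True] * n                # start from the full balanced set
    best_fit = fitness(best_mask)
    for _ in range(n_iter):
        for _ in range(n_ants):
            mask = [rng.random() < t for t in tau]
            if not any(mask):
                continue
            f = fitness(mask)
            if f > best_fit:
                best_fit, best_mask = f, mask
        # evaporate, then deposit pheromone on samples kept by the best ant
        tau = [(1 - rho) * t + (rho if kept else 0.0)
               for t, kept in zip(tau, best_mask)]
    return best_mask, best_fit

# Toy 1-D imbalanced problem: class 0 near 0.0 (majority), class 1 near 1.0.
rng = random.Random(42)
X = [rng.gauss(0.0, 0.3) for _ in range(40)] + [rng.gauss(1.0, 0.3) for _ in range(8)]
y = [0] * 40 + [1] * 8
X_va = [rng.gauss(0.0, 0.3) for _ in range(20)] + [rng.gauss(1.0, 0.3) for _ in range(20)]
y_va = [0] * 20 + [1] * 20

Xb, yb = random_oversample(X, y, rng)     # step 1: rebalance
fit = lambda mask: one_nn_bacc([x for x, m in zip(Xb, mask) if m],
                               [c for c, m in zip(yb, mask) if m], X_va, y_va)
mask, bacc = aco_select(Xb, yb, fit)      # step 2: ACO subset search
```

Because the search is seeded with the full balanced set and only replaces it on improvement, the selected subset's validation balanced accuracy is never worse than training on the oversampled set as-is, which mirrors the paper's claim that ACOR enhances, rather than replaces, an existing oversampler.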
Pages: 17