CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification

Cited by: 65
Authors
Elyan, Eyad [1]
Moreno-Garcia, Carlos Francisco [1]
Jayne, Chrisina [2]
Affiliations
[1] Robert Gordon Univ, Aberdeen, Scotland
[2] Teesside Univ, Middlesbrough, Cleveland, England
Keywords
Machine learning; Class-imbalance; Classification; Undersampling; Oversampling; DATA SET; CLUSTER; PREDICTION; ALGORITHM; GAN;
DOI
10.1007/s00521-020-05130-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class-imbalanced datasets are common across domains such as health, banking, and security. The dominance of majority class instances (the negative class) often results in biased learning models, so classifying such datasets requires methods that counteract the imbalance. In this paper, we propose a new hybrid approach that reduces the dominance of the majority class using class decomposition and increases the number of minority class instances using an oversampling method. Unlike other undersampling methods, which suffer from data loss, our method preserves the majority class instances yet significantly reduces their dominance, resulting in a more balanced dataset and hence improved results. A large-scale experiment using 60 public datasets was carried out to validate the proposed method. The results across three standard evaluation metrics are comparable with or superior to those of other common and state-of-the-art techniques.
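The two steps named in the abstract (decomposing the majority class, then oversampling the minority class) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering or oversampling algorithm is used, so the sketch assumes plain k-means for the decomposition and a SMOTE-style interpolation between minority neighbours for the oversampling. The function names, the default k = 2, and the rule of oversampling until the minority class matches the largest majority sub-class are all hypothetical choices made for illustration.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed here); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def smote_like(minority, n_new, k_neighbors=3, seed=0):
    """SMOTE-style sampling: interpolate between a minority point and one
    of its nearest minority neighbours. Needs at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        p = minority[i]
        others = [q for j, q in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda q: dist2(p, q))[:k_neighbors]
        q = rng.choice(neighbours)
        t = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

def decompose_and_oversample(X, y, majority_label, minority_label,
                             k=2, seed=0):
    """Relabel majority instances by cluster, then oversample the minority.

    Assumption for this sketch: the minority class is grown until it matches
    the largest majority sub-class. No majority instance is discarded.
    """
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    clusters = kmeans_labels(majority, k, seed=seed)
    new_X = list(majority)
    new_y = [f"{majority_label}_c{c}" for c in clusters]
    target = max(clusters.count(c) for c in range(k))
    n_new = max(0, target - len(minority))
    new_X += minority + smote_like(minority, n_new, seed=seed)
    new_y += [minority_label] * (len(minority) + n_new)
    return new_X, new_y
```

Calling `decompose_and_oversample(X, y, "neg", "pos")` on a list of feature tuples `X` and labels `y` returns a relabelled dataset in which every original majority instance is still present under a sub-class label such as `neg_c0` or `neg_c1`, while the minority class has been extended with interpolated points. Any multi-class classifier can then be trained on the new labels, with the sub-class predictions mapped back to the original majority label.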
Pages: 2839-2851
Number of pages: 13