Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

Cited by: 52
Authors
Joloudari, Javad Hassannataj [1]
Marefat, Abdolreza [2]
Nematollahi, Mohammad Ali [3]
Oyelere, Solomon Sunday [4]
Hussain, Sadiq [5]
Affiliations
[1] Univ Birjand, Fac Engn, Dept Comp Engn, Birjand 9717434765, Iran
[2] Islamic Azad Univ, Tech & Engn Fac, Dept Artificial Intelligence, South Tehran Branch, Tehran 1477893780, Iran
[3] Fasa Univ, Dept Comp Sci, Fasa 7461686131, Iran
[4] Lulea Univ Technol, Dept Comp Sci Elect & Space Engn, S-93187 Skelleftea, Sweden
[5] Dibrugarh Univ, Examinat Branch, Dibrugarh 786004, Assam, India
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 06
Keywords
imbalanced data; resampling; normalization; deep neural network; convolutional neural network; CORONARY-ARTERY-DISEASE; CLASSIFICATION; DIAGNOSIS; CLASSIFIERS; ALGORITHMS;
DOI
10.3390/app13064006
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results. ID arises when the samples of one class outnumber those of the other by a wide margin, biasing the learning process of such models towards the majority class. In recent years, several solutions have been proposed to address this issue; they either synthetically generate new data for the minority class or reduce the number of majority-class samples to balance the data. In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) combined with a variety of well-known imbalanced-data solutions, namely oversampling and undersampling. We then propose a CNN-based model in combination with the Synthetic Minority Oversampling Technique (SMOTE) to handle imbalanced data effectively. To evaluate our methods, we used the KEEL, breast cancer, and Z-Alizadeh Sani datasets. To obtain reliable results, we ran our experiments 100 times with randomly shuffled data distributions. The classification results demonstrate that the combined SMOTE-Normalization-CNN outperforms the other methodologies, achieving 99.08% accuracy on the 24 imbalanced datasets. Therefore, the proposed mixed model can be applied to imbalanced binary classification problems on other real datasets.
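As a rough illustration of the first two stages named in the abstract (SMOTE oversampling followed by normalization), the sketch below may help. It is not the authors' implementation: the function names `smote` and `min_max_normalize`, the toy data, the choice of min-max scaling, and the neighbour count `k` are all assumptions made for this example, and the CNN classifier stage is omitted.

```python
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 8 majority samples, 3 minority samples.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]

# Oversample the minority class until both classes have the same size,
# then normalize all features before they would be passed to a classifier.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
features = min_max_normalize(majority + balanced_minority)
```

Because each synthetic point lies on a line segment between two existing minority samples, it stays inside the region already occupied by the minority class. In a real evaluation, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the test data.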
Pages: 34