Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification

Cited by: 15
Authors
Alex, Suja A. [1]
Nayahi, J. Jesu Vedha [2]
Kaddoura, Sanaa [3]
Affiliations
[1] St Xaviers Catholic Coll Engn, Informat Technol, Nagercoil, India
[2] Anna Univ, Comp Sci & Engn, Reg Campus Tirunelveli, Tirunelveli, India
[3] Zayed Univ, Zayed, U Arab Emirates
Keywords
Imbalanced data; Feature selection; Genetic Algorithm; SMOTE; Deep Learning; Convolutional Neural Network; Information; Impact
DOI
10.1016/j.asoc.2024.111491
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification is a challenge in machine learning because class imbalance biases model learning. High data dimensionality is a further challenge, as it strongly affects classifier performance. This paper proposes a deep-learning method that combines feature selection with oversampling to address both. The proposed approach, GA-SMOTE-DCNN, integrates a genetic algorithm (GA) for feature selection, the synthetic minority over-sampling technique (SMOTE) for oversampling, and a deep 1D convolutional neural network (DCNN) for classification. The study finds that splitting the data into training and testing sets before applying SMOTE yields higher accuracy than splitting after SMOTE, with an improvement of between 1.94% and 3.98% on each dataset. The method achieved accuracies of 86.81% on the Balance Scale dataset, 86.15% on the Oil Spill dataset, 89.21% on the Yeast dataset, 91.32% on the Mammography dataset, 88.23% on the Australian Credit dataset, and 89.53% on the German Credit dataset in comparisons with benchmark methods, underscoring its value for high-dimensional and imbalanced data classification problems. The method demonstrates scalability in addressing high-dimensional and imbalanced data classification across various domains.
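The split-before-SMOTE ordering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the GA feature selection and the DCNN, uses a simplified pure-Python SMOTE, and assumes binary labels with 0 as the majority class and 1 as the minority class. The names `smote` and `split_then_oversample` are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours to create n_new synthetic points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def split_then_oversample(X, y, test_fraction=0.25, seed=0):
    """Hold out a stratified test set first, then balance only the
    training part, so no synthetic point is derived from test data."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in (0, 1):  # assumption: 0 = majority, 1 = minority
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    minority = [x for x, lab in zip(X_train, y_train) if lab == 1]
    n_new = y_train.count(0) - len(minority)
    if len(minority) >= 2 and n_new > 0:
        new_points = smote(minority, n_new, rng=rng)
        X_train = X_train + new_points
        y_train = y_train + [1] * len(new_points)
    return X_train, y_train, X_test, y_test
```

Because the test set is carved out before oversampling, it contains only original samples and keeps the original class ratio, while the training set is balanced. Reversing the order would let synthetic points interpolated from test samples enter the training data.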
Pages: 16