Imbalanced Data Classification Using SVM Based on Improved Simulated Annealing Featuring Synthetic Data Generation and Reduction

Times cited: 4
Authors
Hussein, Hussein Ibrahim [1]
Anwar, Said Amirul [2]
Ahmad, Muhammad Imran [2]
Affiliations
[1] AlSafwa Univ Coll, Dept Comp Tech Engn, Karbala 56001, Iraq
[2] Univ Malaysia Perlis, Fac Elect Engn & Technol, Arau 02600, Perlis, Malaysia
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 75 / Issue 01
Keywords
Imbalanced data; resampling technique; data reduction; support vector machine; simulated annealing; SUPPORT VECTOR MACHINES; HYBRID METHOD; OPTIMIZATION; SMOTE; ALGORITHM;
DOI
10.32604/cmc.2023.036025
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset has significant differences in the number of samples between its classes. In most cases, the performance of a machine learning algorithm such as the Support Vector Machine (SVM) is degraded when dealing with an imbalanced dataset: classification accuracy is skewed toward the majority class, and predictions for minority-class samples are poor. In this paper, a hybrid approach is proposed that combines a data pre-processing technique with an SVM algorithm based on improved Simulated Annealing (SA). First, a data pre-processing technique is proposed as the resampling strategy for handling imbalanced datasets. In this technique, data are first synthetically generated to equalize the number of samples between classes, followed by a reduction step that removes redundant and duplicated data. Next, an SVM is trained on the balanced dataset. Since SVM training requires an iterative search for the best penalty parameter, an improved SA algorithm is proposed for this task. The improvement introduces a new acceptance criterion for candidate solutions in the SA algorithm to enhance the accuracy of the optimization process. Experiments on ten publicly available imbalanced datasets demonstrate higher classification accuracy for the proposed approach than for the conventional implementation of SVM. The proposed approach registers an average accuracy of 89.65% for binary classification.
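The pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the synthetic generation is a simple SMOTE-like interpolation between minority samples, the reduction step removes only exact duplicates, the annealer uses the standard Metropolis acceptance rule (the paper's new acceptance criterion is not specified in the abstract), and `toy_score` is a hypothetical stand-in for the cross-validated accuracy of an SVM trained with penalty parameter C.

```python
import math
import random


def oversample_and_reduce(minority, n_target, rng):
    """Grow the minority class by interpolation, then drop exact duplicates."""
    points = [tuple(p) for p in minority]
    while len(points) < n_target:
        # Assumption: a synthetic sample lies on the segment between two
        # randomly chosen minority samples (SMOTE-like interpolation).
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        points.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    # Reduction step: keep the first occurrence of each point, so the
    # result may end up slightly smaller than n_target.
    return list(dict.fromkeys(points))


def anneal_penalty(score, lo=-3.0, hi=3.0, t0=1.0, cooling=0.95,
                   steps=200, seed=0):
    """Search C = 10**x, x in [lo, hi], maximising score(C) with standard SA."""
    rng = random.Random(seed)
    x = rng.uniform(lo, hi)
    current = score(10 ** x)
    best_x, best = x, current
    temp = t0
    for _ in range(steps):
        # Propose a neighbour in log10(C) space, clipped to the search range.
        cand = min(hi, max(lo, x + rng.gauss(0.0, 0.5)))
        value = score(10 ** cand)
        delta = value - current
        # Standard Metropolis rule: always accept an improvement; accept a
        # worse candidate with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            x, current = cand, value
            if current > best:
                best_x, best = x, current
        temp *= cooling  # geometric cooling schedule
    return 10 ** best_x, best


def toy_score(c):
    """Hypothetical accuracy curve that peaks at C = 10 (not real SVM results)."""
    return 1.0 - 0.05 * (math.log10(c) - 1.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    balanced = oversample_and_reduce([(0.0, 0.0), (1.0, 1.0)], 6, rng)
    best_c, best_score = anneal_penalty(toy_score)
    print(len(balanced), best_c, best_score)
```

In practice, `toy_score` would be replaced by a function that trains an SVM with the given C on the balanced data and returns a validation metric; searching over log10(C) rather than C is a common choice because useful penalty values span several orders of magnitude.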
Pages: 547-564
Page count: 18