Performance Analysis of Machine Learning Algorithms on Imbalanced Datasets Using SMOTE Technique

Cited by: 0
Authors
Kumar, Bala Santhosh [1]
Yadav, Pasupula Praveen [1]
Prasad, P. Penchala [1]
Affiliations
[1] G Pulla Reddy Engn Coll, Comp Sci & Engn Dept, Kurnool, India
Source
PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE, MACHINE LEARNING AND APPLICATIONS, VOL 1, ICDSMLA 2023 | 2025 / Vol. 1273
Keywords
Machine Learning; SMOTE; Accuracy;
DOI
10.1007/978-981-97-8031-0_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates how the Synthetic Minority Over-sampling Technique (SMOTE) affects the performance of several machine learning algorithms on an imbalanced dataset. Imbalanced datasets are common in real-world applications, where one class is far more prevalent than the other. This imbalance can bias a model: the majority class dominates its predictions, and the minority class is often misclassified. To address this problem, we applied the SMOTE algorithm to generate synthetic data for the minority class. We evaluated several popular machine learning algorithms, including logistic regression, decision trees, ensemble learning, support vector machines, neural networks, and an AutoML approach, on both the original imbalanced dataset and the SMOTE-augmented dataset. The experimental results demonstrate that using SMOTE significantly improves the accuracy of the machine learning algorithms on imbalanced datasets. In conclusion, our research highlights the importance of considering the impact of imbalanced datasets on the performance of machine learning algorithms and demonstrates the effectiveness of SMOTE in addressing this issue. Our results can help practitioners working with imbalanced datasets choose an appropriate machine learning algorithm and decide whether to use SMOTE to improve their model's performance.
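The oversampling step the abstract describes can be sketched as follows. This is a minimal illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation: this record does not give their code, dataset, or parameters. The function name `smote`, the defaults `k=5` and `seed=0`, and the toy data are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours (Euclidean), excluding base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (assumed): 4 minority samples against 20 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 20
new_points = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points  # now matches the majority count
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice the oversampling would normally be applied only to the training split, so that evaluation data contains no synthetic samples.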
Pages: 147-156
Page count: 10
Related papers
29 in total
  • [1] Adav S., 2019, J. Comput. Theor. Nanosci., V16, P2938
  • [2] Ang W., 2019, J. Comput. Theor. Nanosci., V16, P2285
  • [3] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)
  • [4] Chen S., 2019, J. Comput. Sci., V34, P50
  • [5] Ding Y., 2018, J. Intell. Fuzzy Syst., V35, P3043
  • [6] Douzas G., 2018, Expert Syst. Appl., V97, P205
  • [7] Du Q., 2020, Appl. Sci., V10
  • [8] Fernandez A., 2018, Knowl.-Based Syst., V161, P235
  • [9] SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary
    Fernandez, Alberto
    Garcia, Salvador
    Herrera, Francisco
    Chawla, Nitesh V.
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 61 : 863 - 905
  • [10] A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
    Galar, Mikel
    Fernandez, Alberto
    Barrenechea, Edurne
    Bustince, Humberto
    Herrera, Francisco
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (04): : 463 - 484