Statistic deviation mode balancer (SDMB): A novel sampling algorithm for imbalanced data

Cited by: 3
Authors
Alimoradi, Mahmoud [1, 2]
Sadeghi, Reza [3]
Daliri, Arman [4]
Zabihimayvan, Mahdieh [5]
Affiliations
[1] Shafagh Inst Higher Educ, Dept Comp Engn, Tonekabon, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Lahijan Branch, Lahijan, Iran
[3] Marist Coll, Dept Comp Sci, Poughkeepsie, NY USA
[4] Islamic Azad Univ, Dept Comp Engn, Karaj Branch, Karaj, Iran
[5] Cent Connecticut State Univ, Dept Comp Sci, New Britain, CT USA
Keywords
Imbalanced Datasets; Classifier Performance; Data Sampling Techniques; Algorithmic Classification; Diagnostic Analytics; Data Balancing; Feature Selection; Classification; SMOTE; Prediction
DOI
10.1016/j.neucom.2025.129484
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In supervised learning, the efficacy of classifier algorithms is heavily dependent on the quality of data. Imbalanced datasets, where the class distribution is not uniform, pose a significant challenge, often leading to suboptimal classifier performance. Traditional approaches to rectifying this imbalance have relied on duplicating minority class instances or generating synthetic data, which can introduce bias or outliers. Our novel Statistic Deviation Mode Balancer (SDMB) algorithm addresses these issues by generating new instances that closely mirror the original data structure. Utilizing standard deviation and mode analysis, SDMB strategically synthesizes minority class data while avoiding the pitfalls of outlier generation. The result is a balanced dataset that facilitates more accurate learning by classifier algorithms. We have rigorously tested SDMB across various datasets and compared its performance against existing balancing methods. Our findings indicate that SDMB not only outperforms its counterparts but also significantly enhances the practical application of classifier algorithms in real-world datasets.
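The abstract names standard deviation and mode analysis as the ingredients of SDMB but does not give the procedure itself. The following is only a minimal sketch of one way such an idea could look, not the published SDMB algorithm. The function name `mode_std_oversample`, the histogram-based mode estimate, and the parameters `scale` and `bins` are all assumptions made for illustration. Features are sampled independently, so correlations between features are ignored.

```python
import numpy as np


def mode_std_oversample(X_min, n_new, scale=0.5, bins=10, seed=0):
    """Hypothetical illustration only; NOT the published SDMB procedure.

    Draws synthetic minority-class rows near the per-feature mode, with a
    spread bounded by a fraction of the per-feature standard deviation.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n_features = X_min.shape[1]

    std = X_min.std(axis=0)
    lo, hi = X_min.min(axis=0), X_min.max(axis=0)

    # Assumption: estimate each feature's mode as the centre of its most
    # populated histogram bin (continuous data rarely repeats exact values).
    mode = np.empty(n_features)
    for j in range(n_features):
        counts, edges = np.histogram(X_min[:, j], bins=bins)
        k = counts.argmax()
        mode[j] = 0.5 * (edges[k] + edges[k + 1])

    # Uniform offsets within +/- scale * std around the mode.
    noise = rng.uniform(-1.0, 1.0, size=(n_new, n_features)) * scale * std

    # Clip to the observed minority range so no synthetic value falls
    # outside what the original minority data already covers.
    return np.clip(mode + noise, lo, hi)


# Usage: five minority rows, one of which is far from the rest.
X_minority = np.array(
    [[1.0, 10.0], [1.2, 11.0], [0.9, 10.5], [1.1, 10.2], [5.0, 10.4]]
)
synthetic = mode_std_oversample(X_minority, n_new=20)
```

Clipping to the observed range is one simple way to keep synthetic points from becoming outliers. Whether SDMB uses a comparable safeguard is not stated in the abstract.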
Pages: 22