Exploratory parallel hybrid sampling framework for imbalanced data classification

Cited by: 0
Authors
Zheng, Ming [3, 4]
Zhao, Zhuo [3]
Wang, Fei [3]
Hu, Xiaowen [3]
Xu, Sheng [3, 4]
Li, Wanggen [3]
Li, Tong [1, 2]
Affiliations
[1] Yunnan Agr Univ, Big Data Sch, Kunming 650201, Peoples R China
[2] Yunnan Agr Univ, Key Lab Crop Prod & Smart Agr Yunnan Prov, Kunming 650201, Peoples R China
[3] Anhui Normal Univ, Sch Comp & Informat, Wuhu 241002, Peoples R China
[4] Anhui Prov Key Lab Ind Intelligence Data Secur, Wuhu 241002, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data; Oversampling; Undersampling; Parallel hybrid sampling framework; Serial hybrid sampling frameworks; ENSEMBLE; SMOTE;
DOI
10.1016/j.engappai.2024.109428
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Engineering applications often face the challenge of imbalanced data. Hybrid sampling is an effective way to address imbalanced data classification, because it avoids both the overfitting that can arise when oversampling is used alone and the mistaken deletion of useful majority samples that can arise when undersampling is used alone. However, most current hybrid sampling approaches are implemented serially, so the oversampling and undersampling steps interfere with and influence each other. This study proposes a parallel hybrid sampling framework based on the idea of parallel engineering and theoretically analyzes its superiority. The experimental results show that, when applied to five classification algorithms and evaluated with three performance metrics, the proposed framework outperforms the two mainstream hybrid sampling frameworks. Moreover, the proposed framework effectively reduces the time consumption of the hybrid sampling process.
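The parallel idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the framework's samplers, target sizes, or merge rule, so everything here is an assumption made for illustration: plain random oversampling and random undersampling stand in for the actual methods, the midpoint target size is arbitrary, and the function names `oversample_minority`, `undersample_majority`, and `parallel_hybrid` are hypothetical. The only point shown is that both samplers read the original data independently, rather than one consuming the other's output as in a serial pipeline.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def oversample_minority(minority, target, seed=0):
    """Random oversampling (assumed stand-in): pad with duplicates up to `target`."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    return list(minority) + extra


def undersample_majority(majority, target, seed=0):
    """Random undersampling (assumed stand-in): keep `target` samples, no replacement."""
    rng = random.Random(seed)
    return rng.sample(list(majority), min(target, len(majority)))


def parallel_hybrid(majority, minority, seed=0):
    """Run both samplers on the ORIGINAL classes concurrently, then return both results.

    Neither sampler sees the other's output, unlike a serial pipeline.
    The midpoint target size is an illustrative assumption.
    """
    target = (len(majority) + len(minority)) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        over = pool.submit(oversample_minority, minority, target, seed)
        under = pool.submit(undersample_majority, majority, target, seed)
        return under.result(), over.result()


# Toy data: 90 majority samples and 10 minority samples.
majority = [("maj", i) for i in range(90)]
minority = [("min", i) for i in range(10)]

new_majority, new_minority = parallel_hybrid(majority, minority)
print(len(new_majority), len(new_minority))  # prints "50 50"
```

With random samplers the two orderings give statistically similar results; the structural difference would matter more for neighborhood-based methods such as SMOTE, where the output of one step changes the neighborhoods seen by the next.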
Pages: 13