Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis

Cited by: 0

Authors:

Bano, Shahzadi [1]

Zhi, Weimei [1]

Qiu, Baozhi [1]

Raza, Muhammad [2]

Sehito, Nabila [3]

Kamal, Mian Muhammad [4]

Aldehim, Ghadah [5]

Alruwais, Nuha [6]

Affiliations:

[1] Zhengzhou Univ, Sch Comp & Artificial Intelligence, 100 Sci Ave, Zhengzhou 450001, Peoples R China

[2] Xian Technol Univ, Xian, Peoples R China

[3] Zhengzhou Univ, Sch Elect Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China

[4] Southeast Univ, Sch Elect Sci & Engn, Joint Int Res Lab Informat Display & Visualizat, Nanjing 210018, Peoples R China

[5] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia

[6] King Saud Univ, Coll Appl Studies & Community Serv, Dept Comp Sci & Engn, POB 22459, Riyadh 11495, Saudi Arabia

Source:

JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 07

Keywords:

Self-paced ensemble; Big data; Classification; Computational; Simulation; Substantial imbalance;

DOI:

10.1007/s11227-023-05828-6

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This research paper focuses on the challenges of learning classifiers from the large-scale, highly imbalanced datasets that are prevalent in many real-world applications. Traditional learning algorithms often fall short in both performance and computational efficiency when dealing with imbalanced data. Factors such as class imbalance, noise, and class overlap make it difficult to learn effective classifiers. In this study, we propose a novel self-paced ensemble framework for classifying imbalanced data. The framework uses under-sampling to harmonize data hardness in a self-paced manner and build a robust ensemble. Extensive experiments show promising results in handling overlapping classes and skewed distributions while maintaining computational efficiency. The self-paced ensemble method addresses the challenges of high imbalance ratios, class overlap, and noise in large-scale imbalanced classification problems. By incorporating knowledge of these challenges into our learning framework, we establish the concept of classification hardness distribution. The self-paced ensemble is a revolutionary learning paradigm for massively imbalanced classification, capable of improving the performance of existing learning algorithms on imbalanced data and providing better results for future applications.
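The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way hardness-aware under-sampling of the majority class could look. It assumes each majority sample already has a hardness score in [0, 1] (for example, the absolute error of the current ensemble's predicted probability). The function name `self_paced_undersample`, the binning scheme, and the `alpha` parameter are illustrative assumptions, not the authors' implementation.

```python
import random


def self_paced_undersample(hardness, n_keep, n_bins=5, alpha=0.0, seed=0):
    """Return indices of majority-class samples to keep (hypothetical helper).

    hardness: per-sample hardness scores in [0, 1] (assumed to be given).
    n_keep:   upper bound on the number of samples to keep.
    alpha:    self-paced factor (assumption); 0 favours bins with low mean
              hardness, large values spread the quota evenly across bins.
    """
    rng = random.Random(seed)
    lo, hi = min(hardness), max(hardness)
    if hi == lo:
        # All samples are equally hard: fall back to plain random sampling.
        k = min(n_keep, len(hardness))
        return sorted(rng.sample(range(len(hardness)), k))

    # Split the hardness range into equal-width bins.
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, h in enumerate(hardness):
        b = min(int((h - lo) / width), n_bins - 1)
        bins[b].append(i)

    # Weight each non-empty bin by 1 / (mean hardness + alpha).
    eps = 1e-12  # guards against division by zero
    weights = []
    for members in bins:
        if members:
            mean_h = sum(hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + eps))
        else:
            weights.append(0.0)
    total = sum(weights)

    # Give each bin a share of n_keep, capped by the bin's population.
    kept = []
    for members, w in zip(bins, weights):
        quota = min(len(members), int(n_keep * w / total))
        kept.extend(rng.sample(members, quota))
    return sorted(kept)


if __name__ == "__main__":
    # Toy majority class: many easy samples, a few very hard ones.
    scores = [0.01] * 900 + [0.5] * 90 + [0.95] * 10
    early = self_paced_undersample(scores, 60, n_bins=3, alpha=0.0)
    late = self_paced_undersample(scores, 60, n_bins=3, alpha=10.0)
    print(len(early), len(late))
```

In an ensemble built this way, one would typically retrain a base classifier on the kept majority samples plus all minority samples at each round, while increasing `alpha` so that later rounds pay relatively more attention to the scarce hard samples. That training loop is omitted here because the abstract gives no details about it.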

Pages: 9848-9869

Page count: 22

50 items in total

[21] Intrusion detection based on ensemble learning for big data classification
Jemili, Farah
Meddeb, Rahma
Korbaa, Ouajdi
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (03) : 3771 - 3798
[22] Investigation on the use of ensemble learning and big data in crop identification
Ahmed, Sayed
Mahmoud, Amira S.
Farg, Eslam
Mohamed, Amany M.
Moustafa, Marwa S.
Abutaleb, Khaled
Saleh, Ahmed M.
AbdelRahman, Mohamed A. E.
AbdelSalam, Hisham M.
Arafat, Sayed M.
HELIYON, 2023, 9 (02)
[23] A Comprehensive Analysis of Classification Methods for Big Data Stream
Kaur, Amrinder
Kumar, Rakesh
ADVANCES IN COMPUTING AND INTELLIGENT SYSTEMS, ICACM 2019, 2020 : 213 - 222
[24] Predictive Analysis for Diabetes Using Big Data Classification
Rghioui, Amine
Oumnad, Abdelmajid
RECENT ADVANCES IN MATHEMATICS AND TECHNOLOGY, 2020 : 161 - 170
[25] An Efficient, Ensemble-Based Classification Framework for Big Medical Data
Khan, Firoz
Prasad, Balusupati Veera Venkata Siva
Syed, Salman Ali
Ashraf, Imran
Ramasamy, Lakshmana Kumar
BIG DATA, 2022, 10 (02) : 151 - 160
[26] Multiclass Self-Paced Motor Imagery Temporal Features Classification using Least-Square Support Vector Machine
Hamedi, M.
Salleh, Sh-H.
Ting, C. M.
Noor, A. B. Mohd
Rezazadeh, I. Mohammad
2014 IEEE 19TH INTERNATIONAL FUNCTIONAL ELECTRICAL STIMULATION SOCIETY ANNUAL CONFERENCE (IFESS), 2014
[27] Empirical Analysis of Asymptotic Ensemble Learning for Big Data
Salloum, Salman
Huang, Joshua Zhexue
He, Yulin
2016 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING, APPLICATIONS AND TECHNOLOGIES (BDCAT), 2016 : 8 - 17
[28] The Survey on Approaches to Efficient Clustering and Classification Analysis of Big Data
Gandhi, Bhagyashri S.
Deshpande, Leena A.
2016 INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2016
[29] Big data in transportation: a systematic literature analysis and topic classification
Tzika-Kostopoulou, Danai
Nathanail, Eftihia
Kokkinos, Konstantinos
KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (08) : 5021 - 5046
[30] An Ensemble Random Forest Algorithm for Insurance Big Data Analysis
Lin, Weiwei
Wu, Ziming
Lin, Longxin
Wen, Angzhan
Li, Jin
IEEE ACCESS, 2017, 5 : 16568 - 16575
