Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis

Cited by: 0
Authors
Bano, Shahzadi [1]
Zhi, Weimei [1]
Qiu, Baozhi [1]
Raza, Muhammad [2]
Sehito, Nabila [3]
Kamal, Mian Muhammad [4]
Aldehim, Ghadah [5]
Alruwais, Nuha [6]
Affiliations
[1] Zhengzhou Univ, Sch Comp & Artificial Intelligence, 100 Sci Ave, Zhengzhou 450001, Peoples R China
[2] Xian Technol Univ, Xian, Peoples R China
[3] Zhengzhou Univ, Sch Elect Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China
[4] Southeast Univ, Sch Elect Sci & Engn, Joint Int Res Lab Informat Display & Visualizat, Nanjing 210018, Peoples R China
[5] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[6] King Saud Univ, Coll Appl Studies & Community Serv, Dept Comp Sci & Engn, POB 22459, Riyadh 11495, Saudi Arabia
Keywords
Self-paced ensemble; Big data; Classification; Computational; Simulation; Substantial imbalance
DOI
10.1007/s11227-023-05828-6
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper addresses the challenge of learning classifiers from large-scale, highly imbalanced datasets, which are prevalent in many real-world applications. Traditional learning algorithms often suffer from poor performance and low computational efficiency when dealing with imbalanced data. Factors such as class imbalance, noise, and class overlap make it difficult to learn effective classifiers. In this study, we propose a novel self-paced ensemble framework for classifying imbalanced data. The framework uses under-sampling to self-harmonize data hardness and build a robust ensemble. Extensive experiments show promising results in handling overlapping classes and skewed distributions while maintaining computational efficiency. The self-paced ensemble method addresses the challenges of high imbalance ratios, class overlap, and noise in large-scale imbalanced classification problems. By incorporating knowledge of these challenges into the learning framework, we establish the concept of a classification hardness distribution. The self-paced ensemble is a new learning paradigm for massively imbalanced classification that can improve the performance of existing learning algorithms on imbalanced data and is expected to benefit future applications.
Pages: 9848-9869
Number of pages: 22