Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis

Times cited: 0
Authors
Bano, Shahzadi [1 ]
Zhi, Weimei [1 ]
Qiu, Baozhi [1 ]
Raza, Muhammad [2 ]
Sehito, Nabila [3 ]
Kamal, Mian Muhammad [4 ]
Aldehim, Ghadah [5 ]
Alruwais, Nuha [6 ]
Affiliations
[1] Zhengzhou Univ, Sch Comp & Artificial Intelligence, 100 Sci Ave, Zhengzhou 450001, Peoples R China
[2] Xian Technol Univ, Xian, Peoples R China
[3] Zhengzhou Univ, Sch Elect Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China
[4] Southeast Univ, Sch Elect Sci & Engn, Joint Int Res Lab Informat Display & Visualizat, Nanjing 210018, Peoples R China
[5] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[6] King Saud Univ, Coll Appl Studies & Community Serv, Dept Comp Sci & Engn, POB 22459, Riyadh 11495, Saudi Arabia
Keywords
Self-paced ensemble; Big data; Classification; Computational; Simulation; Substantial imbalance
DOI
10.1007/s11227-023-05828-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This research paper addresses the challenges of learning classifiers from large-scale, highly imbalanced datasets, which are prevalent in many real-world applications. Traditional learning algorithms often suffer from poor performance and low computational efficiency on imbalanced data. Factors such as class imbalance, noise, and class overlap make it difficult to learn effective classifiers. In this study, we propose a novel self-paced ensemble framework for classifying imbalanced data. The framework uses self-paced under-sampling to harmonize data hardness and build a robust ensemble. Extensive experiments show promising results in handling overlapping classes and skewed distributions while maintaining computational efficiency. The self-paced ensemble method addresses high imbalance ratios, class overlap, and noise in large-scale imbalanced classification problems. By incorporating knowledge of these challenges into the learning framework, we introduce the concept of a classification hardness distribution. The self-paced ensemble is a new learning paradigm for massive imbalanced classification that can improve the performance of existing learning algorithms on imbalanced data and benefit future applications.
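The abstract only outlines the mechanism, so the following is a minimal, dependency-free Python sketch of a generic self-paced under-sampling ensemble. It is not the authors' implementation. Several details do not come from this record and are assumptions. Hardness is taken as the absolute error of the current ensemble's predicted minority-class probability. Majority samples are split into `k_bins` equal-width hardness bins. Each bin gets a sampling weight of 1 / (mean hardness + alpha), where the self-paced factor alpha = tan(pi/2 * i/n) grows every round. The base learner is a hypothetical decision stump. The names `fit_stump` and `self_paced_ensemble` are illustrative only.

```python
import math
import random


def fit_stump(X, y):
    """Hypothetical base learner: a one-level decision tree whose leaves
    store the fraction of minority (label 1) samples that fall into them."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            # errors made if each leaf predicts its majority label
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if best is None or err < best[0]:
                best = (err, j, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no valid split: predict the overall positive rate
        p = sum(y) / len(y)
        return lambda row: p
    _, j, t, p_left, p_right = best
    return lambda row: p_left if row[j] <= t else p_right


def self_paced_ensemble(X, y, n_estimators=10, k_bins=5, seed=0):
    """Sketch of self-paced under-sampling; returns a predict-probability function."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]  # minority class
    neg = [i for i, yi in enumerate(y) if yi == 0]  # majority class
    models = []

    def proba(row):
        # ensemble output: mean minority-class probability over all stumps
        return sum(m(row) for m in models) / len(models)

    for it in range(n_estimators):
        if not models:
            # first round: plain random under-sampling to a balanced subset
            chosen = rng.sample(neg, len(pos))
        else:
            # hardness of a majority sample = |p(x) - 0| under the current ensemble
            hard = {i: proba(X[i]) for i in neg}
            lo, hi = min(hard.values()), max(hard.values())
            width = (hi - lo) / k_bins if hi > lo else 1.0
            bins = [[] for _ in range(k_bins)]
            for i, h in hard.items():
                bins[min(int((h - lo) / width), k_bins - 1)].append(i)
            # assumed self-paced factor: 0 < alpha < inf, increasing every round
            alpha = math.tan(math.pi / 2 * it / n_estimators)
            weights = [1.0 / (sum(hard[i] for i in b) / len(b) + alpha) if b else 0.0
                       for b in bins]
            total = sum(weights)
            chosen = []
            for b, w in zip(bins, weights):
                # bin quota proportional to its weight, capped by the bin's size
                n = min(len(b), round(len(pos) * w / total))
                chosen += rng.sample(b, n)
        idx = pos + chosen
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return proba


# Usage on a toy 1-D problem: 10 minority vs 90 majority samples
X = [[float(i)] for i in range(100)]
y = [1 if i >= 90 else 0 for i in range(100)]
predict = self_paced_ensemble(X, y, n_estimators=10, k_bins=5, seed=0)
print(predict([95.0]), predict([0.0]))  # prints: 1.0 0.0
```

When alpha is small, the weights favor bins of low mean hardness, so each bin contributes roughly equal total hardness to the subset. As alpha grows, the bin weights flatten. This shifts relative emphasis toward the rarer hard samples, while the cap on each bin's quota limits how many noisy outliers can be drawn.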
Pages: 9848-9869
Number of pages: 22
Related papers (50 total)
  • [11] Experimental evaluation of ensemble classifiers for imbalance in Big Data
    Juez-Gil, M.
    Arnaiz-González, Á.
    Rodríguez, J.J.
    García-Osorio, C.
    APPLIED SOFT COMPUTING, 2021, 108
  • [12] Complex Scene Classification of PoLSAR Imagery Based on a Self-Paced Learning Approach
    Chen, Wenshuai
    Gou, Shuiping
    Wang, Xinlin
    Jiao, Licheng
    Jiao, Changzhe
    Zare, Alina
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2018, 11 (12) : 4818 - 4825
  • [13] Classification of PolSAR Images Using Multilayer Autoencoders and a Self-Paced Learning Approach
    Chen, Wenshuai
    Gou, Shuiping
    Wang, Xinlin
    Li, Xiaofeng
    Jiao, Licheng
    REMOTE SENSING, 2018, 10 (01)
  • [14] From Big to Smart Data: Iterative ensemble filter for noise filtering in Big Data classification
    Garcia-Gil, Diego
    Luque-Sanchez, Francisco
    Luengo, Julian
    Garcia, Salvador
    Herrera, Francisco
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2019, 34 (12) : 3260 - 3274
  • [15] SPE-SHAP: Self-paced ensemble with Shapley additive explanation for the analysis of aviation turbulence triggered by wind shear events
    Khattak, Afaq
    Chan, Pak-wai
    Zhang, Jianping
    Chen, Feng
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 254
  • [16] Cost-Sensitive Self-Paced Learning With Adaptive Regularization for Classification of Image Time Series
    Li, Hao
    Li, Jianzhao
    Zhao, Yue
    Gong, Maoguo
    Zhang, Yujing
    Liu, Tongfei
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 11713 - 11727
  • [17] Learning from feedback training data at a self-paced brain-computer interface
    Zhang, Haihong
    Liyanage, Sidath Ravindra
    Wang, Chuanchu
    Guan, Cuntai
    JOURNAL OF NEURAL ENGINEERING, 2011, 8 (04)
  • [18] Hybrid Firefly Optimised Ensemble Classification for Drifting Data Streams with Imbalance
    Pepsi, M. Blessa Binolin
    Kumar, N. Senthil
    KNOWLEDGE-BASED SYSTEMS, 2024, 288
  • [19] NIRS Data Augmentation Technique to Detect Hemodynamic Peaks During Self-Paced Motor Imagery
    Phillips, V. Zephaniah
    Paik, Seung-Ho
    Lee, Seung-Hyun
    Choi, Eun-Jeong
    Kim, Beop-Min
    IEEE ACCESS, 2023, 11 : 37313 - 37323
  • [20] MSPL: Multimodal Self-Paced Learning for Multi-Omics Feature Selection and Data Integration
    Yang, Zi-Yi
    Xia, Liang-Yong
    Zhang, Hui
    Liang, Yong
    IEEE ACCESS, 2019, 7 : 170513 - 170524