Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis

Cited by: 0
Authors
Bano, Shahzadi [1]
Zhi, Weimei [1]
Qiu, Baozhi [1]
Raza, Muhammad [2]
Sehito, Nabila [3]
Kamal, Mian Muhammad [4]
Aldehim, Ghadah [5]
Alruwais, Nuha [6]
Affiliations
[1] Zhengzhou Univ, Sch Comp & Artificial Intelligence, 100 Sci Ave, Zhengzhou 450001, Peoples R China
[2] Xian Technol Univ, Xian, Peoples R China
[3] Zhengzhou Univ, Sch Elect Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China
[4] Southeast Univ, Sch Elect Sci & Engn, Joint Int Res Lab Informat Display & Visualizat, Nanjing 210018, Peoples R China
[5] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[6] King Saud Univ, Coll Appl Studies & Community Serv, Dept Comp Sci & Engn, POB 22459, Riyadh 11495, Saudi Arabia
Keywords
Self-paced ensemble; Big data; Classification; Computational; Simulation; Substantial imbalance;
DOI
10.1007/s11227-023-05828-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This research paper focuses on the challenges of learning classifiers from the large-scale, highly imbalanced datasets that are prevalent in many real-world applications. Traditional learning algorithms often suffer from poor performance and low computational efficiency when dealing with imbalanced data. Factors such as class imbalance, noise, and class overlap make it difficult to learn effective classifiers. In this study, we propose a novel self-paced ensemble framework for classifying imbalanced data. The framework employs under-sampling to self-harmonize data hardness and build a robust ensemble. Extensive experimental testing demonstrates promising results in handling overlapping classes and skewed distributions while maintaining computational efficiency. The self-paced ensemble method addresses the challenges of high imbalance ratios, class overlap, and the presence of noise in large-scale imbalanced classification problems. By incorporating knowledge of these challenges into our learning framework, we establish the concept of a classification hardness distribution. The self-paced ensemble is a revolutionary learning paradigm for massively imbalanced classification, capable of improving the performance of existing learning algorithms on imbalanced data and providing better results for future applications.
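The abstract does not give the sampling procedure, so the following is only a minimal sketch of how hardness-harmonized under-sampling of the majority class might look. The function name `self_paced_under_sample`, the equal-width hardness bins, the `1 / (mean hardness + alpha)` bin weights, and the `alpha` parameter are all assumptions made for illustration and are not taken from this record.

```python
import numpy as np

def self_paced_under_sample(hardness, n_samples, n_bins=5, alpha=0.0, rng=None):
    """Pick indices of majority-class samples, harmonizing hardness across bins.

    hardness : 1-D array of per-sample hardness values (assumed here to be
               something like the absolute error of the current ensemble).
    n_samples: number of majority samples to keep (e.g. the minority size).
    alpha    : hypothetical self-paced factor; larger values flatten bin weights.
    """
    rng = np.random.default_rng(rng)
    hardness = np.asarray(hardness, dtype=float)
    lo, hi = hardness.min(), hardness.max()
    if hi == lo:
        # All samples equally hard: fall back to plain random under-sampling.
        size = min(n_samples, hardness.size)
        return rng.choice(hardness.size, size=size, replace=False)

    # Assign each sample to one of n_bins equal-width hardness bins.
    edges = np.linspace(lo, hi, n_bins + 1)
    bin_ids = np.clip(np.digitize(hardness, edges[1:-1]), 0, n_bins - 1)

    # Weight each non-empty bin by 1 / (mean hardness + alpha).
    weights = np.zeros(n_bins)
    for b in range(n_bins):
        members = hardness[bin_ids == b]
        if members.size:
            weights[b] = 1.0 / (members.mean() + alpha + 1e-12)
    quotas = np.floor(n_samples * weights / weights.sum()).astype(int)

    # Draw each bin's quota without replacement, capped by the bin population.
    chosen = []
    for b in range(n_bins):
        idx = np.flatnonzero(bin_ids == b)
        take = min(quotas[b], idx.size)
        if take > 0:
            chosen.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(chosen) if chosen else np.array([], dtype=int)

# Toy usage: 900 easy and 100 hard majority samples, keep about 100 of them.
toy_hardness = np.concatenate([np.full(900, 0.05), np.full(100, 0.9)])
kept = self_paced_under_sample(toy_hardness, n_samples=100, rng=0)
```

In a full ensemble loop one would presumably retrain a base classifier on the kept majority samples plus all minority samples, recompute hardness, and repeat; that loop is omitted here because the abstract does not describe it.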
Pages: 9848-9869
Number of pages: 22
Related papers
50 in total
  • [41] Identification of patients with atrial fibrillation: a big data exploratory analysis of the UK Biobank
    Oster, Julien
    Hopewell, Jemma C.
    Ziberna, Klemen
    Wijesurendra, Rohan
    Camm, Christian F.
    Casadei, Barbara
    Tarassenko, Lionel
    PHYSIOLOGICAL MEASUREMENT, 2020, 41 (02)
  • [42] Identification of crossed meters based on big data analysis
    Liu, Fei
    Yan, Yonghui
    Wang, Liming
    Yu, Wei
    Li, Xinjia
    Xu, Bo
    PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 445 - 449
  • [43] Self-paced learning based multi-kernel KRR for brain structure analysis in patients with different blood pressure levels
    Peng, Bo
    Yu, Xinying
    Ma, Xinwei
    Zhu, Jianbing
    Dai, Yakang
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 169 - 174
  • [44] A Distributed Arabic Text Classification Approach Using Latent Semantic Analysis for Big data
    Alazzam, Hadeel
    Alsmady, Abdulsalam
    PROCEEDINGS OF THE 2017 12TH INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE ON COMPUTER SCIENCES AND INFORMATION TECHNOLOGIES (CSIT 2017), VOL. 1, 2017, : 58 - 61
  • [45] USING ENSEMBLE MARGIN TO EXPLORE ISSUES OF TRAINING DATA IMBALANCE AND MISLABELING ON LARGE AREA LAND COVER CLASSIFICATION
    Mellor, Andrew
    Boukir, Samia
    Haywood, Andrew
    Jones, Simon
    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 5067 - 5071
  • [46] Analysis of Bayesian optimization algorithms for big data classification based on Map Reduce framework
    Banchhor, Chitrakant
    Srinivasu, N.
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [47] Analysis of Bayesian optimization algorithms for big data classification based on Map Reduce framework
    Chitrakant Banchhor
    N. Srinivasu
    Journal of Big Data, 8
  • [48] Railway Maintenance Analysis based on big data and condition classification
    Song, Boyang
    Zhong, Yan
    Liu, Rengkui
    Wang, Futian
    ADVANCED CONSTRUCTION TECHNOLOGIES, 2014, 919-921 : 1134 - +
  • [49] Literature review and analysis on big data stream classification techniques
    Srivani, B.
    Sandhya, N.
    Rani, B. Padmaja
    INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2020, 24 (03) : 205 - 215
  • [50] CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYSIS, A MAP REDUCE APPROACH
    Ayma, V. A.
    Ferreira, R. S.
    Happ, P.
    Oliveira, D.
    Feitosaa, R.
    Costa, G.
    Plaza, A.
    Gamba, P.
    PIA15+HRIGI15 - JOINT ISPRS CONFERENCE, VOL. I, 2015, 40-3 (W2): : 17 - 21