Experimental evaluation of ensemble classifiers for imbalance in Big Data

Cited by: 18
Authors
Juez-Gil M. [1]
Arnaiz-González Á. [1]
Rodríguez J.J. [1]
García-Osorio C. [1]
Affiliation
[1] Escuela Politécnica Superior, University of Burgos, Burgos
Keywords
Big Data; Ensemble; Imbalance; Resampling; Spark; Unbalance
DOI
10.1016/j.asoc.2021.107447
Abstract
Datasets are growing in size and complexity at an unprecedented pace, giving rise to the ever larger collections known as Big Data. A common problem in classification, especially with Big Data, is that the numbers of examples in the different classes may not be balanced. Imbalanced classification was introduced some decades ago to correct the tendency of classifiers to favor the majority class and ignore the minority one. Although the number of imbalanced classification methods has grown since then, they still focus on normal-sized datasets rather than on the new reality of Big Data. In this paper, in-depth experiments with ensemble classifiers are conducted for imbalanced Big Data classification, using two popular ensemble families (Bagging and Boosting) combined with different resampling methods. All experiments were run on Spark clusters, and ensemble performance and execution times were compared using statistical tests, including recent tests based on the Bayesian approach. One notable conclusion of the study is that, for imbalanced Big Data, simpler methods gave better results than complex ones: the additional complexity of some sophisticated methods, which appears necessary to reduce imbalance in normal-sized datasets, was not effective for imbalanced Big Data. © 2021 The Author(s)
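To make the combination of a Bagging-style ensemble with a resampling method more concrete, here is a minimal sketch of one of the simplest such combinations: each ensemble member is trained on its own random undersample of the data, and predictions are combined by majority vote (similar in spirit to UnderBagging). This is not the authors' implementation, which ran on Spark clusters; the helper names (`random_undersample`, `fit_stump`, `fit_undersampled_bagging`), the decision-stump base learner, and the toy data are hypothetical choices made only to keep the example short and dependency-free.

```python
import random
from collections import Counter


def random_undersample(X, y, rng):
    """Randomly discard examples so that every class keeps as many examples
    as the smallest class (random undersampling without replacement)."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            X_res.append(xi)
            y_res.append(label)
    return X_res, y_res


def fit_stump(X, y):
    """Fit a one-level decision tree (decision stump): choose the feature and
    threshold whose split makes the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({xi[f] for xi in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(yi != left_label for yi in left)
                      + sum(yi != right_label for yi in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, best_f, best_t, low_label, high_label = best
    return lambda x: low_label if x[best_f] <= best_t else high_label


def fit_undersampled_bagging(X, y, n_estimators=11, seed=0):
    """Train n_estimators stumps, each on an independent random undersample
    of (X, y), and return a predictor that combines them by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        X_bal, y_bal = random_undersample(X, y, rng)
        stumps.append(fit_stump(X_bal, y_bal))

    def predict(x):
        votes = Counter(stump(x) for stump in stumps)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Toy imbalanced data: 90 majority (label 0) vs 10 minority (label 1) examples.
    X = [[float(i)] for i in range(100)]
    y = [1 if i >= 90 else 0 for i in range(100)]
    model = fit_undersampled_bagging(X, y)
    print(model([95.0]), model([2.0]))  # prints "1 0"
```

Because every member sees a balanced sample, the minority class is never outvoted inside a single training set, and the ensemble recovers some of the information that each undersample throws away. Swapping the stump for a stronger base learner, or replacing undersampling with oversampling, are natural variations; neither choice is taken from the paper.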
Related papers
  • [21] An Ensemble Random Forest Algorithm for Insurance Big Data Analysis
    Wu, Ziming
    Lin, Weiwei
    Zhang, Zilong
    Wen, Angzhan
    Lin, Longxin
    2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC), Vol. 1, 2017: 531-536
  • [22] An experimental survey on big data frameworks
    Inoubli, Wissem
    Aridhi, Sabeur
    Mezni, Haithem
    Maddouri, Mondher
    Nguifo, Engelbert Mephu
    Future Generation Computer Systems: The International Journal of eScience, 2018, 86: 546-564
  • [23] Ensemble Meta Classifier with Sampling and Feature Selection for Data with Multiclass Imbalance Problem
    Sainin, Mohd Shamrie
    Alfred, Rayner
    Ahmad, Faudziah
    Journal of Information and Communication Technology-Malaysia, 2021, 20 (2): 103-133
  • [24] Online sparse class imbalance learning on big data
    Maurya, Chandresh Kumar
    Toshniwal, Durga
    Venkoparao, Gopalan Vijendran
    Neurocomputing, 2016, 216: 250-260
  • [25] Performance evaluation of oversampling algorithm: MAHAKIL using ensemble classifiers
    Arun C.
    Lakshmi C.
    International Journal of Business Intelligence and Data Mining, 2022, 22 (1-2): 1-15
  • [26] A Method for Entity Resolution in High Dimensional Data Using Ensemble Classifiers
    Liu Yi
    Diao Xing-chun
    Cao Jian-jun
    Zhou Xing
    Shang Yu-ling
    Mathematical Problems in Engineering, 2017, 2017
  • [27] The Effects of Random Undersampling with Simulated Class Imbalance for Big Data
    Hasanin, Tawfiq
    Khoshgoftaar, Taghi M.
    2018 IEEE International Conference on Information Reuse and Integration (IRI), 2018: 70-79
  • [28] A survey on addressing high-class imbalance in big data
    Leevy J.L.
    Khoshgoftaar T.M.
    Bauder R.A.
    Seliya N.
    Journal of Big Data, 5 (1)
  • [29] Big data analytics, order imbalance and the predictability of stock returns
    Akyildirim, Erdinc
    Sensoy, Ahmet
    Gulay, Guzhan
    Corbet, Shaen
    Salari, Hajar Novin
    Journal of Multinational Financial Management, 2021, 62
  • [30] Big Data Classification Using the SVM Classifiers with the Modified Particle Swarm Optimization and the SVM Ensembles
    Demidova, Liliya
    Nikulchev, Evgeny
    Sokolova, Yulia
    International Journal of Advanced Computer Science and Applications, 2016, 7 (5): 294-312