Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark

Cited by: 0
Authors
Triguero, I. [1]
Galar, M. [3]
Merino, D. [2]
Maillo, J. [2]
Bustince, H. [3]
Herrera, F. [2]
Affiliations
[1] Univ Ghent, Dept Internal Med, B-9052 Zwijnaarde, Belgium
[2] Univ Granada, CITIC UGR, Dept Comp Sci & Artificial Intelligence, Granada 18071, Spain
[3] Univ Publ Navarra, Dept Automat & Computat, Campus Arrosadia S-N, Pamplona 31006, Spain
Source
2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC) | 2016
Keywords
MAPREDUCE; INSIGHT;
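DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract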
The classification of datasets with a skewed class distribution is an important problem in data mining. Evolutionary undersampling of the majority class has proved to be a successful approach to tackling this issue. This already challenging task becomes even more difficult when the number of majority class examples is very large. In this scenario, the evolutionary model becomes impractical because of memory and time constraints. Divide-and-conquer approaches based on the MapReduce paradigm have already been proposed to handle this type of problem by dividing the data into multiple subsets. However, in extremely imbalanced cases, these models may suffer from a lack of density of the minority class in the subsets considered. To address this problem, in this contribution we provide a new big data scheme based on the emerging technology Apache Spark to tackle highly imbalanced datasets. We take advantage of its in-memory operations to diminish the effect of the small sample size. The key point of this proposal lies in the independent management of majority and minority class examples, which allows us to keep a higher number of minority class examples in each subset. In our experiments, we analyze the proposed model on several datasets with up to 17 million instances. The results show the effectiveness of this evolutionary undersampling model for extremely imbalanced big data classification.
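The lack-of-density problem described in the abstract can be illustrated with a small sketch. It is not the authors' Spark implementation and it does not perform evolutionary undersampling; it only contrasts a class-blind split with a split that handles the two classes independently. The function names, the toy label vector, and the choice to copy every minority example into each subset are assumptions made for illustration (the abstract only states that more minority examples are kept per subset).

```python
import random


def naive_split(labels, num_partitions, seed=0):
    """Class-blind split: shuffle all example indices, then deal them round-robin."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[p::num_partitions] for p in range(num_partitions)]


def class_aware_split(labels, num_partitions, minority_label=1, seed=0):
    """Split only the majority class; every subset gets all minority examples.

    Assumption: replicating the whole minority class is the simplest way to
    'keep a higher number of minority class examples in each subset'.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    rng.shuffle(majority)
    return [majority[p::num_partitions] + minority for p in range(num_partitions)]


def minority_counts(partitions, labels, minority_label=1):
    """Number of minority examples that ended up in each partition."""
    return [sum(1 for i in part if labels[i] == minority_label) for part in partitions]


# Toy data: 20 minority examples against 20,000 majority examples.
labels = [1] * 20 + [0] * 20000
naive = naive_split(labels, num_partitions=100)
aware = class_aware_split(labels, num_partitions=100)

print("naive, minority per subset:", minority_counts(naive, labels)[:10])
print("class-aware, minority per subset:", minority_counts(aware, labels)[:10])
```

With the class-blind split, the 20 minority examples are scattered over 100 subsets, so most subsets contain none. With the class-aware split, each subset holds 200 majority examples plus all 20 minority examples, which gives any per-subset undersampling step something to work with.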
Pages: 640 - 647
Number of pages: 8
Related papers
50 in total
  • [1] Evolutionary Undersampling for Imbalanced Big Data Classification
    Triguero, I.
    Galar, M.
    Vluymans, S.
    Cornelis, C.
    Bustince, H.
    Herrera, F.
    Saeys, Y.
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015 : 715 - 722
  • [2] A First Attempt on Global Evolutionary Undersampling for Imbalanced Big Data
    Triguero, I.
    Galar, M.
    Bustince, H.
    Herrera, F.
    2017 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2017 : 2054 - 2061
  • [3] Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification
    Vairetti, Carla
    Assadi, Jose Luis
    Maldonado, Sebastian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 246
  • [4] A Hybrid Surrogate Model for Evolutionary Undersampling in Imbalanced Classification
    Le, Hoang Lam
    Landa-Silva, Dario
    Galar, Mikel
    Garcia, Salvador
    Triguero, I
    2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020
  • [5] Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy
    Garcia, Salvador
    Herrera, Francisco
    EVOLUTIONARY COMPUTATION, 2009, 17 (03) : 275 - 306
  • [6] An Iterative Undersampling of Extremely Imbalanced Data Using CSVM
    Lee, Jong Bum
    Lee, Jee-Hyong
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2014), 2015, 9445
  • [7] A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark
    Ramirez-Gallego, S.
    Garcia, S.
    Benitez, J. M.
    Herrera, F.
    SWARM AND EVOLUTIONARY COMPUTATION, 2018, 38 : 240 - 250
  • [8] Multi-class imbalanced big data classification on Spark
    Sleeman, William C.
    Krawczyk, Bartosz
    KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [9] Big data classification using deep learning and apache spark architecture
    Anilkumar V. Brahmane
    B. Chaitanya Krishna
    Neural Computing and Applications, 2021, 33 : 15253 - 15266
  • [10] Big data classification using deep learning and apache spark architecture
    Brahmane, Anilkumar, V
    Krishna, B. Chaitanya
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (22) : 15253 - 15266