Imbalanced data classification using MapReduce and Relief

Cited by: 6
Authors
Jedrzejowicz, Joanna [1]
Kostrzewski, Robert [1]
Neumann, Jakub [1]
Zakrzewska, Magdalena [1]
Affiliations
[1] Univ Gdansk, Fac Math Phys & Informat, Inst Informat, PL-80308 Gdansk, Poland
Keywords
Imbalanced data; classification; parallelization; feature selection
DOI
10.1080/24751839.2018.1440454
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Classification of imbalanced data has been reported to require modification of standard classification algorithms, and it has lately attracted considerable attention because of its practical applications in industry, banking and finance. The aim of the paper is to examine algorithms known from the literature when two modifications are introduced: MapReduce to parallelize computations and Relief to select the most valuable attributes. Both modifications are needed in the Big Data area. Two new algorithms are also considered.
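The combination named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the basic Relief weight update for two classes, numeric features scaled to [0, 1], Manhattan distance, a deterministic pass over every instance instead of random sampling, and that each mapper can read the full data set. The names `map_chunk`, `reduce_weights` and `relief` are hypothetical, and the map and reduce steps run in a single process here.

```python
from functools import reduce


def distance(a, b):
    # Manhattan distance between two feature vectors.
    return sum(abs(p - q) for p, q in zip(a, b))


def nearest(i, data, same_class):
    # Nearest neighbour of instance i, from the same class (hit) or the other class (miss).
    xi, yi = data[i]
    candidates = [x for k, (x, y) in enumerate(data)
                  if k != i and (y == yi) == same_class]
    return min(candidates, key=lambda x: distance(xi, x))


def map_chunk(indices, data, n_features):
    # Map step (assumed design): partial Relief weight updates for one chunk of instances.
    w = [0.0] * n_features
    for i in indices:
        xi, _ = data[i]
        hit = nearest(i, data, same_class=True)
        miss = nearest(i, data, same_class=False)
        for j in range(n_features):
            # Reward features that differ on the miss, penalise those that differ on the hit.
            w[j] += abs(xi[j] - miss[j]) - abs(xi[j] - hit[j])
    return w


def reduce_weights(a, b):
    # Reduce step: element-wise sum of two partial weight vectors.
    return [p + q for p, q in zip(a, b)]


def relief(data, n_chunks=2):
    # data is a list of (feature_vector, label) pairs; returns one weight per feature.
    n_features = len(data[0][0])
    idx = list(range(len(data)))
    chunks = [idx[k::n_chunks] for k in range(n_chunks)]
    partials = [map_chunk(c, data, n_features) for c in chunks]
    total = reduce(reduce_weights, partials)
    return [w / len(data) for w in total]
```

Features with the largest weights would be the ones kept for classification. Because the per-instance updates are independent and simply summed, the result does not depend on how the instances are split into chunks, which is what makes the update a natural fit for a map and reduce decomposition.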
Pages: 217-230
Number of pages: 14
Related papers
50 in total
  • [31] Using Genetic Algorithm to Improve Classification Accuracy on Imbalanced Data
    Cervantes, Jair
    Li, Xiaoou
    Yu, Wen
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 2659 - 2664
  • [32] Data reduction and stacking for imbalanced data classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (06) : 7239 - 7249
  • [33] Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data
    Lopez, Victoria
    del Rio, Sara
    Manuel Benitez, Jose
    Herrera, Francisco
    FUZZY SETS AND SYSTEMS, 2015, 258 : 5 - 38
  • [34] Classification of Imbalanced Auction Fraud Data
    Ganguly, Swati
    Sadaoui, Samira
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233 : 84 - 89
  • [35] Imbalanced Data Classification Based on Clustering
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 : 741 - 745
  • [36] Adaptive Oversampling for Imbalanced Data Classification
    Ertekin, Seyda
    INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269
  • [37] Graph Classification with Imbalanced Data Sets
    Xiao, Gang-Song
    Chen, Xiao-Yun
    2011 FIRST ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR), 2011, : 57 - 61
  • [38] Classification of Wine Quality with Imbalanced Data
    Hu, Gongzhu
    Xi, Tan
    Mohammed, Faraz
    Miao, Huaikou
    PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2016, : 1712 - 1717
  • [39] Classification of imbalanced data with transparent kernels
    Lee, KK
    Gunn, SR
    Harris, CJ
    Reed, PAS
    IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001, : 2410 - 2415
  • [40] A Novel Model for Imbalanced Data Classification
    Yin, Jian
    Gan, Chunjing
    Zhao, Kaiqi
    Lin, Xuan
    Quan, Zhe
    Wang, Zhi-Jie
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6680 - 6687