Review of Classification Methods on Unbalanced Data Sets

Cited by: 121
Authors
Wang, Le [1]
Han, Meng [1]
Li, Xiaojuan [1]
Zhang, Ni [1]
Cheng, Haodong [1]
Affiliations
[1] North Minzu Univ, Sch Comp Sci & Engn, Yinchuan 750021, Ningxia, Peoples R China
Keywords
Classification algorithms; Sampling methods; Support vector machines; Security; Safety; Deep learning; Boosting; Unbalanced data sets; Classification; Algorithm level; Feature level; Feature selection; Defect prediction; Support; Ensemble
DOI
10.1109/ACCESS.2021.3074243
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper reviews the classification of unbalanced data sets. First, this kind of data set is briefly introduced. The classification methods for unbalanced data sets are then analyzed in detail from several perspectives: data sampling, algorithm level, feature level, cost-sensitive functions, and deep learning. The data sampling methods are further grouped by underlying technique, including classification methods based on the synthetic minority over-sampling technique (SMOTE), support vector machines (SVM), and k-nearest neighbors (KNN). The advantages and disadvantages of these methods are then compared. Finally, the evaluation criteria for classifiers on unbalanced data sets are summarized, and directions for future work are outlined.
Pages: 64606-64628
Page count: 23