A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification

Citations: 11
Authors
Liu, Ruijuan [1]
Affiliations
[1] Chongqing Jianzhu Coll, Dept Publ Course, Chongqing 400072, Peoples R China
Funding
UK Research and Innovation;
Keywords
Class-imbalance learning; Class-imbalance classification; Oversampling; K nearest neighbors; Relative density; BORDERLINE-SMOTE; SAMPLING METHOD; ALGORITHM;
DOI
10.1007/s10489-022-03512-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a classifier from class-imbalanced data is an important challenge. Among existing solutions, SMOTE has been widely praised and has an extensive range of practical applications. However, SMOTE and its extensions often degrade because of noise generation and within-class imbalance. Although multiple SMOTE variants have been developed, few of them solve both problems at the same time. Moreover, many SMOTE improvements rely on advanced models that introduce additional parameters. To address imbalance both between and within classes while avoiding noise generation, a novel synthetic minority oversampling technique based on relative and absolute densities (SMOTE-RD) is proposed. First, a novel noise filter based on relative density is proposed to remove noise and smooth the class boundary. Second, sparsity and boundary weights are proposed and calculated from the relative and absolute densities, respectively. Third, normalized weights based on the sparsity and boundary weights are proposed so that more synthetic minority-class samples are generated near the class boundary and in sparse regions. The main advantages of the proposed algorithm are: (a) it effectively avoids noise generation while removing noise and smoothing the class boundary in the original data; (b) it generates more synthetic samples near class boundaries and in sparse regions; (c) it introduces no additional parameters. Extensive experiments on real data sets show that SMOTE-RD outperforms 7 popular oversampling methods in average AUC, average F-measure and average G-mean at an acceptable time cost.
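For orientation, the weighted generation step described in the abstract can be pictured with a minimal Python sketch. This is not the SMOTE-RD algorithm: the abstract does not give the density or weight formulas, so the function name `weighted_smote` and its `weights` argument are hypothetical, and the per-sample weights are simply taken as input. Only the classic SMOTE interpolation between a minority sample and one of its k nearest minority neighbours is shown, with seed samples drawn in proportion to the normalized weights.

```python
import numpy as np


def weighted_smote(X_min, weights, n_new, k=5, seed=0):
    """SMOTE-style interpolation with weighted seed selection (illustrative only).

    X_min   : minority-class samples, shape (n, d), with n >= 2
    weights : hypothetical non-negative per-sample scores (e.g. larger for
              boundary or sparse samples); how SMOTE-RD computes them is
              not specified in the abstract
    n_new   : number of synthetic samples to generate
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Normalise the weights into a sampling distribution over seed samples.
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()

    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Draw seeds by weight, then pick one of each seed's k neighbours at random.
    seeds = rng.choice(n, size=n_new, p=p)
    picks = neighbours[seeds, rng.integers(0, k, size=n_new)]

    # Place each synthetic point on the segment between seed and neighbour.
    gaps = rng.random((n_new, 1))
    return X_min[seeds] + gaps * (X_min[picks] - X_min[seeds])
```

Samples with a larger weight are chosen as seeds more often, so more synthetic points land around them; a sample with weight 0 is never a seed but can still serve as a neighbour.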
Pages: 786-803
Page count: 18